We are paying for 390 billion parameters of ‘dark matter’ that do nothing but generate heat
The Ghost Gradient: Why 90% of Your Model’s Weights Are Already Dead
The Industry Is Paying for 390 Billion Parameters of Dark Matter
“It isn’t always the case that features correspond cleanly to neurons, especially in large language models where it actually seems rare for neurons to correspond to clean features.” — Elhage et al., Anthropic (2022)
Here is the most expensive lie in the history of technology:
More parameters = more intelligence.
It ships in every press release. It anchors every benchmark headline. It justifies every multibillion-dollar training run.
It is wrong.
Not directionally wrong. Not nuanced-wrong.
Mathematically wrong.
And we can prove it — with nothing more exotic than Singular Value Decomposition.
