Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

Abstract: We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and lower bounds for individual eigenvector recovery beyond what standard gap-based perturbation bounds provide. Numerical experiments confirm the predicted sequential recovery, finite-size smoothing of the thresholds, and separation from non-hierarchical kernel baselines. Together, these results show how smooth scaling laws can emerge from a cascade of sharp feature-learning transitions.
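
The aggregation step described in the abstract (many sharp per-feature transitions summing to a smooth power law) can be illustrated with a toy calculation. The sketch below is not the paper's model or algorithm. It assumes, purely for illustration, that feature k has weight k^(-alpha), that it is recovered exactly when the sample size n reaches a hypothetical threshold k^gamma, and that the prediction error is the total squared weight of the features not yet recovered. Under these assumptions, and for 2*alpha > 1, an integral approximation gives an error decaying roughly as n^(-(2*alpha - 1)/gamma). The names `toy_error`, `fitted_slope`, `alpha`, `gamma`, and `num_features` are all hypothetical.

```python
import math


def toy_error(n, alpha=1.0, gamma=2.0, num_features=100_000):
    """Total squared weight of the features not yet recovered at sample size n.

    Assumptions (illustrative only, not taken from the paper):
      - feature k has weight k**(-alpha);
      - feature k is recovered exactly when n >= k**gamma (a sharp threshold).
    """
    unrecovered = 0.0
    for k in range(1, num_features + 1):
        threshold = k ** gamma  # hypothetical detection threshold for feature k
        if n < threshold:
            unrecovered += k ** (-2.0 * alpha)
    return unrecovered


def fitted_slope(ns, errs):
    """Least-squares slope of log(err) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    alpha, gamma = 1.0, 2.0
    ns = [10 ** p for p in range(2, 7)]
    errs = [toy_error(n, alpha, gamma) for n in ns]
    # Exponent suggested by the integral approximation of the unrecovered tail.
    predicted = -(2.0 * alpha - 1.0) / gamma
    print("fitted log-log slope:", fitted_slope(ns, errs))
    print("predicted exponent:  ", predicted)
```

With these toy settings the fitted log-log slope should land near the predicted exponent of -0.5, even though each individual feature switches on abruptly. The exponents and thresholds actually proved in the paper depend on its specific model and are not reproduced here.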

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST)
Cite as:	arXiv:2605.14567 [stat.ML]
	(or arXiv:2605.14567v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2605.14567 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Arie Wortsman-Zurich
[v1] Thu, 14 May 2026 08:37:28 UTC (136 KB)