Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model
1️⃣ One-Sentence Summary
Using a simple hierarchical network model, this paper explains why a model's prediction error decreases smoothly as a power law as the amount of data grows: the network learns the latent features in the data sequentially, from strongest to weakest in order of importance, and aggregating these learning events produces the so-called "scaling law."
We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and lower bounds for individual eigenvector recovery beyond what standard gap-based perturbation bounds provide. Numerical experiments confirm the predicted sequential recovery, finite-size smoothing of the thresholds, and separation from non-hierarchical kernel baselines. Together, these results show how smooth scaling laws can emerge from a cascade of sharp feature-learning transitions.
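To make the aggregation mechanism concrete, below is a minimal numerical sketch — not the paper's layer-wise spectral algorithm. It assumes, purely for illustration, that feature k carries squared weight k^-(2*alpha+1) and becomes recoverable once the sample size exceeds a threshold n_k = k^beta; both the parameter names (alpha, beta) and the threshold rule are assumptions, not values from the paper. Summing the weights of the not-yet-recovered features then traces a smooth power-law curve, roughly proportional to n^(-2*alpha/beta), even though each individual feature switches on sharply.

```python
import numpy as np

# Toy illustration (assumed parameters, not the paper's):
# a cascade of sharp per-feature recovery thresholds, combined with
# power-law feature weights, yields a smooth power-law scaling curve.

alpha = 1.0    # assumed: squared weight of feature k decays as k**(-(2*alpha + 1))
beta = 2.0     # assumed: feature k becomes recoverable at sample size n_k = k**beta
K = 10_000     # number of latent features in the toy target

k = np.arange(1, K + 1)
weights = k ** (-(2.0 * alpha + 1.0))   # squared weight of feature k
thresholds = k ** beta                   # sample size at which feature k is recovered

for n in [10, 100, 1_000, 10_000, 100_000]:
    # Prediction error = total weight of features not yet recovered at sample size n.
    error = weights[thresholds > n].sum()
    print(f"n = {n:>7d}   error ~ {error:.3e}")

# Each feature contributes a sharp transition, but the sum over many
# features follows the smooth envelope n**(-2*alpha/beta).
```

Plotting `error` against `n` on log-log axes shows the staircase of individual transitions at small K flattening into a clean power law as K grows, which is the finite-size smoothing effect the abstract refers to.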
Source: arXiv: 2605.14567