arXiv submission date: 2026-02-17
📄 Abstract - Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks

Capability emergence during neural network training remains mechanistically opaque. We track five geometric measures across five model scales (405K-85M parameters), 120+ emergence events in eight algorithmic tasks, and three Pythia language models (160M-2.8B). We find: (1) training begins with a universal representation collapse to task-specific floors that are scale-invariant across a 210× parameter range (e.g., modular arithmetic collapses to RankMe $\approx$ 2.0 regardless of model size); (2) collapse propagates top-down through layers (32/32 task × model consistency), contradicting bottom-up feature-building intuition; (3) a geometric hierarchy in which representation geometry leads emergence (75-100% precursor rate for hard tasks), while the local learning coefficient is synchronous (0/24 precursor) and Hessian measures lag. We also delineate prediction limits: geometric measures encode coarse task difficulty but not fine-grained timing (within-class concordance 27%; when task ordering reverses across scales, prediction fails at 26%). On Pythia, global geometric patterns replicate but per-task precursor signals do not -- the precursor relationship requires task-training alignment that naturalistic pre-training does not provide. Our contribution is the geometric anatomy of emergence and its boundary conditions, not a prediction tool.
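The RankMe floor cited above can be illustrated concretely. RankMe is the exponential of the Shannon entropy of the normalized singular-value distribution of a representation matrix, so a representation with exactly two equal directions of variance scores ≈ 2.0. A minimal sketch (the function name and the synthetic rank-2 matrix are illustrative, not from the paper):

```python
import numpy as np

def rankme(Z: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank of a representation matrix Z (n_samples x dim):
    exp of the entropy of the normalized singular-value distribution."""
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / (s.sum() + eps) + eps          # normalized singular values
    return float(np.exp(-np.sum(p * np.log(p))))

# A rank-2 matrix with two equal singular values: the kind of
# task-specific floor the abstract reports for modular arithmetic.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((1000, 2)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((64, 2)))    # orthonormal columns
Z = U @ V.T                                          # rank 2, sigma = (1, 1)
print(rankme(Z))                                     # ~2.0
```

The measure is scale-free in the sense the abstract uses: it depends only on how variance is distributed across directions, not on the ambient dimension, which is what makes the ≈ 2.0 floor comparable across a 210× parameter range.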

Top-level tags: theory model training machine learning
Detailed tags: capability emergence representation geometry scale invariance neural network training representation collapse

Anatomy of Capability Emergence: Scale-Invariant Representation Collapse and Top-Down Reorganization in Neural Networks


1️⃣ One-sentence summary

By analyzing geometric features of neural network training, this paper finds that the emergence of new capabilities typically begins with a scale-invariant, top-down representation collapse, and shows that this geometric change is a key precursor of capability emergence, though its predictive power breaks down in naturalistically pre-trained language models.

Source: arXiv:2602.15997