Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks
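1️⃣ One-sentence summary
Through experiments, this paper demonstrates quantitatively for the first time that a trained MLP residual network applies a renormalization-group-like selective coarse-graining to its input: guided by the correlations in the data, it retains the relevant degrees of freedom, discards irrelevant information, and forms a stable layer structure close to a fixed point.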
The analogy between deep neural network forward passes and renormalization group (RG) flows has been noted repeatedly in the literature, but existing treatments remain qualitative: depth is described as a coarse-graining scale, attention is likened to a partition function, and representations are said to flow toward fixed points. No existing work has defined a measurable RG order parameter, tested it under controlled variation of the input distribution, or made quantitative predictions that were then verified empirically. We study the simplest architecture for which the analogy is tractable: a pure MLP residual stack trained on masked token prediction over synthetic Markov chain sequences with known spectral properties. We report three findings. (i) After training, the effective rank of the residual stream decreases monotonically with depth, consistent with irrelevant degrees of freedom being progressively integrated out. (ii) This rank collapse is selective: it occurs for chains with a short correlation length (approximately 1) but is absent for chains with a long correlation length (approximately 7), with rank measured at the position level to control for mean-pooling artifacts. The network thus preserves exactly the degrees of freedom relevant to the prediction task, which is the content of the RG relevance criterion. (iii) Inter-layer kernel drift is concentrated at one or two specific transitions, while the remainder of the network sits near a fixed point, consistent with a discrete fixed-point plateau. Together, these findings constitute the first quantitative, position-level evidence that MLP residual networks implement a selective coarse-graining procedure governed by the spectral structure of the input distribution.
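The abstract does not say how its three measured quantities are defined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values), a correlation length of -1/ln|λ₂| taken from the second-largest eigenvalue magnitude of the chain's transition matrix, and kernel drift measured as one minus linear CKA between consecutive layers. All function names, including the `toy_chain` helper, are hypothetical.

```python
import numpy as np


def effective_rank(X, eps=1e-12):
    """Entropy-based effective rank of a (samples, features) activation matrix.

    Assumed definition; the source does not state which one it uses.
    """
    X = X - X.mean(axis=0, keepdims=True)          # remove the mean direction
    s = np.linalg.svd(X, compute_uv=False)         # singular values
    p = s / (s.sum() + eps)                        # normalise to a distribution
    p = p[p > eps]                                 # drop numerically zero modes
    return float(np.exp(-(p * np.log(p)).sum()))   # exp(Shannon entropy)


def correlation_length(P):
    """xi = -1 / ln|lambda_2| for a row-stochastic transition matrix P."""
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(-1.0 / np.log(mags[1]))           # mags[0] is the eigenvalue 1


def kernel_drift(X, Y):
    """1 - linear CKA between two layers' activations (rows = same samples).

    Assumed drift measure; 0 means the two layers' kernels coincide up to scale.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(1.0 - cross / norm)


def toy_chain(n_states, lam2):
    """Hypothetical test chain whose non-trivial eigenvalues all equal lam2."""
    uniform = np.ones((n_states, n_states)) / n_states
    return lam2 * np.eye(n_states) + (1.0 - lam2) * uniform


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short = toy_chain(8, np.exp(-1.0))        # built so that xi is about 1
    long_ = toy_chain(8, np.exp(-1.0 / 7.0))  # built so that xi is about 7
    print("xi short:", correlation_length(short))
    print("xi long: ", correlation_length(long_))

    acts = rng.standard_normal((2000, 16))    # stand-in for one layer's activations
    print("effective rank:", effective_rank(acts))
    print("drift to itself:", kernel_drift(acts, acts))
```

In a real experiment, `effective_rank` would be applied to the per-position residual-stream activations at each layer, and `kernel_drift` to each pair of consecutive layers. A rank curve that falls with depth, with drift concentrated at a few transitions, would correspond to findings (i) and (iii).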
Source: arXiv: 2606.10324