arXiv submission date: 2026-03-28
📄 Abstract - The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks

LayerNorm and RMSNorm impose fundamentally different geometric constraints on their outputs, and this difference has a precise, quantifiable consequence for model complexity. We prove that LayerNorm's mean-centering step, by confining data to a linear hyperplane (through the origin), reduces the Local Learning Coefficient (LLC) of the subsequent weight matrix by exactly $m/2$ (where $m$ is its output dimension); RMSNorm's projection onto a sphere preserves the LLC entirely. This reduction is structurally guaranteed before any training begins, determined by data manifold geometry alone. The underlying condition is a geometric threshold: for the codimension-one manifolds we study, the LLC drop is binary -- any non-zero curvature, regardless of sign or magnitude, is sufficient to preserve the LLC, while only affinely flat manifolds cause the drop. At finite sample sizes this threshold acquires a smooth crossover whose width depends on how much of the data distribution actually experiences the curvature, not merely on whether curvature exists somewhere. We verify both predictions with controlled single-layer scaling experiments using the wrLLC framework. We further show that Softmax simplex data introduces a "smuggled bias" that activates the same $m/2$ LLC drop when paired with an explicit downstream bias, proved via the affine symmetry extension of the main theorem and confirmed empirically.

Top-level tags: theory model training machine learning
Detailed tags: normalization layers bayesian complexity geometric constraints local learning coefficient affine symmetry

The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks


1️⃣ One-sentence summary

Through geometric analysis, this paper proves that LayerNorm, by constraining data to a hyperplane, automatically lowers the model's statistical complexity, whereas RMSNorm preserves that complexity entirely; the difference stems from how the two normalizations treat the curvature of the data manifold.
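The symmetry mechanism behind the abstract's $m/2$ claim can be sketched numerically: LayerNorm's mean-centering makes every output $x$ satisfy $\mathbf{1}^\top x = 0$, so a downstream weight matrix $W$ and its rank-one shift $W + u\mathbf{1}^\top$ are indistinguishable on the data, while RMSNorm leaves no such degeneracy. The snippet below is an illustrative sketch of that invariance, not code from the paper; the function names and dimensions are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def layernorm(x, eps=1e-12):
    # Mean-center, then scale to unit RMS: output lies in the
    # hyperplane {z : sum(z) = 0} through the origin.
    x = x - x.mean(axis=-1, keepdims=True)
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

def rmsnorm(x, eps=1e-12):
    # Scale only: output lies on a sphere, with no hyperplane constraint.
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

d, m = 8, 4                          # input/output dims (arbitrary choices)
X = rng.normal(size=(32, d))         # a batch of raw activations
W = rng.normal(size=(m, d))          # downstream weight matrix
u = rng.normal(size=(m, 1))          # arbitrary shift direction
W_shift = W + u @ np.ones((1, d))    # rank-one "constant row" perturbation

ln, rms = layernorm(X), rmsnorm(X)

# After LayerNorm, 1^T x = 0, so (W + u 1^T) x = W x: the shift is invisible,
# and this flat direction in parameter space is what lowers the LLC.
print(np.abs(ln @ W.T - ln @ W_shift.T).max())    # ~0 (numerical noise)
# After RMSNorm there is no such symmetry: the outputs differ.
print(np.abs(rms @ W.T - rms @ W_shift.T).max())  # clearly nonzero
```

Each of the $m$ rows of $W$ gains one such flat direction, matching the paper's claim that the degeneracy is fixed by data geometry alone, before any training.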

Source: arXiv:2603.27432