Structure Before Collapse: Transient semantic geometry in next-token prediction
1️⃣ One-sentence summary
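This paper finds that when a language model is trained to predict the next token, the network spontaneously learns semantic similarity between words early in training (for example, that the words following "broke" are typically rigid or inanimate), but this meaningful semantic structure is only temporary: as training continues, the model eventually settles into a symmetric "neural collapse" state that ignores semantic similarity.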
Neural Collapse predicts that balanced one-hot classification pushes model representations to be equally far from each other: a symmetric configuration that depends only on the output label and ignores any semantic similarity in the inputs. This creates a puzzle. Next-token prediction language models are trained predominantly (increasingly so as context length grows) with one-hot labels, because the same context is very unlikely to appear twice in training with different labels. Yet they clearly learn latent structural features: despite the one-hot training regime, a language model's contextual embeddings represent the fact that the blank in "Mary broke the ___" is likely to be filled by tokens in the latent classes of (a) medium-sized, (b) rigid, (c) inanimate nouns. How does gradient descent find such categorical semantic structure when co-occurrence statistics collapse to one-hot sparsity, eliminating any shared next-tokens among different contexts? To investigate this tension, we identify three synthetic controlled settings where inputs have latent semantic factors but are mapped to distinct one-hot labels. We find that semantic geometry emerges early in training, and that representations cluster by shared attributes despite receiving no explicit supervision to do so. This structure is transient: with sufficient capacity and time, the model eventually reaches the predicted symmetric state where all representations are equally separated. We study this phase transition through Gram matrix analysis and propose a preliminary modification to the commonly used unconstrained features model to capture the emergent semantic geometry.
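The abstract does not spell out how the Gram matrix analysis is carried out, so the following is only a minimal sketch of one plausible version, not the authors' method. It assumes one representation per context, centers and normalizes them, and compares the resulting cosine Gram matrix against the simplex equiangular tight frame (ETF) that Neural Collapse predicts, where every off-diagonal entry equals -1/(k-1). The function names `centered_gram`, `etf_gram`, and `collapse_distance`, and the toy "structured" features built from two binary attributes, are all hypothetical choices made for illustration.

```python
import numpy as np


def centered_gram(features: np.ndarray) -> np.ndarray:
    """Cosine Gram matrix of globally centered representations (one row per context)."""
    centered = features - features.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return unit @ unit.T


def etf_gram(k: int) -> np.ndarray:
    """Gram matrix of a simplex ETF: 1 on the diagonal, -1/(k-1) elsewhere."""
    return (k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)


def collapse_distance(features: np.ndarray) -> float:
    """Relative Frobenius distance between the observed Gram matrix and the ETF."""
    k = features.shape[0]
    target = etf_gram(k)
    return float(np.linalg.norm(centered_gram(features) - target) / np.linalg.norm(target))


# Assumed toy data: 4 contexts, each with its own one-hot label.
# "Collapsed" representations are equally separated (rows of the identity).
collapsed = np.eye(4)

# "Structured" representations are dominated by two shared binary attributes,
# plus a small item-specific component, so contexts sharing an attribute are closer.
attributes = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
structured = np.hstack([attributes, 0.1 * np.eye(4)])

print("collapsed :", collapse_distance(collapsed))
print("structured:", collapse_distance(structured))
```

Tracking a quantity like `collapse_distance` over training checkpoints is one way the transient phase could be made visible: under the assumptions above, it would stay large while representations cluster by shared attributes and shrink toward zero as the symmetric state is approached.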
Source: arXiv: 2606.26749