Abstract representational geometry supports inference in large language models
1️⃣ One-sentence summary
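By adapting a human reversal-learning task to a text-based setting, this study finds that when large language models perform inference, their internal states form low-dimensional, approximately orthogonal abstract geometric structures resembling those of the human hippocampus. This structure is more pronounced in the deeper layers of the model, and intervention experiments further support its key role in inference.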
A defining feature of human intelligence is the ability to adapt to changing environments by inferring latent task structure from sparse observations. Neuroscientific research indicates that this capability relies on the hippocampus constructing abstract representations, expressed as low-dimensional, approximately orthogonal manifolds in neural state space. However, the internal mechanisms of large language models (LLMs) remain largely opaque, making it unclear whether they form comparable abstract representations or instead rely on task-specific statistical regularities when performing such reasoning tasks. Here we adapt a contextual reversal-learning paradigm to a text-based setting and compare humans and LLMs at both the behavioural and representational levels. We report that although LLMs exhibit generalizable reasoning less frequently than humans, when such inference occurs, their internal states exhibit abstract geometric structures that resemble those reported in the hippocampus. Notably, this representational geometry is not uniformly distributed but is organized hierarchically across model depth: whereas lower layers show early, stable encoding of stimulus identity, higher layers form a hippocampal-like functional band enriched for abstract context geometry associated with inference. Furthermore, complementary intervention experiments mechanistically implicate geometry in reasoning: task-sequence language modelling induces geometric disentanglement, whereas geometric regularization of higher layers increases the emergence of generalizable inference. Together, these findings establish abstract representational geometry as a mechanistic principle supporting inference in large language models.
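To make "approximately orthogonal" concrete, the sketch below shows one simple way such geometry could be quantified: estimate a coding axis for each task variable as the difference between class-mean hidden states, then take the cosine between the axes. This is an illustration under stated assumptions, not the paper's analysis pipeline; the function names `coding_axis` and `axis_alignment`, the 8-dimensional toy "hidden states", and the balanced 2 x 2 design are all hypothetical.

```python
import numpy as np

def coding_axis(states, labels):
    """Unit vector from the mean state of label 0 to the mean state of label 1."""
    states = np.asarray(states, dtype=float)
    labels = np.asarray(labels)
    diff = states[labels == 1].mean(axis=0) - states[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

def axis_alignment(states, context, stimulus):
    """Absolute cosine between the context and stimulus axes (0 = orthogonal)."""
    return float(abs(coding_axis(states, context) @ coding_axis(states, stimulus)))

# Hypothetical toy data: a balanced 2 contexts x 2 stimuli design, 40 trials.
context = np.array([0, 0, 1, 1] * 10)
stimulus = np.array([0, 1, 0, 1] * 10)
e0, e1 = np.eye(8)[0], np.eye(8)[1]

# Disentangled: context and stimulus vary along orthogonal directions.
disentangled = np.outer(context, e0) + np.outer(stimulus, e1)

# Entangled: the stimulus direction partly overlaps the context direction.
entangled = np.outer(context, e0) + np.outer(stimulus, (e0 + e1) / np.sqrt(2))

print(axis_alignment(disentangled, context, stimulus))
print(axis_alignment(entangled, context, stimulus))
```

In this toy setup the disentangled states give an alignment of zero, while the entangled states give a clearly non-zero value. With real model activations one would pass per-layer hidden states in place of the synthetic arrays and compare the alignment across depth.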
Source: arXiv: 2606.23345