arXiv submission date: 2026-02-02
📄 Abstract - The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Large language models (LLMs) represent prompt-conditioned beliefs (posteriors over answers and claims), but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 generates samples from a normal distribution by implicitly inferring its parameters (mean and standard deviation) given only samples from the distribution in context. We find that representations of curved "belief manifolds" for these parameters form with sufficient in-context learning, and we study how the model adapts when the distribution suddenly changes. While standard linear steering often pushes the model off-manifold and induces coupled, out-of-distribution shifts, geometry- and field-aware steering better preserves the intended belief family. Our work demonstrates linear field probing (LFP) as a simple approach to tiling the data manifold and making interventions that respect the underlying geometry. We conclude that rich structure emerges naturally in LLMs and that purely linear concept representations are often an inadequate abstraction.
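A minimal sketch of this kind of setup, under assumed details (the checkpoint name, prompt format, readout layer, context length, and neighborhood size below are illustrative choices, not the paper's exact configuration): prompts carry in-context samples from a normal distribution, a hidden state is read out for each prompt, and a "field" of local linear probes for (mean, std) is fit over neighborhoods of representation space, in the spirit of linear field probing.

```python
# Sketch: in-context Gaussian setup + local linear probes over representation space.
# Model id, prompt format, LAYER, and neighborhood size are assumptions for illustration.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

MODEL_ID = "meta-llama/Llama-3.2-1B"   # assumed checkpoint
LAYER = 16                              # assumed readout layer
N_CONTEXT = 32                          # samples shown in context

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float32)
model.eval()

def hidden_state(mu, sigma, rng):
    """Prompt with N_CONTEXT samples from N(mu, sigma); return the last-token hidden state."""
    samples = rng.normal(mu, sigma, size=N_CONTEXT)
    prompt = "Samples: " + ", ".join(f"{x:.2f}" for x in samples) + ","
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1].numpy()

rng = np.random.default_rng(0)
params = [(mu, sigma) for mu in np.linspace(-5, 5, 9) for sigma in (0.5, 1.0, 2.0)]
H = np.stack([hidden_state(mu, s, rng) for mu, s in params])   # representations
Y = np.array(params)                                           # targets: (mu, sigma)

# "Field" of local probes: one linear map per neighborhood, tiling the manifold.
nn = NearestNeighbors(n_neighbors=8).fit(H)
local_probes = []
for i in range(len(H)):
    idx = nn.kneighbors(H[i:i+1], return_distance=False)[0]
    local_probes.append(LinearRegression().fit(H[idx], Y[idx]))

# Compare a single global linear probe against the local (field) probes at one point.
global_probe = LinearRegression().fit(H, Y)
print("global probe estimate:", global_probe.predict(H[:1]))
print("local  probe estimate:", local_probes[0].predict(H[:1]))
print("true   parameters:    ", Y[0])
```

If the belief manifold is genuinely curved, the local probes should track (mean, std) more faithfully than one global linear probe, which is the intuition behind steering along locally estimated directions rather than a single fixed vector.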

Top-level tags: llm theory model evaluation
Detailed tags: representation geometry belief manifolds linear probing in-context learning steering interventions

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors


1️⃣ One-sentence summary

This paper finds that large language models do not internally represent beliefs as simple linear concepts; instead they form curved "belief manifolds", and interventions that respect this geometric structure are more effective than conventional linear interventions while better preserving the model's original reasoning behavior.
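As a toy illustration of that claim (pure NumPy, not the paper's experiments): on a curved one-dimensional manifold, adding a single global steering vector pushes a point off the manifold, whereas stepping along the locally estimated tangent direction keeps it close.

```python
# Toy illustration: global linear steering vs. geometry-aware steering
# on a curved 1-D "belief manifold" embedded in 2-D.
import numpy as np

theta = np.linspace(0.0, np.pi, 200)             # belief parameter along the manifold
M = np.stack([np.cos(theta), np.sin(theta)], 1)  # unit half-circle manifold

def dist_to_manifold(p):
    return np.min(np.linalg.norm(M - p, axis=1))

# Global linear "steering vector": least-squares direction that predicts theta.
w, *_ = np.linalg.lstsq(M - M.mean(0), theta - theta.mean(), rcond=None)
w /= np.linalg.norm(w)

i = 60        # starting point on the manifold
step = 0.3    # intervention size

linear_target = M[i] + step * w            # fixed direction: drifts off-manifold
tangent = M[i + 1] - M[i - 1]              # local tangent estimate
tangent /= np.linalg.norm(tangent)
field_target = M[i] + step * tangent       # follows the local geometry

print("off-manifold distance, global linear steering:", dist_to_manifold(linear_target))
print("off-manifold distance, geometry-aware steering:", dist_to_manifold(field_target))
```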

Source: arXiv:2602.02315