arXiv submission date: 2026-05-20
📄 Abstract - On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
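The three error-growth regimes mentioned in the abstract can be illustrated with a toy recursion. The sketch below is not the paper's amplification factor, whose exact form the abstract does not give. It assumes, purely for illustration, that each reasoning step multiplies the inherited error by a stability constant `L` and adds a fresh per-step error, so that after `T` steps the accumulated error is the per-step error times the geometric sum 1 + L + ... + L^(T-1). The names `amplification`, `regime`, and the constant `L` are hypothetical.

```python
def amplification(L: float, steps: int) -> float:
    """Geometric sum 1 + L + ... + L**(steps - 1).

    Toy stand-in for an error amplification factor, assuming the
    recursion e_{t+1} <= L * e_t + eps with e_0 = 0 (an assumption,
    not the paper's definition).
    """
    return sum(L ** k for k in range(steps))


def regime(L: float) -> str:
    """Classify the growth of the toy factor as the chain gets longer."""
    if L < 1:
        return "bounded"      # sum converges to 1 / (1 - L)
    if L == 1:
        return "linear"       # sum equals the number of steps
    return "exponential"      # sum equals (L**steps - 1) / (L - 1)


if __name__ == "__main__":
    for L in (0.5, 1.0, 2.0):
        print(L, regime(L), amplification(L, 10))
```

Under this assumed recursion, a contractive chain (`L < 1`) keeps the accumulated error below a constant regardless of chain length, while `L > 1` makes it grow geometrically with the number of reasoning steps.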

Top-level tags: llm theory
Detailed tags: chain of thought learning theory error analysis reasoning risk

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective


1️⃣ One-sentence summary

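From a learning-theoretic perspective, this paper builds a framework that decomposes the reasoning risk of Chain of Thought into a "benefit" term (the risk along oracle reasoning trajectories) and a "cost" term (errors accumulating along mismatched trajectories), proves that the cost can be arbitrarily large when the loss, the model's answer map, or the chain rule is unstable, and shows that under stability an amplification factor determines whether errors stay bounded, grow linearly, or grow exponentially, thereby identifying when Chain of Thought helps and when it fails.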

Source: arXiv: 2605.21260