arXiv submission date: 2026-03-09
📄 Abstract - Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost via fine-tuning with heuristic length penalties, which suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this issue, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general Reinforcement Learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting approaches, we introduce a semantic prior that measures token cost by surprisal under a language model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
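The reward shaping the abstract describes can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: the function names, the scalar weight `beta`, and the way the prior is supplied (per-token log-probabilities from some prior language model) are all assumptions. It shows the semantic-prior cost (total surprisal of the reasoning trace) and how a uniform prior collapses it to a plain length penalty, as the abstract claims.

```python
import math

def cib_reward(task_reward, trace_token_logprobs, beta=0.01):
    """Illustrative CIB-style RL reward (hypothetical formulation).

    trace_token_logprobs: log-probability of each reasoning-trace token
    under a prior language model. The compression cost is the trace's
    total surprisal, -sum(log p(token)), measured in nats.
    """
    surprisal = -sum(trace_token_logprobs)
    return task_reward - beta * surprisal

def length_penalty_reward(task_reward, num_tokens, vocab_size, beta=0.01):
    """Special case: under a uniform prior p(token) = 1/V, per-token
    surprisal is the constant log(V), so the semantic cost reduces to
    an ordinary token-count (length) penalty."""
    return task_reward - beta * num_tokens * math.log(vocab_size)
```

With a uniform prior over a vocabulary of size V, both functions give the same reward, which is the sense in which length penalties arise as a special case of the CIB objective.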

Top-level tags: llm · theory · model training
Detailed tags: reasoning efficiency · information bottleneck · chain-of-thought · reinforcement learning · lossy compression

Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck


1️⃣ One-Sentence Summary

This paper proposes treating chain-of-thought reasoning in large language models as a compression problem: a new Conditional Information Bottleneck training objective shortens the reasoning process while more intelligently preserving key logical information, maintaining or even improving task accuracy while controlling compute cost.

Source: arXiv:2603.08462