Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck
1️⃣ One-sentence summary
This paper proposes treating chain-of-thought reasoning in large language models as a compression problem. Through a new Conditional Information Bottleneck training objective, it shortens reasoning traces while more intelligently preserving the key logical information, maintaining or even improving task accuracy while controlling inference cost.
Chain-of-Thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and inference cost. Existing "Budget Forcing" methods reduce cost via fine-tuning with heuristic length penalties, which suppress both essential reasoning and redundant filler. We recast efficient reasoning as a lossy compression problem under the Information Bottleneck (IB) principle, and identify a key theoretical gap when applying naive IB to transformers: attention violates the Markov property between prompt, reasoning trace, and response. To resolve this, we model CoT generation under the Conditional Information Bottleneck (CIB) principle, where the reasoning trace Z acts as a computational bridge that contains only the information about the response Y that is not directly accessible from the prompt X. This yields a general Reinforcement Learning objective: maximize task reward while compressing completions under a prior over reasoning traces, subsuming common heuristics (e.g., length penalties) as special cases (e.g., uniform priors). In contrast to naive token-counting approaches, we introduce a semantic prior that measures token cost by surprisal under a language-model prior. Empirically, our CIB objective prunes cognitive bloat while preserving fluency and logic, improving accuracy at moderate compression and enabling aggressive compression with minimal accuracy drop.
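The contrast between a uniform prior (plain length penalty) and the semantic prior (surprisal-based cost) can be sketched in a few lines. This is a hypothetical illustration, not the paper's exact objective: the function names, the reward form `task_reward - beta * penalty`, and the toy probabilities are all assumptions for exposition.

```python
import math

def length_penalty(trace_tokens, cost_per_token=1.0):
    """Uniform prior: every token costs the same, so the penalty
    reduces to a token count (the classic length penalty)."""
    return cost_per_token * len(trace_tokens)

def surprisal_penalty(trace_tokens, prior_probs):
    """Semantic prior: each token costs its surprisal -log p under a
    prior language model. `prior_probs[i]` is the prior's probability
    of trace_tokens[i] given its context (assumed precomputed)."""
    return sum(-math.log(p) for p in prior_probs)

def rl_reward(task_reward, compression_penalty, beta=0.1):
    """Assumed reward shape: maximize task reward while compressing
    the reasoning trace under the chosen prior."""
    return task_reward - beta * compression_penalty

# Toy trace: predictable filler ("so", ",") has high prior probability
# and is nearly free under the semantic prior, while informative tokens
# ("x", "42") carry most of the cost. A uniform length penalty charges
# both kinds of token equally.
trace = ["so", ",", "therefore", "x", "=", "42"]
probs = [0.9, 0.95, 0.9, 0.2, 0.5, 0.1]

print(length_penalty(trace))                           # 6.0
print(round(surprisal_penalty(trace, probs), 3))       # 4.867
print(round(rl_reward(1.0, surprisal_penalty(trace, probs)), 3))
```

Under this sketch, pruning filler tokens barely changes the semantic penalty, so the policy is pushed to drop redundancy while keeping the high-surprisal (informative) reasoning steps.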
Source: arXiv: 2603.08462