Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning
1️⃣ One-sentence summary
This paper proposes a new method called "Accordion-Thinking," which teaches large language models to automatically summarize and compress intermediate reasoning steps during inference. This substantially improves reasoning efficiency and reduces memory consumption without sacrificing solution accuracy, while the generated summaries also make the reasoning process more readable to humans.
Scaling test-time compute via long Chain-of-Thought unlocks remarkable gains in reasoning capabilities, yet it faces practical limits due to the linear growth of the KV cache and the quadratic complexity of attention. In this paper, we introduce Accordion-Thinking, an end-to-end framework in which LLMs learn to self-regulate the granularity of their reasoning steps through dynamic summarization. This mechanism enables a Fold inference mode, where the model periodically summarizes its thought process and discards earlier thoughts to reduce dependency on historical tokens. We apply reinforcement learning to further incentivize this capability, uncovering a critical insight: the accuracy gap between the highly efficient Fold mode and the exhaustive Unfold mode progressively narrows and eventually vanishes over the course of training. This phenomenon demonstrates that the model learns to encode essential reasoning information into compact summaries, achieving effective compression of the reasoning context. Our Accordion-Thinker demonstrates that, with learned self-compression, LLMs can tackle complex reasoning tasks with minimal dependency-token overhead and no loss in solution quality: it achieves 3x higher throughput at matched accuracy on a 48GB GPU memory configuration, and the structured step summaries provide a human-readable account of the reasoning process.
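The Fold inference mode described above can be sketched as a simple loop: the model generates one detailed reasoning step at a time, emits a compact summary of it, and then drops the detailed step so that the live context holds only the prompt plus the accumulated summaries. The sketch below is illustrative only; `generate_step` and `summarize` are hypothetical stand-ins for real model calls, not the paper's actual implementation.

```python
def generate_step(context: str, step_idx: int) -> str:
    """Stand-in for the LLM producing one detailed reasoning step
    conditioned on the current (compressed) context."""
    return f"[detailed reasoning for step {step_idx}, ctx={len(context)} chars]"


def summarize(step: str, step_idx: int) -> str:
    """Stand-in for the model's self-generated summary of a step."""
    return f"S{step_idx}: key result of step {step_idx}"


def fold_mode_reasoning(prompt: str, num_steps: int) -> tuple[str, list[str]]:
    """Run num_steps reasoning steps in Fold mode: after each step,
    keep only its summary and discard the detailed tokens."""
    summaries: list[str] = []
    for i in range(1, num_steps + 1):
        # The context carries the prompt plus summaries only, so its size
        # is bounded by the summaries rather than the full reasoning trace.
        context = prompt + "\n" + "\n".join(summaries)
        step = generate_step(context, i)
        summaries.append(summarize(step, i))  # fold: keep summary, drop step
    final_context = prompt + "\n" + "\n".join(summaries)
    return final_context, summaries


final_context, summaries = fold_mode_reasoning("Problem: ...", 3)
print(len(summaries))  # one summary per step; no detailed step survives
```

The point of the sketch is the memory behavior: in Unfold mode the context would accumulate every detailed step, whereas here the KV cache only needs to cover the prompt and the short summaries, which is what bounds the dependency-token overhead.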
Source: arXiv:2602.03249