arXiv submission date: 2026-02-10
📄 Abstract - Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose Fine-grained Group policy Optimization (FGO), a Reinforcement Learning (RL) algorithm that refines group responses by subdividing them and assigning appropriate weights based on length and entropy, thereby enabling effective CoT compression. Meanwhile, as an enhanced variant of Group Relative Policy Optimization (GRPO), FGO addresses two major limitations of GRPO: inefficient data utilization and entropy collapse. We evaluate FGO on multiple reasoning LLMs and benchmarks, including MATH500, AIME24, AMC23, and Minerva. Experimental results show that FGO achieves efficient CoT compression without degrading performance, and simultaneously resolves the key limitations of GRPO.
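The abstract describes FGO as a GRPO variant that subdivides each group of sampled responses and reweights them by length and entropy on top of the group-relative advantage. The exact subdivision rule and weight functions are not given here, so the sketch below is only an illustrative Python mock-up under assumed forms (length-ranked subgroups with a linear length bonus, a tanh-scaled entropy bonus); it is not the paper's algorithm, and all coefficient names are hypothetical.

```python
import numpy as np

def fgo_style_advantages(rewards, lengths, entropies,
                         num_subgroups=4, len_coef=0.5, ent_coef=0.5):
    """Hypothetical sketch of a fine-grained, GRPO-style advantage.

    rewards:   (G,) scalar task rewards for the G responses of one prompt
    lengths:   (G,) token counts of the responses
    entropies: (G,) mean token-level entropies of the responses
    """
    rewards = np.asarray(rewards, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    entropies = np.asarray(entropies, dtype=float)

    # GRPO baseline: normalize rewards within the whole group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Subdivide the group into fine-grained subgroups by response length,
    # and give each subgroup a weight (assumed form: shorter subgroups get
    # larger weights, encouraging CoT compression).
    order = np.argsort(lengths)
    subgroups = np.array_split(order, num_subgroups)
    weights = np.ones_like(adv)
    for k, idx in enumerate(subgroups):
        weights[idx] *= 1.0 + len_coef * (num_subgroups - 1 - k) / max(num_subgroups - 1, 1)

    # Entropy-aware weight (assumed form): upweight higher-entropy responses
    # to push back against entropy collapse.
    ent_z = (entropies - entropies.mean()) / (entropies.std() + 1e-8)
    weights *= 1.0 + ent_coef * np.tanh(ent_z)

    return adv * weights
```

Under this reading, short responses receive amplified advantages (driving CoT compression) while high-entropy responses are upweighted (counteracting entropy collapse), matching the two effects the abstract attributes to FGO.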

Top-level tags: llm model training agents
Detailed tags: chain-of-thought reasoning compression reinforcement learning policy optimization efficiency

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization


1️⃣ One-Sentence Summary

This paper proposes a new algorithm called FGO that intelligently compresses the verbose chains of thought generated by large language models. It reduces computational cost and latency without degrading the model's reasoning ability, while also resolving the low data-utilization efficiency and entropy collapse problems of the original method (GRPO).

Source: arXiv 2602.10048