arXiv submission date: 2026-02-10
📄 Abstract - Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization

Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose Fine-grained Group policy Optimization (FGO), a Reinforcement Learning (RL) algorithm that refines group responses by subdividing them and assigning appropriate weights based on length and entropy, thereby enabling effective CoT compression. Meanwhile, as an enhanced variant of Group Relative Policy Optimization (GRPO), FGO addresses two major limitations of GRPO: inefficient data utilization and entropy collapse. We evaluate FGO on multiple reasoning LLMs and benchmarks, including MATH500, AIME24, AMC23, and Minerva. Experimental results show that FGO achieves efficient CoT compression without degrading performance, and simultaneously resolves the key limitations of GRPO.
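The abstract describes FGO as a GRPO variant that subdivides each group of sampled responses and reweights them by length and entropy on top of the group-relative advantage. The exact subdivision rule and weight functions are not given here, so the sketch below is only an illustrative Python mock-up under assumed forms (length-ranked subgroups with a linear length bonus, a tanh-scaled entropy bonus); it is not the paper's algorithm, and all coefficient names are hypothetical.

```python
import numpy as np

def fgo_style_advantages(rewards, lengths, entropies,
                         num_subgroups=4, len_coef=0.5, ent_coef=0.5):
    """Hypothetical sketch of a fine-grained, GRPO-style advantage.

    rewards:   (G,) scalar task rewards for the G responses of one prompt
    lengths:   (G,) token counts of the responses
    entropies: (G,) mean token-level entropies of the responses
    """
    rewards = np.asarray(rewards, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    entropies = np.asarray(entropies, dtype=float)

    # GRPO baseline: normalize rewards within the whole group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Subdivide the group into fine-grained subgroups by response length,
    # and give each subgroup a weight (assumed form: shorter subgroups get
    # larger weights, encouraging CoT compression).
    order = np.argsort(lengths)
    subgroups = np.array_split(order, num_subgroups)
    weights = np.ones_like(adv)
    for k, idx in enumerate(subgroups):
        weights[idx] *= 1.0 + len_coef * (num_subgroups - 1 - k) / max(num_subgroups - 1, 1)

    # Entropy-aware weight (assumed form): upweight higher-entropy responses
    # to push back against entropy collapse.
    ent_z = (entropies - entropies.mean()) / (entropies.std() + 1e-8)
    weights *= 1.0 + ent_coef * np.tanh(ent_z)

    return adv * weights
```

Under this reading, short responses receive amplified advantages (driving CoT compression) while high-entropy responses are upweighted (counteracting entropy collapse), matching the two effects the abstract attributes to FGO.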

Top-level tags: llm model training agents
Detailed tags: chain-of-thought reasoning compression reinforcement learning policy optimization efficiency

Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization


1️⃣ One-Sentence Summary

This paper proposes a new algorithm called FGO that intelligently compresses the verbose chains of thought generated by large language models. It reduces computational cost and latency without degrading the model's reasoning ability, while also resolving the low data-utilization efficiency and entropy collapse problems of the original method (GRPO).

Source: arXiv 2602.10048