arXiv submission date: 2026-05-27
📄 Abstract - Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Large language models (LLMs) can now solve complex problems through long chain-of-thought (CoT) reasoning, but the trade-off between performance and token cost remains a central challenge. To address this issue, supervised fine-tuning (SFT) often uses compressed reasoning data, where CoT traces are shortened into compact forms. However, the effect of such compressed reasoning data on post-training remains poorly understood. In this paper, we propose a taxonomy of CoT consisting of Explicit CoT, which outputs all operations without aggregation, Composed CoT, which combines multiple operations into a single step, and Implicit CoT, which omits intermediate operations. We construct a synthetic compositional reasoning task that allows controlled variation of difficulty, compression granularity, and data size, and conduct a comprehensive set of experiments across different model families and sizes. Notably, we find that (i) coarser CoT requires more SFT data, (ii) compared with Explicit CoT, Composed CoT and Implicit CoT benefit more from data scaling, while Composed CoT benefits from data repetition and Implicit CoT tends to lead to memorization, (iii) unlike SFT, subsequent reinforcement learning (RL) with verifiable rewards (RLVR) decomposes compressed steps learned during SFT, and (iv) unidirectional CoT ordering shows stronger generalization on longer sequential tasks. Our findings provide implications for CoT design under data resource constraints and offer important insights into the mechanisms of SFT and RL in LLM post-training.
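
To make the three CoT types in the abstract concrete, here is a minimal Python sketch on a toy task. The task (a chain of arithmetic operations), the trace format, and the names `OPS`, `explicit_cot`, `composed_cot`, `implicit_cot`, and `group` are all assumptions made for illustration; the abstract does not specify the paper's actual synthetic task or data format.

```python
from typing import Callable, List, Tuple

# Hypothetical toy task (not the paper's): apply a chain of operations to a start value.
Op = Tuple[str, Callable[[int], int]]

OPS: List[Op] = [
    ("+3", lambda x: x + 3),
    ("*2", lambda x: x * 2),
    ("-1", lambda x: x - 1),
    ("*3", lambda x: x * 3),
]


def explicit_cot(start: int, ops: List[Op]) -> Tuple[List[str], int]:
    """Explicit CoT: write out every operation as its own step."""
    steps, value = [], start
    for name, fn in ops:
        new_value = fn(value)
        steps.append(f"{value} {name} = {new_value}")
        value = new_value
    return steps, value


def composed_cot(start: int, ops: List[Op], group: int = 2) -> Tuple[List[str], int]:
    """Composed CoT: merge `group` consecutive operations into a single step."""
    steps, value = [], start
    for i in range(0, len(ops), group):
        chunk = ops[i:i + group]
        new_value = value
        for _, fn in chunk:
            new_value = fn(new_value)
        names = " ".join(name for name, _ in chunk)
        steps.append(f"{value} {names} = {new_value}")
        value = new_value
    return steps, value


def implicit_cot(start: int, ops: List[Op]) -> Tuple[List[str], int]:
    """Implicit CoT: omit all intermediate steps and keep only the answer."""
    value = start
    for _, fn in ops:
        value = fn(value)
    return [], value


if __name__ == "__main__":
    for build in (explicit_cot, composed_cot, implicit_cot):
        steps, answer = build(2, OPS)
        print(build.__name__, steps, answer)
```

In this sketch all three traces reach the same final answer and differ only in how many intermediate results are written out, which is the sense of "compression granularity" assumed here.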

Top-level tags: llm, model training, model evaluation
Detailed tags: chain-of-thought, data compression, supervised fine-tuning, reinforcement learning, reasoning

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training


1️⃣ One-sentence summary

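This paper studies the effect of using compressed reasoning data (that is, shortened chains of thought) in LLM post-training. It finds that different types of compression affect supervised fine-tuning and reinforcement learning differently, and that they show different generalization ability and memorization tendencies at different data scales.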

Source: arXiv: 2605.28008