arXiv submission date: 2025-12-24
📄 Abstract - Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks

We present the surprising finding that a language model's reasoning capabilities can be improved by training on synthetic datasets of chain-of-thought (CoT) traces from more capable models, even when all of those traces lead to an incorrect final answer. Our experiments show this approach can yield better performance on reasoning tasks than training on human-annotated datasets. We hypothesize that two key factors explain this phenomenon: first, the distribution of synthetic data is inherently closer to the language model's own distribution, making it more amenable to learning; second, these "incorrect" traces are often only partially flawed and contain valid reasoning steps from which the model can learn. To further test the first hypothesis, we use a language model to paraphrase human-annotated traces -- shifting their distribution closer to the model's own distribution -- and show that this improves performance. For the second hypothesis, we introduce increasingly flawed CoT traces and study to what extent models are tolerant of these flaws. We demonstrate our findings across reasoning domains including math, algorithmic reasoning, and code generation, using the MATH, GSM8K, Countdown, and MBPP datasets on language models ranging from 1.5B to 9B parameters from the Qwen, Llama, and Gemma families. Our study shows that curating datasets closer to the model's distribution is a critical aspect to consider. We also show that a correct final answer is not always a reliable indicator of a faithful reasoning process.
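The abstract's test of the first hypothesis (paraphrasing human-annotated CoT traces with a language model so their distribution moves closer to the model's own) could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the authors' code: the model name, prompt wording, generation settings, and the helper `paraphrase_trace` are all hypothetical.

```python
# Minimal sketch (assumptions, not the paper's implementation): rewrite
# human-annotated chain-of-thought traces with a small instruct model so the
# resulting fine-tuning data is closer to the model's own distribution.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"  # assumption: any small instruct model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def paraphrase_trace(question: str, human_trace: str, max_new_tokens: int = 512) -> str:
    """Rewrite a human-annotated CoT trace in the model's own words,
    asking it to keep every reasoning step and the final answer unchanged."""
    prompt = (
        "Rewrite the following solution in your own words. "
        "Keep every reasoning step and the final answer the same.\n\n"
        f"Problem: {question}\n\nSolution: {human_trace}\n\nRewritten solution:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Usage (hypothetical): build a paraphrased fine-tuning set from (question, trace) pairs.
# paraphrased = [(q, paraphrase_trace(q, t)) for q, t in human_annotated_pairs]
```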

Top-level tags: llm, model training, natural language processing
Detailed tags: chain-of-thought, synthetic data, reasoning, distribution shift, data curation

Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks


1️⃣ One-sentence summary

This paper finds that training a language model on chain-of-thought data generated by a more capable model can improve its reasoning ability even when the final answers in those traces are wrong, because the data's distribution matches the model's own more closely and the flawed traces often still contain valuable reasoning steps.

Source: arXiv:2512.22255