arXiv submission date: 2026-02-09
📄 Abstract - Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based reinforcement learning (SAGE-RL) enables SAGE-RL to effectively incorporate SAGE-discovered efficient reasoning patterns into standard pass@1 inference, markedly enhancing both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.
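The abstract describes SAGE only at a high level, so as a rough illustration of the general idea of letting a model's own confidence in ending its thinking phase cut a chain of thought short, here is a minimal, hypothetical sketch. The model name, the `</think>` marker, and the stopping threshold are assumptions made for illustration; this is not the paper's SAGE algorithm.

```python
# Hypothetical sketch (NOT the paper's SAGE algorithm): a token-by-token
# decoding loop that watches how much probability the model places on an
# "end-of-thinking" marker and stops the thinking phase once that mass
# exceeds a threshold. Model name, marker token, and threshold are assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed example LRM
STOP_MARKER = "</think>"   # assumed end-of-thinking marker in the chat format
STOP_THRESHOLD = 0.5       # assumed confidence needed to stop thinking early
MAX_THINK_TOKENS = 2048

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

# First token id of the end-of-thinking marker, used as the stop signal.
stop_id = tokenizer.encode(STOP_MARKER, add_special_tokens=False)[0]

@torch.no_grad()
def think_with_early_stop(prompt: str) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(MAX_THINK_TOKENS):
        logits = model(ids).logits[0, -1]        # next-token distribution
        probs = torch.softmax(logits, dim=-1)
        # If the model already assigns high mass to closing its thinking
        # block, treat that as its implicit "I am done" signal and stop.
        if probs[stop_id] > STOP_THRESHOLD:
            break
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```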

Top-level tags: llm, model evaluation, theory
Detailed tags: reasoning efficiency, chain-of-thought, sampling paradigm, self-aware reasoning, reinforcement learning

Does Your Reasoning Model Implicitly Know When to Stop Thinking?


1️⃣ One-sentence summary

This paper finds that large reasoning models implicitly know when they should stop thinking, and proposes a new sampling paradigm called SAGE that exploits this ability, improving reasoning accuracy while cutting away a large share of unnecessary reasoning steps.

Source: arXiv 2602.08354