ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning
1️⃣ One-Sentence Summary
This paper proposes a new method called ATTNPO, which cleverly uses the model's own attention signals to identify and prune redundant steps in the reasoning process. It substantially shortens reasoning length while preserving, and even improving, accuracy, enabling more efficient and precise solving of complex problems.
Large reasoning models trained with reinforcement learning and verifiable rewards (RLVR) achieve strong performance on complex reasoning tasks, yet often overthink, generating redundant reasoning without performance gains. Existing trajectory-level length penalties often fail to effectively shorten reasoning length and can degrade accuracy, as they treat all reasoning steps uniformly and lack fine-grained signals to distinguish redundancy from necessity. Meanwhile, process-supervised methods are typically resource-intensive and suffer from inaccurate credit assignment. To address these issues, we propose ATTNPO, a low-overhead process-supervised RL framework that leverages the model's intrinsic attention signals for step-level credit assignment. We first identify a set of special attention heads that naturally focus on essential steps while suppressing redundant ones. Leveraging the attention scores of these heads, we then employ two sub-strategies: mitigating overthinking by discouraging redundant steps, and preserving accuracy by reducing penalties on essential steps. Experimental results show that ATTNPO substantially reduces reasoning length while significantly improving performance across 9 benchmarks.
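The core mechanism described above can be illustrated with a minimal sketch. The following is not the paper's implementation; it is a hypothetical numpy illustration of the general idea: attention scores from a set of pre-identified "special" heads are aggregated into per-step importance scores, which then scale down the length penalty on essential steps so that only redundant steps are discouraged. The function names, the mean-pooling aggregation, and the linear penalty shaping are all assumptions for illustration.

```python
import numpy as np

def step_importance(attn, step_spans, head_ids):
    """Aggregate the attention mass each reasoning step receives.

    attn:       array [num_heads, seq_len, seq_len]; row q attends to column k.
    step_spans: list of (start, end) token-index ranges, one per step.
    head_ids:   indices of the special heads (assumed to be identified
                offline by how well they separate essential from
                redundant steps, as in the paper's head-selection stage).
    Returns one score per step, normalized to sum to 1.
    """
    sel = attn[head_ids]                 # [H', L, L] selected heads only
    # Attention each token receives, averaged over heads and query positions.
    received = sel.mean(axis=(0, 1))     # [L]
    scores = np.array([received[s:e].mean() for s, e in step_spans])
    return scores / scores.sum()

def step_length_penalty(scores, base_penalty=1.0):
    """Hypothetical penalty shaping: essential steps (high score) get a
    smaller length penalty, so shortening pressure falls on redundant ones."""
    return base_penalty * (1.0 - scores / scores.max())
```

In this toy setup, a step whose tokens draw twice the attention of another step receives zero length penalty, while the less-attended step keeps half the base penalty, which is the qualitative behavior the two sub-strategies aim for.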
Source: arXiv: 2602.09953