Directional Reasoning Trajectory Change (DRTC): Identifying Critical Trace Segments in Reasoning Models
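1️⃣ One-sentence summary
This paper proposes DRTC, a new method that analyzes how a model's uncertainty and probability distributions change during reasoning in order to pinpoint and quantify which earlier context actually "turned" the model's reasoning direction, helping us understand how large language models reason step by step.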
Understanding how language models carry out long-horizon reasoning remains an open challenge. Existing interpretability methods often highlight tokens or spans correlated with an answer, but they rarely reveal where the model makes consequential reasoning turns, which earlier context causally triggers those turns, or whether the highlighted text actually steers the reasoning process. We introduce Directional Reasoning Trajectory Change (DRTC), a process-causal framework for interpreting long-form reasoning from a single on-policy rollout. DRTC detects pivot decision points using uncertainty and distribution-shift signals, then applies receiver-side interventions that preserve the realized rollout without resampling the continuation while blocking information flow from selected earlier chunks only at a pivot. It measures whether each intervention redirects the direction of the model's log-probability trajectory relative to the realized rollout direction, producing a signed per-chunk attribution score. We also compute turning-angle curvature changes on raw logits as a complementary diagnostic and introduce curvature signatures to summarize shared intervention-response geometry. Empirically, directional influence is sharply concentrated across four reasoning models (per-example |DRTC| shares yield Gini 0.50 to 0.58 and top-5 percent mass 0.23 to 0.28), and learned pivots induce stronger intervention magnitudes than matched random spans. In a scaling study on 500 MATH problems with R1-Distill-Qwen-1.5B, learned spans outperform matched random spans (median delta = 0.409, 355 of 500 positive; sign test p = 2.3e-21). Overall, DRTC provides a causally grounded, trajectory-level view of how specific context elements steer reasoning under on-policy dynamics.
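The summary statistics quoted in the abstract (Gini coefficient, top-5 percent mass, and the sign test) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `gini`, `top_mass`, and `sign_test_p` are hypothetical, the rounding rule for the top-5 percent cutoff is an assumption, and the sign test is written as an exact two-sided binomial test with ties ignored, which the abstract does not specify, so it may not reproduce the reported p-value exactly.

```python
from math import ceil, comb


def gini(values):
    """Gini coefficient of non-negative values (0 = even, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def top_mass(values, fraction=0.05):
    """Share of total mass held by the largest `fraction` of entries.

    Assumption: the cutoff is rounded up and covers at least one entry.
    """
    xs = sorted(values, reverse=True)
    total = sum(xs)
    if not xs or total == 0:
        return 0.0
    k = max(1, ceil(fraction * len(xs)))
    return sum(xs[:k]) / total


def sign_test_p(n_positive, n_total):
    """Exact two-sided sign test p-value under H0: P(positive) = 0.5.

    Assumption: ties are excluded before counting, so n_total is the
    number of non-zero differences.
    """
    k = max(n_positive, n_total - n_positive)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1))
    return min(1.0, 2 * tail / 2 ** n_total)


# Hypothetical signed per-chunk scores for one example (made-up values).
scores = [0.9, -0.05, 0.02, -0.4, 0.01]
shares = [abs(s) for s in scores]  # |DRTC| magnitudes
print("Gini:", gini(shares))
print("Top-5% mass:", top_mass(shares))
print("Sign test p (355 of 500 positive):", sign_test_p(355, 500))
```

Taking absolute values before computing concentration mirrors the abstract's "per-example |DRTC| shares": the sign of a score says which way a chunk pushed the trajectory, while the concentration statistics only ask how unevenly the total influence is spread across chunks.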
Source: arXiv:2602.15332