Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning
1️⃣ One-Sentence Summary
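This paper proposes MAST, a method that analyzes how a model's internal attention mechanisms change and then updates only the most critical subset of parameters, precisely removing the specific reasoning abilities introduced by reinforcement learning with verifiable rewards (RLVR) while preserving performance on other tasks as far as possible. Compared with conventional full-parameter updates, it substantially reduces side effects.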
We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent achieves forgetting only by degrading performance on the MATH retain split and GSM8K. MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling magnitude, then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (accuracy on the MATH forget split falls from 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, across the NPO and SimNPO objectives, and on Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
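The reported significance figure can be sanity-checked with a short calculation. The sketch below computes a two-sided exact McNemar p-value from discordant pair counts. The abstract gives only the marginal counts (45/150 and 37/150), not the discordant pairs, so the values `b = 8` and `c = 0` are an assumption: they are the simplest split consistent with a net change of 8 items. Whether the authors used the exact variant of the test is also an assumption.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items correct before unlearning and incorrect after.
    c: items incorrect before unlearning and correct after.
    Under the null hypothesis, each discordant item is equally
    likely to fall in either direction (Binomial(n, 0.5)).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one,
    # in one direction, doubled for the two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Marginal counts reported in the abstract.
before_correct, after_correct = 45, 37

# ASSUMPTION: all 8 net changes are correct -> incorrect, with no item
# moving in the opposite direction. The abstract does not report this.
b = before_correct - after_correct
c = 0

p = mcnemar_exact_p(b, c)
print(f"{p:.4f}")  # 0.0078
```

Under this assumed split the result is 2 × 0.5^8 = 0.0078125, which matches the reported p=0.0078. Other splits with the same net change give different values (for example, `b = 9`, `c = 1` yields about 0.0215), so the match is consistent with the assumption, though it does not confirm it.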
Source: arXiv: 2606.19222