Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

📄 Abstract

We propose MAST (Mechanism-Aligned Selective Targeting), a mechanism-guided method for unlearning RLVR-induced reasoning with substantially lower collateral damage than standard full-parameter updates. In matched SFT/RLVR checkpoints on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base, the SFT-to-RLVR increment differs sharply from the SFT update in token-level delta-log-probability, and full-parameter gradient ascent achieves forgetting only by degrading performance on the MATH retain split and on GSM8K. MAST ranks attention-projection tensors by three criteria (off-principal energy, update magnitude, and forget-gradient coupling magnitude), then updates only the top-ranked subset. On the primary model, MAST induces statistically significant target forgetting (MATH forget 45/150 to 37/150; McNemar p=0.0078) while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage reproduces across seeds, across the NPO and SimNPO objectives, and on Qwen3, where MAST preserves GSM8K while full-parameter unlearning collapses it.
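The reported McNemar p-value can be sanity-checked with a few lines of arithmetic. The sketch below computes the two-sided exact McNemar test from discordant-pair counts. The abstract does not give those counts, so the split used here (8 forget-set items flipping from solved to unsolved, 0 flipping the other way) is an assumption; it is one split consistent with a net change from 45/150 to 37/150, and it also assumes those figures count correct answers. The function name `mcnemar_exact_p` is hypothetical.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: items correct before unlearning and incorrect after.
    c: items incorrect before unlearning and correct after.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) probability of seeing k or fewer in the smaller cell.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Assumed split: 8 items lost, 0 gained (net change 45 -> 37).
p_value = mcnemar_exact_p(8, 0)
print(round(p_value, 4))  # 0.0078
```

Under this assumed split the exact test gives 2 × 0.5^8 = 0.0078125, which rounds to the reported 0.0078. A split such as 9 lost and 1 gained has the same net change but a larger p-value, so the reported figure is at least consistent with no forget-set items being newly solved after unlearning.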

Mechanism-Guided Selective Unlearning: Targeted Removal of RLVR-Induced Reasoning Behavior

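1️⃣ One-Sentence Summary

This paper proposes MAST, a new method that analyzes how a model's internal attention mechanisms change during training and updates only the most critical subset of parameters. It thereby removes the specific reasoning ability introduced by reinforcement learning (RLVR) training while preserving as much of the model's performance on other tasks as possible, with markedly fewer side effects than conventional full-parameter updates.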
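To make the selection step concrete, here is a minimal sketch of ranking tensors by several criteria and keeping only the top-ranked subset. The abstract names three criteria (off-principal energy, update magnitude, forget-gradient coupling magnitude) but does not say how they are combined, so the mean-rank aggregation below is an assumption, not the authors' rule. The tensor names, score values, and the helpers `rank_positions` and `select_tensors` are all hypothetical.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each tensor name to its rank under one criterion (0 = highest score)."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: pos for pos, name in enumerate(ordered)}


def select_tensors(criteria: List[Dict[str, float]], top_k: int) -> List[str]:
    """Return the top_k tensors by mean rank across criteria.

    Mean-rank aggregation is an assumed stand-in; the source does not
    specify how the three scores are combined.
    """
    ranks = [rank_positions(c) for c in criteria]
    names = list(criteria[0])
    mean_rank = {n: sum(r[n] for r in ranks) / len(ranks) for n in names}
    # Break ties by name so the selection is deterministic.
    return sorted(names, key=lambda n: (mean_rank[n], n))[:top_k]


# Illustrative scores for four attention projections (made-up values).
off_principal_energy = {"l0.q_proj": 0.9, "l0.k_proj": 0.2, "l0.v_proj": 0.5, "l0.o_proj": 0.7}
update_magnitude = {"l0.q_proj": 0.8, "l0.k_proj": 0.1, "l0.v_proj": 0.6, "l0.o_proj": 0.4}
grad_coupling = {"l0.q_proj": 0.7, "l0.k_proj": 0.3, "l0.v_proj": 0.2, "l0.o_proj": 0.9}

selected = select_tensors([off_principal_energy, update_magnitude, grad_coupling], top_k=2)
print(selected)  # ['l0.q_proj', 'l0.o_proj']
```

In an actual unlearning run, the selected names would determine which parameters receive gradient updates while all others stay frozen, for example by setting `requires_grad` to `False` on the unselected tensors in PyTorch.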
