Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
1️⃣ One-sentence summary
This paper finds that current state-of-the-art AI reasoning models suffer a sharp drop in performance when faced with irrelevant, real-world distracting information; it introduces a benchmark called NoisyBench to evaluate models' robustness to such distractors, and presents a new method that improves robustness by rewarding the model for identifying the useful information within the noise.
Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. However, this shift introduces input contexts that are inherently noisy, a reality that current sanitized benchmarks fail to capture. We introduce NoisyBench, a comprehensive benchmark that systematically evaluates model robustness across 11 datasets in RAG, reasoning, alignment, and tool-use tasks against diverse noise types, including random documents, irrelevant chat histories, and hard negative distractors. Our evaluation reveals a catastrophic performance drop of up to 80% in state-of-the-art models when faced with contextual distractors. Crucially, we find that agentic workflows often amplify these errors by over-trusting noisy tool outputs, and distractors can trigger emergent misalignment even without adversarial intent. We find that prompting, context engineering, SFT, and outcome-reward-only RL fail to ensure robustness; in contrast, our proposed Rationale-Aware Reward (RARE) significantly strengthens resilience by incentivizing the identification of helpful information within noise. Finally, we uncover an inverse scaling trend where increased test-time computation leads to worse performance in noisy settings, and we demonstrate via attention visualization that models disproportionately focus on distractor tokens, providing vital insights for building the next generation of robust, reasoning-capable agents.
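The abstract attributes the robustness gains to rewarding the identification of helpful information within noise rather than rewarding the final outcome alone. Below is a minimal, hypothetical Python sketch of that idea; the passage-id scheme, the `rationale_aware_reward` function, and the 0.5 bonus weight are illustrative assumptions, not the paper's actual RARE implementation.

```python
# Minimal sketch of a rationale-aware reward, assuming (hypothetically) that each
# context passage carries an id like "[D3]" and that the genuinely helpful (gold)
# passage is known at training time. The paper's exact RARE formulation may differ.

def rationale_aware_reward(
    model_answer: str,
    model_rationale: str,
    gold_answer: str,
    gold_passage_id: str,
    rationale_weight: float = 0.5,
) -> float:
    """Outcome reward plus a bonus when the rationale identifies the helpful passage."""
    # Outcome-only reward: 1 if the reference answer appears in the model's answer.
    outcome = float(gold_answer.strip().lower() in model_answer.strip().lower())

    # Rationale bonus: the model explicitly points at the helpful passage,
    # i.e. it identified the useful information inside the noisy context.
    cites_gold = gold_passage_id in model_rationale
    return outcome + rationale_weight * float(cites_gold)


if __name__ == "__main__":
    # Toy example: one gold passage ([D3]) buried among distractor passages.
    reward = rationale_aware_reward(
        model_answer="The capital of Australia is Canberra.",
        model_rationale="Passage [D3] states the capital directly; the other passages are off-topic.",
        gold_answer="Canberra",
        gold_passage_id="[D3]",
    )
    print(f"reward = {reward}")  # 1.5: correct answer and helpful passage identified
```

Contrast this with an outcome-reward-only signal, which would return the same reward whether or not the model actually grounded its answer in the helpful passage rather than a distractor.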
Source: arXiv: 2601.07226