arXiv submission date: 2026-05-11
📄 Abstract - The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning

As large language models are increasingly deployed in retrieval-augmented generation and agentic systems that accumulate extensive context, understanding how distracting information affects long-context performance becomes critical. Prior work shows that semantically relevant yet misleading documents degrade performance, but the quantitative relationship between the proportion of distractors and performance remains unstudied. In this work, we systematically vary the hard-distractor proportion in fixed-length contexts, revealing a striking nonlinear pattern: as the proportion of hard distractors increases, performance drops sharply within the first small fraction, while the remainder of the range yields only marginal additional decline. We term this "The First Drop of Ink" effect, analogous to how a single drop of ink contaminates water. Our theoretical and empirical analyses grounded in attention mechanics show that hard distractors capture disproportionate attention even at small proportions, with diminishing marginal impact as their proportion grows. Controlled experiments further show that filtering gains mainly come from context-length reduction rather than distractor removal; substantial recovery requires reducing the hard-distractor proportion to near zero, highlighting the importance of upstream retrieval precision.
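The "disproportionate attention at small proportions, diminishing marginal impact" claim can be illustrated with a toy softmax model. The sketch below is an assumption for intuition only, not the paper's formulation: relevant tokens get attention logit 0, and each hard distractor gets a uniform logit advantage `boost` (a hypothetical parameter). The attention mass captured by distractors is then concave in their proportion `p`, rising steeply near zero and flattening out afterward.

```python
import math

def distractor_attention_mass(p, boost=3.0):
    """Fraction of softmax attention mass captured by distractor tokens.

    Toy model (an assumption, not the paper's exact analysis): relevant
    tokens have logit 0, hard distractors have logit `boost`. With a
    fraction p of distractors in a fixed-length context, the softmax mass
    on distractors is p*e^boost / (p*e^boost + (1-p)).
    """
    if p <= 0:
        return 0.0
    w = math.exp(boost)
    return p * w / (p * w + (1.0 - p))

# Mass rises sharply for the first few percent of distractors,
# then saturates: the "first drop of ink" shape.
for p in (0.0, 0.05, 0.10, 0.25, 0.50, 1.00):
    print(f"p = {p:.2f} -> distractor attention mass = "
          f"{distractor_attention_mass(p):.3f}")
```

With `boost=3.0`, going from 0% to 5% distractors already hands them a majority of the attention mass, while going from 50% to 100% adds comparatively little, mirroring the nonlinear degradation curve the paper reports.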

Top tags: llm · model evaluation
Detailed tags: long-context reasoning · misleading information · distractor proportion · attention mechanics · retrieval-augmented generation

The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning


1️⃣ One-sentence summary

This study finds that in long-context reasoning, mixing in even a small amount of misleading information (like the first drop of ink in water) causes a sharp drop in model performance, while adding further misleading information has progressively less effect; this implies that improving upstream retrieval precision matters far more than removing distractors from a large body of text after the fact.

Source: arXiv: 2605.10828