Diagnosing and Mitigating Context Rot in Long-horizon Search
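1️⃣ One-sentence summary
This paper finds that large language models suffer from "context rot" when handling long contexts: models give up early or return uncertain answers, and the problem worsens as the context grows. Through pruning experiments, the authors show how accumulated context relates to the rot, and they examine two mitigation strategies, context management and rejection sampling, of which a rot-aware filtering method effectively improves model performance.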
Extensive context has become the norm as Large Language Models (LLMs) are increasingly deployed in long-horizon tasks. The concern that increasing context length degrades model capabilities, known as context rot, has become a central issue for these applications. In this paper, we focus on deep search scenarios, aiming to investigate the rot phenomenon and its mitigation strategies. By evaluating four flagship open-source models across three benchmarks, we reveal a prevalent but unnoticed rot phenomenon: extensive context causes models to directly give up or prematurely provide uncertain answers, and this issue is exacerbated as the context grows. Through pruning experiments, we demonstrate the relationship between the accumulated context and the rot phenomenon. Furthermore, we investigate mitigating this issue through context management and post-hoc rejection sampling. For context management, we systematically evaluate seven different methods across three categories, based on performance, cost, and impact on context rot, providing clear guidance for strategy selection and usage. For rejection sampling, we develop a rot-aware filtering strategy and demonstrate its effectiveness across three aggregation methods. Finally, we show that these two approaches can be combined for further performance improvements.
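As a rough illustration of the rejection-sampling idea in the abstract, the sketch below filters out sampled answers that look "rotted" (the model gave up or hedged) before aggregating the rest. Everything concrete here is an assumption for illustration: the abstract does not specify how rot is detected or which three aggregation methods are used, so the marker phrases in `ROT_MARKERS`, the helper names, and the choice of majority voting are hypothetical rather than the authors' implementation.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical phrases signalling that a run gave up or hedged.
# The paper's actual rot detector is not described in the abstract.
ROT_MARKERS = ("i give up", "unable to determine", "not sure", "cannot find")


def is_rotted(answer: str) -> bool:
    """Return True if the answer is empty or contains a rot marker."""
    text = answer.strip().lower()
    return not text or any(marker in text for marker in ROT_MARKERS)


def rot_aware_majority_vote(answers: Sequence[str]) -> Optional[str]:
    """Drop rotted answers, then majority-vote over the rest (case-insensitive).

    Returns None when every sampled answer was filtered out.
    """
    kept = [a.strip() for a in answers if not is_rotted(a)]
    if not kept:
        return None
    counts = Counter(a.lower() for a in kept)
    winner, _ = counts.most_common(1)[0]
    # Return the first surviving answer matching the winning normalized form.
    return next(a for a in kept if a.lower() == winner)


samples = ["Paris", "I give up", "paris", "Not sure, maybe Lyon", "Lyon"]
result = rot_aware_majority_vote(samples)
```

In this toy run, the two hedged samples are discarded, so the vote is taken over three answers instead of five. A real filter would likely need a stronger signal than keyword matching, for example a classifier or a judge model.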
Source: arXiv: 2606.29718