arXiv submission date: 2026-02-03
📄 Abstract - Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and rerankers optimize solely for relevance, often selecting either trivial, answer-revealing passages or evidence that lacks the critical information required to answer the question, without considering whether the evidence is suitable for the generator. We propose BAR-RAG, which reframes the reranker as a boundary-aware evidence selector that targets the generator's Goldilocks Zone -- evidence that is neither trivially easy nor fundamentally unanswerable for the generator, but is challenging yet sufficient for inference and thus provides the strongest learning signal. BAR-RAG trains the selector with reinforcement learning using generator feedback, and adopts a two-stage pipeline that fine-tunes the generator under the induced evidence distribution to mitigate the distribution mismatch between training and inference. Experiments on knowledge-intensive question answering benchmarks show that BAR-RAG consistently improves end-to-end performance under noisy retrieval, achieving an average gain of 10.3 percent over strong RAG and reranking baselines while substantially improving robustness. Code is publicly available at this https URL.
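To make the "Goldilocks Zone" idea concrete, here is a minimal sketch of a boundary-aware reward signal. It is not the paper's reward: the abstract does not give a formula, so the triangular shape, the `target` difficulty of 0.5, the exact-match scoring, and the names `success_rate` and `boundary_reward` are all assumptions made for illustration. The sketch assumes that the generator is sampled several times on the same question with a candidate evidence set, and that the selector is rewarded most when the generator succeeds only part of the time.

```python
from typing import Dict, List, Sequence


def success_rate(answers: Sequence[str], gold: str) -> float:
    """Fraction of sampled generator answers that exactly match the gold
    answer (case-insensitive, whitespace-trimmed). Exact match is an
    assumption; any answer-correctness metric could be substituted."""
    if not answers:
        return 0.0
    gold_norm = gold.strip().lower()
    hits = sum(a.strip().lower() == gold_norm for a in answers)
    return hits / len(answers)


def boundary_reward(p: float, target: float = 0.5) -> float:
    """Hypothetical triangular reward over generator success rate p.

    Returns 1.0 when p == target and falls linearly to 0.0 at p == 0
    (evidence too hard or insufficient) and p == 1 (evidence trivially
    answer-revealing)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p <= target:
        return p / target
    return (1.0 - p) / (1.0 - target)


# Toy usage: three candidate evidence sets, each with four sampled answers.
sampled: Dict[str, List[str]] = {
    "answer_revealing": ["Paris", "Paris", "paris", "Paris"],
    "challenging_but_sufficient": ["Paris", "Lyon", "paris ", "Rome"],
    "missing_key_fact": ["Lyon", "Rome", "Nice", "Lille"],
}
rewards = {
    name: boundary_reward(success_rate(answers, gold="Paris"))
    for name, answers in sampled.items()
}
best = max(rewards, key=rewards.get)
print(rewards, best)
```

In this toy setup the evidence set on which the generator succeeds about half the time receives the highest reward, while the answer-revealing and insufficient sets both receive zero. A real training loop would feed such a scalar into a reinforcement learning update for the selector, which is outside the scope of this sketch.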

Top-level tags: llm, natural language processing, model training
Detailed tags: retrieval-augmented generation, evidence selection, reinforcement learning, robustness, question answering

Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented Generation


1️⃣ One-sentence summary

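This paper proposes BAR-RAG, a method that turns the reranker into a "boundary-aware" judge: it selects evidence of moderate difficulty for the text generator and trains this selector with the generator's feedback, significantly improving the robustness and final answer quality of retrieval-augmented generation systems when retrieval results are imperfect.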

Source: arXiv: 2602.03689