Scaling Retrieval Augmented Generation with RAG Fusion: Lessons from an Industry Deployment
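1️⃣ One-sentence summary
Drawing on a real industrial deployment, this paper finds that in retrieval-augmented generation systems, fusion retrieval techniques that simply chase higher recall (such as multi-query retrieval) do not meaningfully improve final answer quality and can add system latency, so a more comprehensive end-to-end evaluation framework is needed.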
Retrieval-Augmented Generation (RAG) systems commonly adopt retrieval fusion techniques such as multi-query retrieval and reciprocal rank fusion (RRF) to increase document recall, under the assumption that higher recall leads to better answer quality. While these methods show consistent gains in isolated retrieval benchmarks, their effectiveness under realistic production constraints remains underexplored. In this work, we evaluate retrieval fusion in a production-style RAG pipeline operating over an enterprise knowledge base, with fixed retrieval depth, re-ranking budgets, and latency constraints. Across multiple fusion configurations, we find that retrieval fusion does increase raw recall, but these gains are largely neutralized after re-ranking and truncation. In our setting, fusion variants fail to outperform single-query baselines on KB-level Top-$k$ accuracy, with Hit@10 decreasing from $0.51$ to $0.48$ in several configurations. Moreover, fusion introduces additional latency overhead due to query rewriting and larger candidate sets, without corresponding improvements in downstream effectiveness. Our analysis suggests that recall-oriented fusion techniques exhibit diminishing returns once realistic re-ranking limits and context budgets are applied. We conclude that retrieval-level improvements do not reliably translate into end-to-end gains in production RAG systems, and argue for evaluation frameworks that jointly consider retrieval quality, system efficiency, and downstream impact.
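To make the two quantities in the abstract concrete, the following is a minimal sketch of reciprocal rank fusion and of a Hit@k metric. It is not the paper's implementation: the function names, the smoothing constant `k=60` (a commonly used default), and the toy document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists it
    appears in. k=60 is an assumed default, not a value from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hit_at_k(ranked_lists, relevant_sets, k=10):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if relevant & set(ranked[:k])
    )
    return hits / len(ranked_lists)


# Hypothetical results for one query rewritten into two variants.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "d", "a"]])
print(fused)

# Truncating the fused list mimics a fixed context budget.
print(hit_at_k([fused], [{"c"}], k=2))
```

In this toy example the fused list contains four documents, yet truncating to the top two drops document `c`. This is the kind of effect the abstract describes, where recall gained by fusion is lost again once re-ranking limits and context budgets apply.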
Source: arXiv: 2603.02153