Open-Source Reproduction and Explainability Analysis of Corrective Retrieval Augmented Generation
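1️⃣ One-Sentence Summary
This paper presents an open-source reproduction of the CRAG system that matches the original system's performance using substitute components. Its explainability analysis is the first to show that the retrieval evaluator relies mainly on entity matching rather than semantic understanding, and it also identifies the system's limitations on science questions.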
Corrective Retrieval Augmented Generation (CRAG) improves the robustness of RAG systems by evaluating retrieved document quality and triggering corrective actions. However, the original implementation relies on proprietary components including the Google Search API and closed model weights, limiting reproducibility. In this work, we present a fully open-source reproduction of CRAG, replacing proprietary web search with the Wikipedia API and the original LLaMA-2 generator with Phi-3-mini-4k-instruct. We evaluate on PopQA and ARC-Challenge, demonstrating that our open-source pipeline achieves comparable performance to the original system. Furthermore, we contribute the first explainability analysis of CRAG's T5-based retrieval evaluator using SHAP, revealing that the evaluator primarily relies on named entity alignment rather than semantic similarity. Our analysis identifies key failure modes including domain transfer limitations on science questions. All code and results are available at this https URL.
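The "evaluate, then correct" step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the three-way routing described in the original CRAG paper, where evaluator scores are compared against an upper and a lower threshold. The function name `choose_action`, the action labels, and the threshold values are hypothetical and are not taken from this reproduction's code.

```python
from typing import List

# Hypothetical thresholds: the abstract does not state the values used.
UPPER = 0.5
LOWER = -0.5


def choose_action(scores: List[float],
                  upper: float = UPPER,
                  lower: float = LOWER) -> str:
    """Map per-document evaluator scores to one corrective action.

    Assumed rule: 'correct' if at least one document scores above
    `upper`, 'incorrect' if every document scores below `lower`
    (or nothing was retrieved), and 'ambiguous' otherwise.
    """
    if not scores:
        return "incorrect"
    best = max(scores)
    if best > upper:
        return "correct"      # keep and refine the retrieved documents
    if best < lower:
        return "incorrect"    # discard them and fall back to external search
    return "ambiguous"        # combine retrieved and searched knowledge


if __name__ == "__main__":
    print(choose_action([0.9, -0.8]))
    print(choose_action([-0.9, -0.8]))
    print(choose_action([0.1, -0.9]))
```

In the reproduction described above, the external search behind the "incorrect" and "ambiguous" branches would go to the Wikipedia API rather than the Google Search API.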
Source: arXiv: 2603.16169