RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

📄 Abstract - RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

Automated question-answering (QA) systems increasingly rely on retrieval-augmented generation (RAG) to ground large language models (LLMs) in authoritative medical knowledge, ensuring clinical accuracy and patient safety in Artificial Intelligence (AI) applications for healthcare. Despite progress in RAG evaluation, current benchmarks focus only on simple multiple-choice QA tasks and employ metrics that poorly capture the semantic precision required for complex QA tasks. These approaches cannot diagnose whether an error stems from faulty retrieval or flawed generation, which prevents developers from making targeted improvements. To address this gap, we propose RAG-X, a diagnostic framework that evaluates the retriever and generator independently across a triad of QA tasks: information extraction, short-answer generation, and multiple-choice question (MCQ) answering. RAG-X introduces Context Utilization Efficiency (CUE) metrics to disaggregate system success into interpretable quadrants, isolating verified grounding from deceptive accuracy. Our experiments reveal an "Accuracy Fallacy", in which a 14% gap separates perceived system success from evidence-based grounding. By surfacing hidden failure modes, RAG-X offers the diagnostic transparency needed for safe and verifiable clinical RAG systems.
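The abstract does not spell out how the CUE quadrants are defined, so the following is only an illustrative sketch of the general idea of splitting outcomes into quadrants. It assumes two per-question flags (whether the supporting evidence was retrieved, and whether the answer was judged correct); the quadrant names, the `Sample` type, and the `summarize` helper are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    # Both flags are assumptions made for this sketch, not the paper's definitions.
    evidence_retrieved: bool  # did the retriever return the supporting evidence?
    answer_correct: bool      # was the generated answer judged correct?


def quadrant(sample: Sample) -> str:
    """Map one sample to a hypothetical quadrant label."""
    if sample.evidence_retrieved and sample.answer_correct:
        return "grounded_correct"
    if sample.answer_correct:
        return "ungrounded_correct"   # correct answer without supporting evidence
    if sample.evidence_retrieved:
        return "grounded_incorrect"   # evidence present, generator still wrong
    return "ungrounded_incorrect"


def summarize(samples: list[Sample]) -> dict:
    """Count quadrants and compare overall accuracy with grounded accuracy."""
    if not samples:
        raise ValueError("samples must not be empty")
    counts = Counter(quadrant(s) for s in samples)
    n = len(samples)
    accuracy = (counts["grounded_correct"] + counts["ungrounded_correct"]) / n
    grounded_accuracy = counts["grounded_correct"] / n
    return {
        "counts": dict(counts),
        "accuracy": accuracy,
        "grounded_accuracy": grounded_accuracy,
        "gap": accuracy - grounded_accuracy,
    }


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    toy = [
        Sample(True, True),
        Sample(False, True),
        Sample(True, False),
        Sample(False, False),
    ]
    print(summarize(toy))
```

Under these assumptions, the "gap" value is the share of answers that count as correct without retrieved evidence behind them, which is one plausible reading of the difference between perceived success and evidence-based grounding described above.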


1️⃣ One-Sentence Summary

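This paper proposes RAG-X, a diagnostic framework that evaluates the retrieval and generation modules of a medical question-answering system independently and introduces new metrics that expose the gap between the system's apparent accuracy and its actual evidence support, helping developers make more targeted improvements to ensure clinical safety.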
