Abstract - When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG
The trustworthiness of a retrieval-augmented generation (RAG) system depends on more than the answer it returns, yet many black-box uncertainty methods still read agreement among sampled answers as confidence. That inference fails when repeated samples condition on the same defective retrieval state. The state may be empty, with the model falling back on parametric memory, or populated by a coherent but wrong neighbourhood. In either case, the answers agree because the error is stable. The problem is recognised in deployed RAG, but it has lacked a name, a measurable signature, and a prevalence bound. We supply all three. We name the failure retrieval-state lock-in and diagnose it by separating the three objects a single confidence score conflates: the answer surface, the retrieved evidence, and the retrieval state itself. In an inspectable, ontology-guided knowledge-graph RAG (KG-RAG) system across six question-answering snapshots, we measure the agreement blind spot directly: at five samples per question, 42% of KG-RAG errors and 59% of dense-retrieval errors carry zero answer dispersion, so agreement has nothing to rank, while evidence- and retrieval-state checks still flag most of them. The decomposition supports an auditable decision rule: accepting an answer only when answer, evidence, and retrieval checks all agree that it is low-risk reaches 91.9% pooled precision against a 69.7% accept-all rate. The cost is coverage: it certifies only 7.7% of answers as low-risk. On the clinical calibration domain it reaches 100% precision under an automated judge; this is an in-domain automated-label upper bound, not a clinical safety claim, and still needs human validation. Confidence in RAG is object-specific: when answers agree, the useful question is which part of the pipeline to distrust.
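The agreement blind spot and the conjunctive accept rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The dispersion measure (one minus the majority-answer share), the `Checks` fields, and all function names are assumptions made for illustration, since the abstract does not specify how each check is computed.

```python
from collections import Counter
from dataclasses import dataclass


def answer_dispersion(samples: list[str]) -> float:
    """Share of samples that disagree with the most common answer.

    0.0 means every sample agrees. Assumed measure, not taken from the paper.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    normalised = [s.strip().lower() for s in samples]
    top_count = Counter(normalised).most_common(1)[0][1]
    return 1.0 - top_count / len(normalised)


@dataclass
class Checks:
    """Hypothetical per-question verdicts, one per object the abstract separates."""
    answer_low_risk: bool      # e.g. low dispersion among sampled answers
    evidence_low_risk: bool    # e.g. retrieved evidence supports the answer
    retrieval_low_risk: bool   # e.g. retrieval state is non-empty and on-topic


def accept(checks: Checks) -> bool:
    """Accept only when all three checks agree the answer is low-risk."""
    return (
        checks.answer_low_risk
        and checks.evidence_low_risk
        and checks.retrieval_low_risk
    )


def precision_and_coverage(
    decisions: list[bool], correct: list[bool]
) -> tuple[float, float]:
    """Precision among accepted answers, and the fraction of answers accepted."""
    accepted = [c for d, c in zip(decisions, correct) if d]
    coverage = len(accepted) / len(decisions)
    precision = sum(accepted) / len(accepted) if accepted else float("nan")
    return precision, coverage


# A locked-in error: five identical answers drawn from the same empty retrieval state.
samples = ["Paris", "Paris", "paris", "Paris", "Paris"]
dispersion = answer_dispersion(samples)  # no dispersion, so agreement has nothing to rank
locked_in = Checks(
    answer_low_risk=dispersion == 0.0,
    evidence_low_risk=False,
    retrieval_low_risk=False,
)
decision = accept(locked_in)  # rejected despite perfect agreement
```

Requiring all three checks to pass is what trades coverage for precision: an answer that looks confident on the answer surface alone is still rejected when the evidence or retrieval-state check flags it.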
When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG
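1️⃣ One-Sentence Summary
This paper identifies and names the "retrieval-state lock-in" problem in retrieval-augmented generation (RAG) systems: when repeated samples condition on the same defective retrieval result, the generated answers agree with one another yet are wrong, and conventional confidence methods are misled as a result. The authors diagnose the failure by separating three levels (the answer, the evidence, and the retrieval state) and propose an auditable decision rule that substantially improves reliability at the cost of some coverage.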