📄
Abstract - Disentangling Ambiguity from Instability in Large Language Models: A Clinical Text-to-SQL Case Study
Deploying large language models for clinical Text-to-SQL requires distinguishing two qualitatively different causes of output diversity: (i) input ambiguity that should trigger clarification, and (ii) model instability that should trigger human review. We propose CLUES, a framework that models Text-to-SQL as a two-stage process (interpretations → answers) and decomposes semantic uncertainty into an ambiguity score and an instability score. The instability score is computed via the Schur complement of a bipartite semantic graph matrix. Across AmbigQA/SituatedQA (gold interpretations) and a clinical Text-to-SQL benchmark (known interpretations), CLUES improves failure prediction over state-of-the-art Kernel Language Entropy. In deployment settings, it remains competitive while providing a diagnostic decomposition unavailable from a single score. The resulting uncertainty regimes map to targeted interventions: query refinement for ambiguity, model improvement for instability. The high-ambiguity/high-instability regime contains 51% of errors while covering only 25% of queries, enabling efficient triage.
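The abstract says the instability score comes from the Schur complement of a bipartite semantic-graph matrix over interpretations and answers, but does not spell out the matrix construction. The sketch below only illustrates the underlying linear algebra on a hypothetical similarity matrix partitioned into interpretation and answer blocks; the block values and the scoring step are assumptions, not the paper's method.

```python
import numpy as np

def schur_complement(M, k):
    """Schur complement of the top-left k x k block A in
    M = [[A, B], [C, D]]:  S = D - C @ A^{-1} @ B."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    # solve(A, B) computes A^{-1} @ B without forming the inverse
    return D - C @ np.linalg.solve(A, B)

# Toy bipartite semantic graph: 2 interpretations, 3 sampled answers.
# Cross-block entries are hypothetical interpretation-answer
# semantic similarities, chosen only for illustration.
B = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.2, 0.8]])
A = np.eye(2) + 1e-3          # interpretation block (regularized)
D = np.eye(3)                 # answer block
M = np.block([[A, B], [B.T, D]])

# Answer-side matrix with interpretation effects marginalized out;
# a spectral summary of S could then serve as an instability score.
S = schur_complement(M, k=2)
print(S.shape)
```

Since `M` is symmetric with `C = B.T`, the resulting Schur complement `S` is symmetric as well, so its eigenvalues are real and can be summarized spectrally.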
Disentangling Ambiguity from Instability in Large Language Models: A Clinical Text-to-SQL Case Study
1️⃣ One-Sentence Summary
This paper proposes CLUES, a framework that separates the reasons a large language model produces different answers on clinical Text-to-SQL tasks into two categories: ambiguity in the input itself and instability in the model itself. It quantifies each with its own score, guiding more targeted interventions (clarifying the question for ambiguity, reviewing the model for instability) and improving the efficiency of error prediction and triage.