arXiv submission date: 2026-05-19
📄 Abstract - Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

Uncertainty Quantification (UQ) is widely regarded as the primary safeguard for deploying Large Language Models (LLMs) in high-stakes domains. However, we argue that the field suffers from a category error: mainstream UQ methods for LLMs are just unsupervised clustering algorithms. We demonstrate that most current approaches inherently quantify the internal consistency of the model's generations rather than their external correctness. Consequently, current methods are fundamentally blind to factual reality and fail to detect "confident hallucinations," where models exhibit high confidence in stable but incorrect answers. Current UQ methods may therefore create a deceptive sense of safety when models are deployed with uncertainty estimates. In detail, we identify three critical pathologies resulting from this dependence on internal state: a hyperparameter sensitivity crisis that renders deployment unsafe, an internal evaluation cycle that conflates stability with truth, and a fundamental lack of ground truth that forces reliance on unstable proxy metrics to evaluate uncertainty. To resolve this impasse, we advocate for a paradigm shift in UQ and outline a roadmap for the research community to adopt better evaluation metrics and settings, implement mechanism changes for native uncertainty, and anchor verification in objective truth, ensuring that model confidence serves as a reliable proxy for reality.
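The central claim, that consistency-based UQ measures agreement among samples rather than correctness, can be illustrated with a toy sketch. This is not the paper's method: the `normalize` function is a hypothetical stand-in for semantic clustering (real methods typically use a learned similarity or entailment model), and the sample answers are made up for illustration.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical stand-in for semantic clustering: answers are grouped
    # by exact match after trimming whitespace and lowercasing.
    return answer.strip().lower()


def cluster_entropy(samples: list[str]) -> float:
    """Shannon entropy (in nats) of the cluster distribution over samples."""
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Made-up samples for "What is the capital of Australia?" (truth: Canberra).
truth = "canberra"
confident_wrong = ["Sydney", "sydney", "Sydney ", "Sydney", "Sydney"]
mixed = ["Canberra", "Sydney", "Melbourne", "Canberra", "Sydney"]

for name, samples in [("confident_wrong", confident_wrong), ("mixed", mixed)]:
    entropy = cluster_entropy(samples)
    contains_truth = truth in {normalize(s) for s in samples}
    print(f"{name}: entropy={entropy:.3f}, contains_truth={contains_truth}")
```

In this sketch the `confident_wrong` set collapses into a single cluster, so its entropy is zero even though the correct answer never appears. The score never consults `truth`, which is the sense in which such a measure reflects internal consistency only.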

Top-level tags: llm, model evaluation
Detailed tags: uncertainty quantification, hallucination, clustering, evaluation metrics

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering


1️⃣ One-sentence summary

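This paper argues that current uncertainty quantification methods for large language models are fundamentally flawed: they measure only the internal consistency of a model's generations rather than the correctness of its answers, so they cannot detect cases where the model confidently outputs a wrong answer and may create a false sense of safety; the authors call on the research community to shift to a new paradigm anchored in objective facts.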

Source: arXiv: 2605.19220