arXiv submission date: 2026-01-09
📄 Abstract - Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that high-NCB facts are more resistant to interference than low-NCB ones. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at this https URL.
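The abstract does not specify how NCB is computed, so the sketch below is only an illustrative guess at a neighborhood-consistency score, not the paper's method: sample answers to the original question and to paraphrased "neighbor" prompts, then measure how often the neighborhood agrees with the model's modal answer to the clean question. The names `query_model`, `neighbor_prompts`, and `ncb_score` are hypothetical placeholders.

```python
# Hedged sketch (not the paper's implementation): one plausible way to score
# neighborhood consistency for a single target fact.
from collections import Counter
from typing import Callable, List


def ncb_score(
    query_model: Callable[[str], str],   # hypothetical: returns the model's short answer to a prompt
    target_question: str,                # the fact being probed, e.g. "What is the capital of Australia?"
    neighbor_prompts: List[str],         # paraphrases / contextually perturbed variants of the same fact
    n_samples: int = 5,                  # repeated samples per prompt to smooth decoding noise
) -> float:
    """Fraction of neighborhood answers that agree with the modal answer
    to the unperturbed target question. Values near 1.0 suggest a coherent
    belief; values near 0.0 suggest a belief that shifts under mild interference."""
    # Modal answer to the clean question serves as the reference belief.
    base_answers = [query_model(target_question).strip().lower() for _ in range(n_samples)]
    reference = Counter(base_answers).most_common(1)[0][0]

    # Probe the conceptual neighborhood and count agreements with the reference.
    agree, total = 0, 0
    for prompt in neighbor_prompts:
        for _ in range(n_samples):
            answer = query_model(prompt).strip().lower()
            agree += int(answer == reference)
            total += 1
    return agree / total if total else 0.0
```

Exact string matching is the crudest possible agreement test; any real evaluation would presumably normalize answers or use semantic matching, and the paper's actual scoring may differ entirely.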

Top-level tags: llm model evaluation natural language processing
Detailed tags: truthfulness robustness consistency evaluation knowledge brittleness

Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency


1️⃣ One-Sentence Summary

This paper finds that a large language model's "confident" answers about facts can be surprisingly brittle. It proposes Neighbor-Consistency Belief, a method that checks whether the model answers consistently under perturbations of related concepts, to evaluate and improve the robustness of its beliefs; finally, Structure-Aware Training markedly reduces long-tail knowledge brittleness.

Source: arXiv:2601.05905