arXiv submission date: 2026-06-11
📄 Abstract - Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models

Large language models are increasingly deployed in citation-augmented settings, yet the effect of citation presence on model behavior independent of factual content remains poorly understood. We introduce AuthorityBench, a 220,564-prompt multi-domain benchmark that isolates how citation-based authority signals influence epistemic behavior in LLMs. The benchmark uses a fully balanced 2x2 factorial design crossing claim veracity with citation veracity, the first to do so, across four domains (general knowledge, science, law, and medicine), with controlled variation over 40 prompt templates, four venue prestige tiers, and a country-coded author name dataset. Evaluating seven models on 12 structured research questions, we find that citation presence, whether real or fabricated, consistently increases hallucination rates relative to a no-citation baseline. The effect is strongest when fabricated citations accompany true claims, raising hallucination rates by 3 to 22 percentage points and reaching 35 to 77% in the general knowledge domain, while legal claims are comparatively robust and venue prestige and author demographics show negligible impact. All datasets and evaluation code are available at: this https URL
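The factorial design described in the abstract can be sketched as a small enumeration. This is only an illustration under stated assumptions, not the authors' code: the tier labels (`tier_1` to `tier_4`), the function names, and the full crossing of domains and prestige tiers with the 2x2 core are hypothetical, and the no-citation baseline and the 40 prompt templates are left out.

```python
from itertools import product

# The two factors of the 2x2 core design, as named in the abstract.
CLAIM_VERACITY = ("true", "false")
CITATION_VERACITY = ("real", "fabricated")

# The four domains listed in the abstract.
DOMAINS = ("general knowledge", "science", "law", "medicine")

# Hypothetical labels: the abstract states there are four venue prestige
# tiers but does not name them.
PRESTIGE_TIERS = ("tier_1", "tier_2", "tier_3", "tier_4")


def design_cells():
    """Enumerate every condition, assuming all factors are fully crossed."""
    return [
        {"claim": claim, "citation": citation, "domain": domain, "tier": tier}
        for claim, citation, domain, tier in product(
            CLAIM_VERACITY, CITATION_VERACITY, DOMAINS, PRESTIGE_TIERS
        )
    ]


def percentage_point_gap(cited_rate, baseline_rate):
    """Difference between two hallucination rates (fractions), in percentage points."""
    return 100.0 * (cited_rate - baseline_rate)
```

In a balanced design of this kind, each of the four claim/citation combinations appears in the same number of cells, so a difference in hallucination rate between cells can be attributed to the citation signal rather than to an uneven mix of true and false claims.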

Top-level tags: llm benchmark model evaluation
Detailed tags: citation bias hallucination epistemic susceptibility authority signals multi-domain

Authority, Truth, and Citation Bias: A Large-Scale Multi-Domain Benchmark for Studying Epistemic Susceptibility in Large Language Models


1️⃣ One-sentence summary

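This paper builds AuthorityBench, a benchmark of more than 220,000 prompts, and shows through tightly controlled experiments that the mere presence of a citation, whether real or fabricated, consistently raises the hallucination rate of large language models relative to a no-citation baseline; the effect is strongest when fabricated citations accompany true claims, with hallucination rates reaching 35 to 77% in the general knowledge domain, while venue prestige and author demographics have almost no effect.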

Source: arXiv: 2606.13104