DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions
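1️⃣ One-sentence summary
This paper introduces DetailVerifyBench, a new benchmark that evaluates how well AI models can find and localize the specific erroneous words or spans within detailed image captions that run to hundreds of words, addressing the unreliability of the long captions that current multimodal large language models generate.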
Accurately detecting and localizing hallucinations is a critical task for ensuring high reliability of image captions. In the era of Multimodal Large Language Models (MLLMs), captions have evolved from brief sentences into comprehensive narratives, often spanning hundreds of words. This shift exponentially increases the challenge: models must now pinpoint specific erroneous spans or words within extensive contexts, rather than merely flag response-level inconsistencies. However, existing benchmarks lack the fine granularity and domain diversity required to evaluate this capability. To bridge this gap, we introduce DetailVerifyBench, a rigorous benchmark comprising 1,000 high-quality images across five distinct domains. With an average caption length of over 200 words and dense, token-level annotations of multiple hallucination types, it stands as the most challenging benchmark for precise hallucination localization in the field of long image captioning to date. Our benchmark is available at this https URL.
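To make the distinction between response-level flagging and token-level localization concrete, here is a minimal Python sketch. It is an illustration only: the abstract does not specify the benchmark's scoring protocol, so the token-index representation, the function names, and the precision/recall/F1 scoring are all assumptions, and the example caption and labels are invented.

```python
from typing import Set, Tuple


def response_level_flag(predicted: Set[int]) -> bool:
    """Coarse check: does the caption contain any flagged token at all?"""
    return bool(predicted)


def token_level_scores(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over token indices marked as hallucinated.

    Assumption: both sets hold 0-based indices into the same tokenized caption.
    This is a generic scoring scheme, not the benchmark's documented metric.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: whitespace tokens, with "red" and "two" labeled as wrong.
caption = "A red car parked beside two tall trees".split()
gold = {1, 5}        # hypothetical annotation: "red", "two"
predicted = {1, 6}   # hypothetical model output: "red", "tall"

print(response_level_flag(predicted))          # the coarse flag alone looks correct
print(token_level_scores(predicted, gold))     # localization is only half right
```

In this toy case the response-level flag is satisfied by any non-empty prediction, while the token-level scores show that only one of the two flagged tokens was actually hallucinated. That gap is what a dense localization benchmark is designed to expose.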
Source: arXiv:2604.05623