arXiv submission date: 2026-04-13
📄 Abstract - GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays

We introduce GazeVaLM, a public eye-tracking dataset for studying clinical perception during chest radiograph authenticity assessment. The dataset comprises 960 gaze recordings from 16 expert radiologists interpreting 30 real and 30 synthetic chest X-rays (generated by diffusion-based generative AI) under two conditions: diagnostic assessment and real-fake classification (a Visual Turing test). For each image-observer pair, we provide raw gaze samples, fixation maps, scanpaths, saliency density maps, structured diagnostic labels, and authenticity judgments. We extend the protocol to 6 state-of-the-art multimodal LLMs, releasing their predicted diagnoses, authenticity labels, and confidence scores under matched conditions, enabling direct human-AI comparison at both the decision and uncertainty levels. We further provide analyses of gaze agreement, inter-observer consistency, and benchmarking of radiologists versus LLMs in diagnostic accuracy and authenticity detection. GazeVaLM supports research in gaze modeling, clinical decision-making, human-AI comparison, generative image realism assessment, and uncertainty quantification. By jointly releasing visual attention data, clinical labels, and model predictions, we aim to facilitate reproducible research on how experts and AI systems perceive, interpret, and evaluate medical images. The dataset is available at this https URL.

Top-level tags: medical multi-modal model evaluation
Detailed tags: eye-tracking, chest x-rays, human-ai comparison, generative image realism, clinical perception

GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays


1️⃣ One-Sentence Summary

This paper releases a public eye-tracking dataset called GazeVaLM, which records the visual attention and diagnostic decisions of expert radiologists as they judge real versus AI-generated chest X-rays, and for the first time compares data from multiple experts against the performance of multimodal large language models, providing an important benchmark for studying the clinical realism of AI-generated medical images and the differences between human and AI decision-making.

Source: arXiv: 2604.11653