arXiv submission date: 2026-03-30
📄 Abstract - When Choices Become Priors: Contrastive Decoding for Scientific Figure Multiple-Choice QA

Scientific figure multiple-choice question answering (MCQA) requires models to reason over diverse visual evidence, ranging from charts and multipanel figures to microscopy and biomedical images. However, this setting suffers from a distinctive bias: answer choices themselves can act as priors, steering multimodal models toward scientifically plausible options even when the figure supports a different answer. We investigate this failure mode through a simple question: what if decoding explicitly discounts what the model would prefer from text alone, so as to favor figure-grounded evidence? To this end, we propose SCICON, a training-free decoding method that scores each candidate by subtracting a text-only option score from its image-conditioned counterpart. Unlike prior contrastive decoding approaches that mitigate hallucinations by contrasting original inputs with distorted images or perturbed instructions, SCICON directly targets the choice-induced prior encoded in candidate text. Across three scientific figure QA benchmarks and three model backbones, SCICON consistently improves accuracy over standard decoding baselines. These results show that decoding against choice-induced priors is an effective and simple way to improve figure-grounded reasoning in scientific MCQA.
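The scoring rule described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each option already has an image-conditioned score and a text-only score (for example, log-probabilities obtained from the same model with and without the figure). The function name `contrastive_choice`, the weight `alpha`, and the toy numbers are all hypothetical; the abstract only states that the text-only score is subtracted from the image-conditioned one.

```python
from typing import Mapping


def contrastive_choice(
    image_scores: Mapping[str, float],
    text_only_scores: Mapping[str, float],
    alpha: float = 1.0,  # hypothetical weight; alpha=1.0 is plain subtraction
) -> str:
    """Return the option whose image-conditioned score most exceeds its text-only score."""
    if set(image_scores) != set(text_only_scores):
        raise ValueError("both score sets must cover the same options")
    # Discount what the model would prefer from the option text alone.
    adjusted = {
        option: image_scores[option] - alpha * text_only_scores[option]
        for option in image_scores
    }
    return max(adjusted, key=adjusted.get)


# Toy scores, invented for illustration only (not results from the paper).
image_scores = {"A": -1.0, "B": -1.2, "C": -3.0}
text_only_scores = {"A": -0.5, "B": -2.5, "C": -3.0}

# Standard decoding picks the highest image-conditioned score.
baseline = max(image_scores, key=image_scores.get)
# Contrastive scoring penalizes options that are already likely from text alone.
chosen = contrastive_choice(image_scores, text_only_scores)
```

In this toy case option A looks best under standard decoding, but most of its score is already explained by the text-only prior; after subtraction, option B, which gains the most from seeing the figure, is selected.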

Top-level tags: multi-modal model evaluation natural language processing
Detailed tags: contrastive decoding multimodal reasoning scientific qa multiple-choice visual question answering

When Choices Become Priors: Contrastive Decoding for Scientific Figure Multiple-Choice QA


1️⃣ One-sentence summary

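This paper finds that, in scientific figure multiple-choice questions, the answer options themselves can act as a distracting prior, leading multimodal models to ignore the image evidence and pick a plausible-looking answer. To address this, the authors propose SCICON, a training-free decoding method that contrasts the model's score for each option with and without the figure, pushing the model to rely more on the image when reasoning and improving accuracy across several benchmarks.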

Source: arXiv:2603.28026