arXiv submission date: 2026-04-02
📄 Abstract - Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

Like a body at rest that stays at rest, we find that visual attention in multimodal large language models (MLLMs) exhibits pronounced inertia, remaining largely static once settled during early decoding steps and failing to support the compositional understanding required for cognitive inference. While existing hallucination mitigation methods mainly target perceptual hallucinations concerning object existence or attributes, they remain inadequate for such cognitive hallucinations that require inter-object relational deduction. Through token-wise attention analysis, we identify this visual inertia as a key factor: attention to semantically critical regions remains persistently focused and fails to dynamically support relational inference. We thereby propose a training-free Inertia-aware Visual Excitation (IVE) method that breaks this inertial pattern by modeling cognitive inference as the dynamic responsiveness of visual attention. Specifically, IVE selects visual tokens that are dynamically emerging relative to historical attention trends while distinguishing tokens exhibiting inertial behavior. To further facilitate compositional inference, IVE introduces an inertia-aware penalty that discourages over-concentration and limits the persistence of attention within localized regions. Extensive experiments show that IVE is effective across various base MLLMs and multiple hallucination benchmarks, particularly for cognitive hallucinations.
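The abstract describes two mechanisms: selecting visual tokens that are "dynamically emerging" relative to their historical attention trend, and penalizing tokens whose attention is persistently concentrated (inertial). The paper's exact formulation is not given here, so the following is a minimal hypothetical sketch of how such a selection-and-penalty step might look; the function name, thresholds, and weighting scheme are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def inertia_aware_excitation(attn_history, attn_current, top_k=8, penalty=0.5):
    """Hypothetical sketch of an IVE-style reweighting step.

    attn_history: (T, N) attention over N visual tokens for T past decoding steps.
    attn_current: (N,) attention at the current decoding step.
    Boosts "emerging" tokens (rising relative to their historical trend) and
    down-weights "inertial" tokens (persistently high but no longer rising).
    All choices below (mean trend, top-k selection, multiplicative penalty)
    are assumptions for illustration.
    """
    trend = attn_history.mean(axis=0)                 # historical attention trend
    delta = attn_current - trend                      # dynamic responsiveness signal
    emerging = np.argsort(delta)[-top_k:]             # most dynamically rising tokens
    inertial = (trend > trend.mean()) & (delta <= 0)  # high attention, static or falling
    weights = attn_current.copy()
    weights[emerging] *= 1.0 + np.clip(delta[emerging], 0.0, None)  # excite emerging tokens
    weights[inertial] *= penalty                      # inertia-aware penalty
    return weights / weights.sum()                    # renormalize to a distribution
```

The key design idea mirrored here is that selection is made against the token's own history rather than its absolute attention value, so a region can be penalized even while it still receives the most attention, as long as that attention has stopped changing.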

Top-level tags: multi-modal model evaluation, natural language processing
Detailed tags: visual attention, hallucination mitigation, multimodal llms, cognitive inference, attention analysis

Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation


1️⃣ One-sentence summary

This paper finds that visual attention in multimodal large language models suffers from "inertia": once attention settles during early decoding steps, it is hard to shift dynamically, which hampers inter-object relational reasoning and produces "cognitive hallucinations." The authors propose a training-free method that breaks this inertia and effectively mitigates such hallucinations.

Source: arXiv 2604.01989