The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning
1️⃣ One-Sentence Summary
This paper finds that as multimodal large language models grow stronger at multi-image reasoning, they paradoxically become more prone to safety vulnerabilities, because the models may over-focus on solving the task while neglecting safety constraints.
As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations. Our extensive evaluations on 19 MLLMs reveal a troubling trend: models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. Beyond attack success rates, we find that many responses labeled as safe are superficial, often driven by misunderstanding or evasive, non-committal replies. We further observe that unsafe generations exhibit lower attention entropy than safe ones on average. This internal signature suggests a possible risk that models may over-focus on task solving while neglecting safety constraints. Our code and data are available at this https URL.
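The attention-entropy signal mentioned above is straightforward to compute when per-head attention maps are available. Below is a minimal, hypothetical sketch (not the paper's actual analysis code, and all names are illustrative): it computes the Shannon entropy of each query's attention distribution, where a lower mean entropy corresponds to attention concentrated on fewer tokens, matching the "over-focus" signature the abstract describes.

```python
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy of each attention distribution.

    attn_weights: (num_heads, seq_len, seq_len) tensor of row-normalized
    attention probabilities (each row sums to 1).
    Returns: (num_heads, seq_len) tensor of per-query entropies.
    """
    p = attn_weights.clamp_min(eps)          # avoid log(0)
    return -(p * p.log()).sum(dim=-1)        # H(p) = -sum p log p over keys

# Illustrative comparison: peaky attention rows yield lower mean entropy
# than diffuse ones, the kind of contrast the paper reports between
# unsafe and safe generations (values here are synthetic).
if __name__ == "__main__":
    torch.manual_seed(0)
    # Fake attention maps: (heads=8, queries=16, keys=16), softmax-normalized.
    focused = torch.softmax(torch.randn(8, 16, 16) * 5.0, dim=-1)   # peaky rows
    diffuse = torch.softmax(torch.randn(8, 16, 16) * 0.5, dim=-1)   # flat rows
    print("focused mean entropy:", attention_entropy(focused).mean().item())
    print("diffuse mean entropy:", attention_entropy(diffuse).mean().item())
```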
Source: arXiv: 2601.14127