Abstract - StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs
Multimodal Large Language Models (MLLMs) excel at structural reasoning yet suffer from sharp logical brittleness in structural consistency. We term this phenomenon Structural Cognitive Overload (SCO), a byproduct of the contention between deep reasoning and safety alignment. However, prior work has predominantly targeted typographic and pixel-level perturbations, leaving SCO largely unexplored. To this end, we propose StructBreak, an automated end-to-end framework designed to quantify SCO. Using StructBreak, we uncover a novel higher-order cognitive overload attack paradigm; notably, this attack operates under a practical black-box setting, requiring no internal model access. We then use this framework to establish a comprehensive benchmark spanning ten diverse threat scenarios. Empirical evaluations on six leading MLLMs reveal that SCO readily triggers toxic generation, yielding a 92% average attack success rate (ASR), with up to 97% on Gemini 2.5. To elucidate the mechanism of SCO, we further conduct model-level interpretations spanning attention dynamics, latent space topology, and geometric analysis. Our findings reveal that StructBreak acts as a novel structural channel to circumvent safety filters. Furthermore, the limited efficacy of inherent safety mechanisms underscores that current alignment paradigms are insufficient for the era of complex multimodal reasoning.
StructBreak: Structural Cognitive Overload-Induced Safety Failures in MLLMs
1️⃣ One-Sentence Summary
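This paper finds that, when processing complex structural information, multimodal large language models exhibit a phenomenon termed "Structural Cognitive Overload," caused by the conflict between deep reasoning and safety alignment. Building on this finding, the authors develop StructBreak, a framework that automatically generates attack strategies under black-box conditions. Experiments show that the method induces models to generate harmful content with a success rate of up to 97%, exposing serious shortcomings of existing safety mechanisms in complex multimodal reasoning scenarios.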