📄 Abstract - Adversarial Confusion Attack: Disrupting Multimodal Large Language Models

We introduce the Adversarial Confusion Attack, a new class of threats against multimodal large language models (MLLMs). Unlike jailbreaks or targeted misclassification, the goal is to induce systematic disruption that makes the model generate incoherent or confidently incorrect outputs. Practical applications include embedding such adversarial images into websites to prevent MLLM-powered AI Agents from operating reliably. The proposed attack maximizes next-token entropy using a small ensemble of open-source MLLMs. In the white-box setting, we show that a single adversarial image can disrupt all models in the ensemble, both in the full-image and Adversarial CAPTCHA settings. Despite relying on a basic adversarial technique (PGD), the attack generates perturbations that transfer to both unseen open-source (e.g., Qwen3-VL) and proprietary (e.g., GPT-5.1) models.
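The attack itself is conceptually simple: run projected gradient descent (PGD) on the image pixels, but with the objective of maximizing the average next-token entropy across a small ensemble of open-source MLLMs. Below is a minimal sketch of that loop, assuming each ensemble member is wrapped as a callable that maps a pixel tensor in [0, 1] to next-token logits; the wrapper, prompt handling, and hyperparameters (`epsilon`, `alpha`, `steps`) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: entropy-maximizing PGD against an ensemble of MLLMs.
# Model wrappers, prompt, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def next_token_entropy(logits):
    """Shannon entropy of the next-token distribution (last sequence position)."""
    log_probs = F.log_softmax(logits[:, -1, :], dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1).mean()

def confusion_pgd(models, image, epsilon=8 / 255, alpha=1 / 255, steps=100):
    """PGD that *maximizes* average next-token entropy across an ensemble.

    `models` is a list of callables, each mapping a pixel tensor in [0, 1]
    to next-token logits (wrapping its own processor and prompt).
    """
    x_orig = image.clone().detach()
    delta = torch.zeros_like(x_orig).uniform_(-epsilon, epsilon).requires_grad_(True)

    for _ in range(steps):
        x_adv = (x_orig + delta).clamp(0.0, 1.0)
        # Average entropy over the ensemble; higher entropy = more "confusion".
        loss = torch.stack([next_token_entropy(m(x_adv)) for m in models]).mean()
        loss.backward()
        with torch.no_grad():
            # Gradient *ascent* on entropy, then project back into the L-inf ball.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()

    return (x_orig + delta).detach().clamp(0.0, 1.0)
```

In the full-image setting the perturbation covers the whole input; for the Adversarial CAPTCHA setting mentioned in the abstract, `delta` would presumably be restricted to a small patch region via a mask instead.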

Top tags: multi-modal LLM, model evaluation
Detailed tags: adversarial attack, multimodal LLMs, security, model disruption, transferability

Adversarial Confusion Attack: Disrupting Multimodal Large Language Models


1️⃣ One-Sentence Summary

This paper proposes a new class of threat called the "Adversarial Confusion Attack": by adding small, hard-to-perceive perturbations to an image, it makes multimodal large language models (such as GPT-5.1) produce confused or confidently wrong answers, thereby disrupting the reliable operation of MLLM-powered AI agents.

