arXiv submission date: 2025-12-26
📄 Abstract - Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLMs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about 20%) of high-entropy tokens, i.e., critical decision points in autoregressive generation, disproportionately governs output trajectories. By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk. Remarkably, these vulnerable high-entropy forks recur across architecturally diverse VLMs, enabling feasible transferability (17-26% harmful rates on unseen targets). Motivated by these findings, we propose Entropy-bank Guided Adversarial attacks (EGA), which achieves competitive attack success rates (93-95%) alongside high harmful conversion, thereby revealing new weaknesses in current VLM safety mechanisms.
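To make the selection step concrete, below is a minimal PyTorch sketch of the core idea in the abstract: compute the per-step Shannon entropy of the next-token distribution and keep only the top ~20% highest-entropy decoding positions as attack targets, maximizing entropy there instead of at every step. The function names (`token_entropies`, `select_high_entropy_positions`, `selective_entropy_loss`) and the simple sum-of-entropies objective are illustrative assumptions, not the paper's actual EGA implementation.

```python
import torch
import torch.nn.functional as F

def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """Per-step Shannon entropy H_t = -sum_v p_t(v) log p_t(v).

    logits: (seq_len, vocab_size) next-token logits from an autoregressive decoder.
    Returns: (seq_len,) entropy in nats, one value per decoding step.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)

def select_high_entropy_positions(logits: torch.Tensor, frac: float = 0.2) -> torch.Tensor:
    """Indices of the top `frac` highest-entropy steps (the ~20% 'fork' positions)."""
    ent = token_entropies(logits)
    k = max(1, int(frac * ent.numel()))
    return torch.topk(ent, k).indices

def selective_entropy_loss(logits: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Illustrative attack objective: maximize entropy only at the selected positions.

    A global entropy attack would instead sum token_entropies(logits) over all steps;
    restricting the sum to the high-entropy forks is what concentrates the budget.
    """
    return token_entropies(logits)[positions].sum()
```

In an actual attack loop, this loss would be back-propagated to the adversarial image perturbation under a norm budget; the details of that optimization are specific to the paper and are not shown here.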

Top-level tags: multi-modal, model evaluation, machine learning
Detailed tags: adversarial attacks, vision-language models, entropy, model safety, autoregressive generation

Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models


1️⃣ One-sentence summary

This paper finds that when vision-language models generate text, only about 20% of the positions, the high-entropy tokens, decisively shape the output. By concentrating adversarial perturbations on these positions, an attacker can make the model produce large amounts of harmful content at very low cost, exposing a major gap in current model safety mechanisms.

Source: arXiv: 2512.21815