VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense
1️⃣ One-Sentence Summary
This paper proposes an efficient defense method called VALD, which protects large vision-language models (LVLMs) from adversarial image attacks via a multi-stage detection pipeline. Its core idea is to quickly filter out most clean images with low-cost operations and invoke an expensive model for analysis only when necessary, significantly reducing computational overhead while maintaining high accuracy.
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. Another key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
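The staged escalation described above can be sketched as a simple cascade. This is a minimal illustrative sketch, not the paper's implementation: the function names, thresholds, and stage stubs are all assumptions standing in for the actual transformation-consistency check, text-embedding comparison, and LLM consolidation.

```python
# Hedged sketch of a multi-stage detection cascade: cheap checks first,
# the costly LLM stage only when earlier stages are inconclusive.
# All names and thresholds are illustrative, not from the paper.

def transform_consistency(sample):
    # Stage 1 (cheap): agreement of outputs under content-preserving
    # transforms (e.g. resize, JPEG). Stubbed as a precomputed score.
    return sample.get("consistency", 1.0)

def embedding_discrepancy(sample):
    # Stage 2 (moderate): distance between text embeddings of the
    # responses to the transformed images. Stubbed likewise.
    return sample.get("discrepancy", 0.0)

def llm_consolidate(sample):
    # Stage 3 (costly): an LLM consolidates multiple responses, using
    # both their similarities and their differences. Stubbed.
    return "consolidated response"

def detect_and_defend(sample, consistency_thresh=0.9, discrepancy_thresh=0.3):
    """Return (verdict, response), escalating only when necessary."""
    if transform_consistency(sample) >= consistency_thresh:
        return "clean", sample.get("response", "")
    if embedding_discrepancy(sample) <= discrepancy_thresh:
        return "clean", sample.get("response", "")
    return "adversarial", llm_consolidate(sample)
```

Under this structure, most clean inputs exit at stage 1 at negligible cost, which is how the method keeps overhead low even when adversarial examples are frequent.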
Source: arXiv:2602.19570