VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense
1️⃣ One-Sentence Summary
This paper proposes an efficient defense method called VALD, which protects large vision-language models (LVLMs) from adversarial image attacks via a multi-stage detection pipeline. Its core idea is to quickly filter out most clean images with low-cost operations and invoke an expensive model for analysis only when necessary, significantly reducing computational overhead while maintaining high accuracy.
Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training-free defense that combines image transformations with agentic data consolidation to recover correct model behavior. A key component of our approach is a two-stage detection mechanism that quickly filters out the majority of clean inputs. We first assess image consistency under content-preserving transformations at negligible computational cost. For more challenging cases, we examine discrepancies in a text-embedding space. Only when necessary do we invoke a powerful LLM to resolve attack-induced divergences. Another key idea is to consolidate multiple responses, leveraging both their similarities and their differences. We show that our method achieves state-of-the-art accuracy while maintaining notable efficiency: most clean images skip costly processing, and even in the presence of numerous adversarial examples, the overhead remains minimal.
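The staged escalation described above can be sketched as a simple cascade. This is a minimal illustrative sketch, not the paper's implementation: the function names, thresholds, and stage stubs are all assumptions standing in for the actual transformation-consistency check, text-embedding comparison, and LLM consolidation.

```python
# Hedged sketch of a multi-stage detection cascade: cheap checks first,
# the costly LLM stage only when earlier stages are inconclusive.
# All names and thresholds are illustrative, not from the paper.

def transform_consistency(sample):
    # Stage 1 (cheap): agreement of outputs under content-preserving
    # transforms (e.g. resize, JPEG). Stubbed as a precomputed score.
    return sample.get("consistency", 1.0)

def embedding_discrepancy(sample):
    # Stage 2 (moderate): distance between text embeddings of the
    # responses to the transformed images. Stubbed likewise.
    return sample.get("discrepancy", 0.0)

def llm_consolidate(sample):
    # Stage 3 (costly): an LLM consolidates multiple responses, using
    # both their similarities and their differences. Stubbed.
    return "consolidated response"

def detect_and_defend(sample, consistency_thresh=0.9, discrepancy_thresh=0.3):
    """Return (verdict, response), escalating only when necessary."""
    if transform_consistency(sample) >= consistency_thresh:
        return "clean", sample.get("response", "")
    if embedding_discrepancy(sample) <= discrepancy_thresh:
        return "clean", sample.get("response", "")
    return "adversarial", llm_consolidate(sample)
```

Under this structure, most clean inputs exit at stage 1 at negligible cost, which is how the method keeps overhead low even when adversarial examples are frequent.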
Source: arXiv:2602.19570