Abstract - VPA-Guard: Defending and Benchmarking Image-to-Video Generation Against Visual Prompt Attacks
Recent advancements in Image-to-Video (I2V) generation have transformed input images from simple appearance references into interactive control interfaces where visual cues such as arrows, sketches, and emojis orchestrate complex video dynamics with unprecedented controllability. However, these seemingly innocuous static cues can be interpreted by models as executable temporal instructions, unfolding into harmful actions in the generated videos. Despite the severity of this threat, existing safety benchmarks remain predominantly focused on text-based and content-only image-based jailbreaks, leaving implicit visual prompt attacks insufficiently explored. To bridge this gap, we present VVA-Bench, the first systematic benchmark for evaluating video generation safety under categorized vision-centric prompt attacks. Extensive experiments on VVA-Bench demonstrate that state-of-the-art models are highly susceptible to such attacks, with Attack Success Rates (ASR) reaching 100.0% on Wan 2.7 and 74.8% on Veo 3.1. To mitigate these risks, we propose VPA-Guard, a retrieval-augmented and self-evolving defense framework. By leveraging few-shot reasoning to identify latent malicious intents, our method reduces the ASR by 44.2% and the harmfulness score by 73.4% on average, while maintaining the model's utility for legitimate user edits. Our work provides both a rigorous benchmark and an effective defense strategy to advance safe and socially responsible multimodal generation.
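For readers unfamiliar with the metric, ASR is commonly computed as the share of attack attempts that a judge marks as successful. The sketch below assumes that common definition; the abstract does not spell out the exact formula or judging procedure. The function name `attack_success_rate` and the sample judgments are hypothetical illustrations, not data from the paper.

```python
def attack_success_rate(outcomes):
    """Return ASR as a percentage: the share of attack attempts judged successful.

    `outcomes` is a sequence of booleans, one per attack prompt
    (True = the generated video was judged harmful). This is an assumed,
    commonly used definition, not one taken from the paper.
    """
    if not outcomes:
        raise ValueError("need at least one attack outcome")
    return 100.0 * sum(1 for o in outcomes if o) / len(outcomes)


# Hypothetical judgments for four attack prompts, without and with a defense.
undefended = [True, True, True, False]
defended = [True, False, False, False]

print(attack_success_rate(undefended))  # 75.0
print(attack_success_rate(defended))    # 25.0
```

The abstract does not say whether the reported 44.2% reduction is an absolute drop in percentage points or a relative one, so the sketch deliberately stops at computing ASR itself.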
VPA-Guard: Defending and Benchmarking Image-to-Video Generation Against Visual Prompt Attacks
1️⃣ One-Sentence Summary
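This paper addresses a risk in image-to-video generation: seemingly harmless image markings (such as arrows and doodles) can be misinterpreted by the model as dangerous instructions, leading it to generate harmful videos. It introduces VVA-Bench, the first benchmark dedicated to evaluating such attacks, and designs VPA-Guard, a self-evolving defense framework that substantially lowers the attack success rate without impairing the model's ability to handle legitimate editing requests.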