A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP

📄 Abstract - A Classifier-Agnostic Zero-Shot Adversarial Attack Detection via CLIP

Adversarial attacks pose a challenge to the reliability of deep learning models, motivating effective detection methods. Existing techniques often rely on attack-specific assumptions, access to adversarial samples, or knowledge of the underlying classifier (white-box). We propose $A^4D$ (Attack- and Architecture-Agnostic Adversarial Detector), a completely black-box, zero-shot adversarial attack detection framework that uses prompt-based similarity scores derived from CLIP. To the best of our knowledge, this is the first attempt to use CLIP for this task. The method is based on two key observations: (i) CLIP is sensitive even to small, imperceptible, non-semantic perturbations; (ii) the shift in the CLIP embedding space is not arbitrary and can be used as a robust attack indicator. Experiments across multiple attacks, datasets, and classifiers validate that $A^4D$ achieves state-of-the-art (SOTA) detection results in the attack-agnostic and classifier-agnostic setting.
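The abstract does not spell out how $A^4D$ turns CLIP similarities into a detection score. As a rough illustration of the two ingredients it names, prompt-based similarity scores and a shift between them, here is a minimal Python sketch under stated assumptions: plain lists stand in for CLIP image and text embeddings, prompt scores are a temperature-scaled softmax over cosine similarities (the usual CLIP zero-shot scoring), and the symmetric KL divergence used as the "shift" measure is a hypothetical stand-in, not the paper's score. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating CLIP-style prompt scoring; not the A^4D implementation.


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length, as CLIP does before taking cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def prompt_scores(image_emb: Sequence[float],
                  prompt_embs: Sequence[Sequence[float]],
                  temperature: float = 0.01) -> List[float]:
    """Softmax over cosine similarities between one image embedding and K prompt embeddings.

    temperature=0.01 corresponds to a logit scale of 100 (assumed default).
    """
    img = l2_normalize(image_emb)
    logits = []
    for prompt in prompt_embs:
        txt = l2_normalize(prompt)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)
    peak = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) with a small epsilon to avoid log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def shift_score(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Hypothetical shift measure: symmetric KL between two prompt-score distributions.

    Larger values mean the two inputs induce more different prompt-score profiles.
    """
    return 0.5 * (kl_divergence(scores_a, scores_b) + kl_divergence(scores_b, scores_a))


if __name__ == "__main__":
    # Toy 3-d "embeddings" standing in for CLIP encoder outputs (assumption).
    prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    reference = prompt_scores([0.9, 0.1, 0.0], prompts)
    shifted = prompt_scores([0.6, 0.5, 0.0], prompts)
    print("reference scores:", [round(s, 4) for s in reference])
    print("shift score:", shift_score(reference, shifted))
```

In a real pipeline the toy lists would be replaced by CLIP image and text embeddings, and any decision threshold on `shift_score` would need calibration; the abstract specifies neither the prompts nor the threshold.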


1️⃣ One-Sentence Summary

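This paper proposes a detection method called $A^4D$ that exploits CLIP's sensitivity to image perturbations to detect adversarial attacks zero-shot by analyzing shifts in the embedding space, without relying on any information about the attack type or the classifier, and achieves leading results across multiple experimental settings.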
