Targeted Downstream-Agnostic Attack
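1️⃣ One-Sentence Summary
This paper proposes a new attack that, without knowing the specific downstream task, makes a pre-trained encoder extract from any input image the same features as an attacker-chosen "threat image". This enables precise, targeted attacks and exposes serious security risks in existing self-supervised models under this new threat.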
Recently, pre-trained encoders have gained widespread use due to their strong capability in representation extraction. However, they are vulnerable to downstream-agnostic attacks (DAAs). Existing DAA methods operate under a permissive threat model, where an attack is successful if the generated downstream-agnostic adversarial examples (DAEs) change the original prediction, without requiring a specific target. In this paper, we propose a Targeted DAA (TDAA) method under a stricter threat model requiring the attack to be both targeted and downstream-agnostic. Since the downstream task is unknown and encoders do not directly produce predictions, achieving a targeted attack is particularly challenging. To address this, we introduce a novel component termed the 'threat image', pre-selected by the attacker as the target. Specifically, a generator is designed to produce example-specific adversarial perturbations that compel the victim encoder to output identical features for both the DAEs and the threat image. Unlike previous DAA methods that generate a single shared perturbation for all samples, which often fails due to image diversity, our method adopts an example-specific paradigm. This generates tailored perturbations for each image to ensure a high attack success rate and invisibility. By leveraging the threat image as a feature-level anchor, our method builds a task-agnostic bridge to reveal the vulnerabilities of the victim encoder. Extensive experiments on 10 self-supervised methods across 3 benchmark datasets demonstrate the effectiveness of our approach and reveal the pronounced vulnerability of pre-trained encoders. The code will be made publicly available after the review period.
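The feature-matching idea in the abstract can be illustrated with a small sketch. This is not the paper's method: the abstract describes a trained generator attacking a real pre-trained encoder, whereas the sketch below swaps in plain projected gradient descent on a single example and uses a toy linear map as the "encoder". The names `matvec`, `feature_gap`, `craft_perturbation`, the squared-L2 feature loss, and the values of `eps`, `lr`, and `steps` are all assumptions made for illustration.

```python
import random

def matvec(W, x):
    """Toy linear 'encoder' (an assumption): features = W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def feature_gap(W, x, target_feat):
    """Squared L2 distance between the features of x and the target features."""
    return sum((f - t) ** 2 for f, t in zip(matvec(W, x), target_feat))

def craft_perturbation(W, x, threat, eps=0.1, lr=0.005, steps=200):
    """Projected gradient descent on delta so that W(x + delta) approaches W(threat).

    Stand-in for the paper's generator: one perturbation is optimized directly
    for this example and kept inside an L-infinity ball of radius eps.
    """
    target_feat = matvec(W, threat)
    delta = [0.0] * len(x)
    for _ in range(steps):
        adv = [xi + di for xi, di in zip(x, delta)]
        residual = [f - t for f, t in zip(matvec(W, adv), target_feat)]
        # Gradient of ||W(x + delta) - W(threat)||^2 w.r.t. delta is 2 * W^T residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(x))]
        # Gradient step followed by projection back onto the eps-ball.
        delta = [max(-eps, min(eps, di - lr * gi)) for di, gi in zip(delta, grad)]
    return delta

rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]  # hypothetical encoder weights
x = [rng.random() for _ in range(8)]        # clean input (flattened toy "image")
threat = [rng.random() for _ in range(8)]   # attacker-chosen threat image

delta = craft_perturbation(W, x, threat)
adv = [xi + di for xi, di in zip(x, delta)]
target_feat = matvec(W, threat)
gap_before = feature_gap(W, x, target_feat)
gap_after = feature_gap(W, adv, target_feat)
```

Comparing `gap_before` with `gap_after` shows how far the bounded perturbation moves the input's features toward those of the threat image. No label or downstream classifier appears anywhere in the loss, which is the sense in which the threat image acts as a task-agnostic, feature-level anchor.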
Source: arXiv: 2605.19446