arXiv submission date: 2026-02-19
📄 Abstract - BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning

Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. Existing methods often fail under strong detection or continuous fine-tuning, largely due to (1) cross-modal inconsistency that exposes trigger patterns and (2) gradient dilution at low poisoning rates that accelerates backdoor forgetting. These coupled causes remain insufficiently modeled and addressed. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions, preserving clean-data statistics while producing compact trigger distributions. We further apply target-aligned subset selection to strengthen signals at low injection rates. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment, and stabilize model parameters through curvature control and elastic weight consolidation, maintaining solutions within a low-curvature wide basin resistant to fine-tuning. We also provide the first theoretical analysis showing that, within a trust region, gradients from clean fine-tuning and backdoor objectives are co-directional, yielding a non-increasing upper bound on attack success degradation. Experiments demonstrate that with only 0.3% poisoning, BadCLIP++ achieves 99.99% attack success rate (ASR) in digital settings, surpassing baselines by 11.4 points. Across nineteen defenses, ASR remains above 99.90% with less than 0.8% drop in clean accuracy. The method further attains 65.03% success in physical attacks and shows robustness against watermark removal defenses.
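For the persistence component, the abstract describes stabilizing model parameters with elastic weight consolidation (EWC), a quadratic penalty that pulls important parameters back toward an anchored solution. Below is a minimal sketch of a generic EWC-style penalty and its gradient; this illustrates the standard technique only, not the paper's exact formulation, and the names (`ewc_penalty`, `fisher_diag`) and toy values are illustrative assumptions:

```python
import numpy as np

def ewc_penalty(theta, theta_anchor, fisher_diag, lam=10.0):
    """EWC penalty: quadratic pull toward the anchored parameters,
    weighted elementwise by a diagonal Fisher-information estimate."""
    diff = theta - theta_anchor
    return 0.5 * lam * np.sum(fisher_diag * diff ** 2)

def ewc_grad(theta, theta_anchor, fisher_diag, lam=10.0):
    """Gradient of the penalty, to be added to the task-loss gradient
    during fine-tuning."""
    return lam * fisher_diag * (theta - theta_anchor)

# Toy usage: parameters with high Fisher weight (first entry) are pulled
# back hardest, keeping the solution near the anchored optimum.
theta_star = np.array([1.0, -2.0, 0.5])   # anchored parameters
theta      = np.array([1.5, -2.0, 0.0])   # parameters after fine-tuning drift
fisher     = np.array([4.0,  1.0, 0.1])   # illustrative importance weights
print(ewc_penalty(theta, theta_star, fisher))   # → 5.125
```

The intuition matching the abstract: a low-curvature wide basin plus this anchoring penalty means clean fine-tuning gradients move the model little along directions that matter for the implanted behavior.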

Top tags: multi-modal, model training, model evaluation
Detailed tags: backdoor attack, multimodal contrastive learning, adversarial robustness, stealthy triggers, model security

BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning


1️⃣ One-sentence summary

This paper proposes a new backdoor attack method called BadCLIP++. By designing a nearly imperceptible micro-trigger pattern and combining it with parameter-stabilization techniques, it implants a malicious backdoor into multimodal AI models that, at an extremely low data-poisoning rate, both evades existing detection and survives the model's subsequent fine-tuning updates, achieving stealthy and persistent control over the model.

Source: arXiv 2602.17168