Abstract - Benign Overfitting in Adversarial Training for Vision Transformers
Despite the remarkable success of Vision Transformers (ViTs) across a wide range of vision tasks, recent studies have revealed that they remain vulnerable to adversarial examples, much like Convolutional Neural Networks (CNNs). A common empirical defense is adversarial training, yet the theoretical underpinnings of the robustness it confers on ViTs remain largely unexplored. In this work, we present the first theoretical analysis of adversarial training for simplified ViT architectures. We show that, under a suitable signal-to-noise ratio condition and a moderate perturbation budget, adversarial training enables ViTs to achieve nearly zero robust training loss and nearly zero robust generalization error. Remarkably, the model generalizes well even though it overfits the training data, a phenomenon known as "benign overfitting" that had previously been established for adversarial training only in CNNs. Experiments on both synthetic and real-world datasets validate our theoretical findings.
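The paper analyzes adversarial training theoretically; as a concrete reference point, the sketch below shows the standard PGD-based adversarial training loop commonly used in practice. This is a minimal hypothetical PyTorch implementation, not the authors' code: `model`, `loader`, and the `eps`/`alpha`/`steps` values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft an L-infinity PGD adversarial example within budget eps."""
    # Start from a random point inside the eps-ball around x.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascend the loss with a signed gradient step, then project
        # back into the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    """One epoch of adversarial training: minimize the loss on PGD examples."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, eps=eps)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

The perturbation budget `eps` here plays the role of the paper's "moderate perturbation budget": the theory concerns what this inner-maximization/outer-minimization procedure converges to when `model` is a simplified ViT.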
Benign Overfitting in Adversarial Training for Vision Transformers
1️⃣ One-Sentence Summary
This paper provides the first theoretical proof that, under a specific signal-to-noise ratio and a moderate perturbation budget, adversarial training enables Vision Transformers (ViTs) not only to achieve nearly zero robust training loss and strong generalization, but also to overfit "benignly": even when the model overfits, the overfitting does not harm its defense against adversarial examples, extending a phenomenon previously observed only in Convolutional Neural Networks (CNNs) under adversarial training to ViTs.
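For reference, the "robust training loss" in the summary is usually the empirical min-max objective below. This is the standard formulation from the adversarial-training literature, not the paper's exact setup: the loss \(\ell\), model \(f_\theta\), and the \(\ell_\infty\) perturbation set are assumptions on our part.

```latex
\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n}\
  \max_{\|\delta_i\|_\infty \le \epsilon}
  \ell\bigl(f_\theta(x_i + \delta_i),\, y_i\bigr)
```

In these terms, benign overfitting says the trained \(f_\theta\) drives this empirical objective to nearly zero while the population (test-time) version of the same robust loss also remains small.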