Attack by Unlearning: Unlearning-Induced Adversarial Attacks on Graph Neural Networks
1️⃣ One-sentence summary
This paper reveals a new stealthy attack on graph neural networks: the adversary first injects carefully crafted nodes into the training graph and then lawfully requests their deletion. By exploiting the performance vulnerabilities the model incurs while "unlearning" this data, the attack causes a sharp drop in accuracy once unlearning is applied, posing a serious challenge to model robustness under current privacy regulations.
Graph neural networks (GNNs) are widely used for learning from graph-structured data in domains such as social networks, recommender systems, and financial platforms. To comply with privacy regulations like the GDPR, CCPA, and PIPEDA, approximate graph unlearning, which aims to remove the influence of specific data points from trained models without full retraining, has become an increasingly important component of trustworthy graph learning. However, approximate unlearning often incurs subtle performance degradation, with negative and unintended side effects. In this work, we show that such degradation can be amplified into adversarial attacks. We introduce the notion of unlearning corruption attacks, where an adversary injects carefully chosen nodes into the training graph and later requests their deletion. Because deletion requests are legally mandated and cannot be denied, this attack surface is both unavoidable and stealthy: the model performs normally during training, but accuracy collapses only after unlearning is applied. Technically, we formulate this attack as a bi-level optimization problem: to overcome the challenges of black-box unlearning and label scarcity, we approximate the unlearning process via gradient-based updates and employ a surrogate model to generate pseudo-labels for the optimization. Extensive experiments across benchmarks and unlearning algorithms demonstrate that small, carefully designed unlearning requests can induce significant accuracy degradation, raising urgent concerns about the robustness of GNN unlearning under real-world regulatory demands. The source code will be released upon paper acceptance.
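As a rough intuition for the attack mechanics described in the abstract — not the paper's actual method — the sketch below replaces the GNN with 1-D linear regression and models approximate unlearning as a few gradient-ascent steps that undo the deleted points' loss contribution. The attacker's bi-level optimization collapses to a grid search over the injected nodes' labels; every name and number here is an illustrative assumption.

```python
import numpy as np

# Toy stand-in for the unlearning corruption attack (all choices hypothetical):
# the "GNN" is 1-D linear regression on clean data y = 2x, and "approximate
# unlearning" is gradient ascent on the deleted points' loss.

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))   # clean training inputs
y = 2.0 * X[:, 0]              # clean targets: y = 2x

def train(Xtr, ytr, lr=0.1, steps=200):
    """Fit w for y ~ w * x by plain gradient descent."""
    w, n = 0.0, len(ytr)
    for _ in range(steps):
        w -= lr * (2.0 / n) * np.sum((w * Xtr[:, 0] - ytr) * Xtr[:, 0])
    return w

def approx_unlearn(w, X_del, y_del, n_total, lr=0.05, steps=50):
    """Stand-in for approximate unlearning: gradient *ascent* on the
    deleted points' loss, pushing w away from what they taught it."""
    for _ in range(steps):
        w += lr * (2.0 / n_total) * np.sum(
            (w * X_del[:, 0] - y_del) * X_del[:, 0])
    return w

def post_unlearn_error(inj_label):
    """Outer (attacker) objective: clean-data error after the
    inject-then-delete pipeline, for one candidate injected label."""
    X_inj = np.ones((5, 1))
    y_inj = np.full(5, float(inj_label))
    Xtr, ytr = np.vstack([X, X_inj]), np.concatenate([y, y_inj])
    w = train(Xtr, ytr)                            # model looks fine here
    w = approx_unlearn(w, X_inj, y_inj, len(ytr))  # damage appears here
    return float(np.mean((w * X[:, 0] - y) ** 2))

# Crude outer loop replacing the paper's bi-level optimization:
# grid-search the injected label that maximizes post-unlearning error.
candidates = np.linspace(-10.0, 10.0, 21)
worst_label = max(candidates, key=post_unlearn_error)
```

In this toy, a benign injected label (e.g. 2.0, consistent with the clean relation y = 2x) leaves post-unlearning error near zero, while the adversarially chosen label degrades it sharply — mirroring the abstract's claim that the model performs normally during training and collapses only after unlearning is applied.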
Source: arXiv: 2603.18570