Auditing Machine Unlearning: A Systematic Research on Whether Models Truly Forget

📄 Abstract

Machine unlearning has been extensively studied in response to growing privacy concerns and regulatory requirements. However, auditing whether unlearning algorithms have truly erased the influence of specific data remains an open challenge. Without reliable and practical auditing mechanisms, critical privacy risks such as residual information leakage can go undetected. This paper initiates a systematic investigation into whether existing unlearning algorithms can truly forget the designated data. We propose the first practical and general-purpose auditing framework for machine unlearning, inspired by the concept of proof of ignorance. Our framework addresses the key practicality limitations of existing methods: it eliminates the need for retraining-from-scratch baselines, avoids training large numbers of shadow models, and requires no intrusive intervention in the original training process. To evaluate the framework, we first conduct validation experiments to verify its soundness and completeness. We then perform comprehensive experiments across six datasets and ten representative unlearning methods. The results demonstrate that our framework reliably distinguishes between successful and failed unlearning. In particular, we observe that retraining-based and fine-tuning-based methods can achieve effective unlearning, even when the target data remain in the original dataset. In contrast, de-optimization-based methods fail to achieve true unlearning and instead degrade the model's performance. Fisher/Hessian-based methods also fail to unlearn the requested data, even when formal certification is provided. Moreover, we show that our framework is robust against fake unlearning attempts and generalizes well to large language models.

1️⃣ One-Sentence Summary

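This paper proposes the first practical, general-purpose auditing framework for machine unlearning, which requires no retrained-from-scratch baseline, no large collection of shadow models, and no intervention in the original training process; using it to systematically check whether existing unlearning algorithms have truly erased the designated data, the authors find that retraining-based and fine-tuning-based methods are effective, whereas de-optimization-based and Fisher/Hessian-based methods fail and can even degrade model performance.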
