arXiv submission date: 2026-03-09
📄 Abstract - The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques

Removing personally identifiable information (PII) from texts is necessary to comply with various data protection regulations and to enable data sharing without compromising privacy. However, recent works show that documents sanitized by PII removal techniques are vulnerable to reconstruction attacks. Yet, we suspect that the reported success of these attacks is largely overestimated. We critically analyze the evaluation of existing attacks and find that data leakage and data contamination are not properly mitigated, leaving the question whether or not PII removal techniques truly protect privacy in real-world scenarios unaddressed. We investigate possible data sources and attack setups that avoid data leakage and conclude that only truly private data can allow us to objectively evaluate vulnerabilities in PII removal techniques. However, access to private data is heavily restricted - and for good reasons - which also means that the public research community cannot address this problem in a transparent, reproducible, and trustworthy manner.

Top-level tags: natural language processing, data, model evaluation
Detailed tags: privacy, personally identifiable information, data leakage, reconstruction attacks, evaluation methodology

The Conundrum of Trustworthy Research on Attacking Personally Identifiable Information Removal Techniques


1️⃣ One-sentence summary

This paper argues that current research on attacks against personally identifiable information (PII) removal techniques suffers from serious data leakage and data contamination problems that inflate reported attack success rates. It contends that only truly private data would allow an objective evaluation of how well sanitization techniques protect privacy, yet because access to such data is heavily restricted, the public research community cannot resolve this dilemma in a transparent and reproducible way.

Source: arXiv 2603.08207