
arXiv submission date: 2026-06-22
📄 Abstract - Have You Ever Seen Them? Entity-level Membership Inference through Interrogating Large Language Models

Large Language Models (LLMs) raise growing concerns about privacy leakage and copyright compliance. Membership inference is a key tool for assessing such risks, but existing studies mainly focus on whether specific samples or sample-based data units are used for training. We argue that LLMs exhibit a human-memory-like behavior: an LLM may not memorize a specific sample verbatim, yet it can accumulate and reveal knowledge about a real-world entity from scattered mentions. This analogy motivates us to examine whether an LLM can be interrogated like a human interviewee to reveal its exposure to entity-related information. Motivated by this question, we propose entity-level membership inference, which determines whether information related to a target entity is used in LLM training. We study this task in the practical label-only black-box setting, where only generated texts are observable. We formalize the task under clue, input, and model constraints, establish the necessary and sufficient conditions for its feasibility, and instantiate five interrogation strategies based on this formalization. The strategies use limited entity clues to construct prompts, elicit entity-related responses, and infer membership from semantic features among the generated texts. We construct entity-level datasets and adapt state-of-the-art sample-level label-only methods to the entity-level setting as baselines. Experiments on person entities show that our methods achieve AUC up to 0.97 and bring gains of 6.0% to 17.5% in Balanced Accuracy over the best adapted baseline.
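The abstract describes the pipeline only at a high level: entity clues, then prompts, then generated responses, then semantic features, then a membership decision. The Python sketch below illustrates that flow in a label-only black-box setting, where nothing but generated text is observed. Every concrete piece is a hypothetical stand-in chosen for illustration and is not the authors' method: the prompt templates, the `generate` callable, the word-overlap "consistency" score (used in place of the paper's semantic features), the 0.3 threshold, and the made-up entity "Alice Zhang". None of these reproduce the paper's five interrogation strategies.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical interrogation templates built from a name plus one clue.
# They are illustrative only and do not reproduce the paper's strategies.
TEMPLATES = (
    "Who is {name}?",
    "Tell me what you know about {name}, who is associated with {clue}.",
    "Describe the background of {name}.",
)


def build_prompts(name: str, clue: str) -> list[str]:
    """Turn a limited entity clue into several interrogation prompts."""
    return [t.format(name=name, clue=clue) for t in TEMPLATES]


def word_set(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consistency_score(responses: Sequence[str]) -> float:
    """Mean pairwise Jaccard similarity across responses.

    A crude lexical stand-in (an assumption) for the semantic
    features the paper extracts from generated texts.
    """
    pairs = list(combinations([word_set(r) for r in responses], 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def infer_membership(
    generate: Callable[[str], str], name: str, clue: str, threshold: float = 0.3
) -> tuple[float, bool]:
    """Label-only black box: only the text returned by `generate` is observed."""
    responses = [generate(p) for p in build_prompts(name, clue)]
    score = consistency_score(responses)
    return score, score >= threshold


if __name__ == "__main__":
    # Toy stubs standing in for a real LLM endpoint (both hypothetical).
    def consistent_model(prompt: str) -> str:
        return "Alice Zhang is a chemist known for catalysis research."

    def echo_model(prompt: str) -> str:
        return prompt  # reveals nothing beyond the question itself

    print(infer_membership(consistent_model, "Alice Zhang", "catalysis"))
    print(infer_membership(echo_model, "Alice Zhang", "catalysis"))
```

The consistent stub scores 1.0 and is flagged as a member. The echo stub scores roughly 0.225, which falls below the threshold. A fixed threshold is itself an assumption here. In practice it would need to be calibrated on entities whose membership is known.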

Top-level tags: llm privacy
Detailed tags: membership inference entity-level black-box attack prompt interrogation privacy leakage

Have You Ever Seen Them? Entity-level Membership Inference through Interrogating Large Language Models


1️⃣ One-sentence summary

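This paper proposes a method that questions a large language model the way one might interrogate a person, observing only its generated text, to determine whether information about a real-world entity was used in the model's training. In experiments it reaches an AUC of up to 0.97 and improves Balanced Accuracy by 6.0% to 17.5% over the best adapted baseline.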
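The results are reported in two metrics. AUC is threshold-free. Balanced Accuracy averages the true-positive and true-negative rates at a chosen decision threshold, so it stays meaningful when members and non-members are imbalanced. Here is a minimal pure-Python sketch of both. The labels, scores, and the 0.45 threshold are invented toy values, not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    This equals the area under the ROC curve; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Made-up toy scores: 1 = entity seen in training, 0 = unseen.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
preds = [1 if s >= 0.45 else 0 for s in scores]

print(roc_auc(labels, scores))           # 8/9, about 0.889
print(balanced_accuracy(labels, preds))  # 2/3, about 0.667
```

In this toy run, one member (score 0.4) ranks below one non-member (score 0.5). That single mis-ordered pair out of nine lowers AUC to 8/9. At the 0.45 threshold, the same pair also costs one true positive and one true negative, so Balanced Accuracy falls to 2/3.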

Source: arXiv: 2606.23030