Contextualized Privacy Defense for LLM Agents
1️⃣ One-sentence summary
This paper proposes a new method called "Contextualized Defense Instructing" (CDI): a dedicated instructor model dynamically generates context-aware privacy-protection guidance while the agent executes its task, and this model is optimized with reinforcement learning, so that the agent effectively protects user privacy while maintaining strong task-completion ability.
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, such as prompting and guarding. These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent execution. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm in which an instructor model generates step-specific, context-aware privacy guidance during execution, proactively shaping actions rather than merely constraining or vetoing them. Crucially, CDI is paired with an experience-driven optimization framework that trains the instructor via reinforcement learning (RL), where we convert failure trajectories with privacy violations into learning environments. We formalize baseline defenses and CDI as distinct intervention points in a canonical agent loop, and compare their privacy-helpfulness trade-offs within a unified simulation framework. Results show that our CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines, with superior robustness to adversarial conditions and generalization.
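The abstract describes CDI as a distinct intervention point in a canonical agent loop: the instructor emits step-specific guidance *before* each action, shaping the action proactively rather than vetoing it afterward. The sketch below illustrates that intervention point only; all names (`call_instructor`, `call_agent`, `run_cdi_loop`) are hypothetical stand-ins for model calls, not the paper's actual interfaces.

```python
# Minimal sketch of the CDI intervention point in a canonical agent loop.
# The instructor and agent are stubbed out; in the paper both are LLMs.

def call_instructor(context, step):
    """Hypothetical instructor model: emits step-specific,
    context-aware privacy guidance for the upcoming action."""
    return f"[step {step}] share only the fields required for this sub-task"

def call_agent(context, guidance):
    """Hypothetical agent model: produces the next action,
    conditioned on the instructor's guidance."""
    return {"action": "act", "used_guidance": guidance}

def run_cdi_loop(task, max_steps=3):
    context, trajectory = [task], []
    for step in range(max_steps):
        # CDI intervenes BEFORE each action: guidance shapes the action
        # proactively, unlike a guard that vetoes it after the fact.
        guidance = call_instructor(context, step)
        action = call_agent(context, guidance)
        trajectory.append((guidance, action))
        context.append(action)  # guidance-shaped action enters the context
    return trajectory

trajectory = run_cdi_loop("book a clinic appointment for the user")
```

In the paper's training framework, trajectories whose actions violate privacy would be converted into RL environments for the instructor; that reward loop is omitted here.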
Source: arXiv: 2603.02983