Differentially Private Retrieval-Augmented Generation
1️⃣ One-Sentence Summary
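This paper proposes DP-KSA, a new algorithm that uses differential privacy to protect sensitive data in retrieval-augmented generation (RAG) systems, preserving user privacy while effectively reducing the risk that large language models produce incorrect information on domain-specific tasks.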
Retrieval-augmented generation (RAG) is a widely used framework for reducing hallucinations in large language models (LLMs) on domain-specific tasks by retrieving relevant documents from a database to support accurate responses. However, when the database contains sensitive corpora, such as medical records or legal documents, RAG poses serious privacy risks by potentially exposing private information through its outputs. Prior work has demonstrated that one can practically craft adversarial prompts that force an LLM to regurgitate the augmented contexts. A promising direction is to integrate differential privacy (DP), a privacy notion that offers strong formal guarantees, into RAG systems. However, naively applying DP mechanisms to existing systems often leads to significant utility degradation. For RAG systems in particular, DP can reduce the usefulness of the augmented contexts, which increases the risk of hallucination by the LLM. Motivated by these challenges, we present DP-KSA, a novel privacy-preserving RAG algorithm that integrates DP using the propose-test-release paradigm. DP-KSA follows from a key observation that most question-answering (QA) queries can be sufficiently answered with a few keywords. Hence, DP-KSA first obtains an ensemble of relevant contexts, each of which is used to generate a response from an LLM. We then use these responses to obtain the most frequent keywords in a differentially private manner. Lastly, the keywords are augmented into the prompt for the final output. This approach effectively compresses the semantic space while preserving both utility and privacy. We prove that DP-KSA provides formal DP guarantees on the generated output with respect to the RAG database. We evaluate DP-KSA on two QA benchmarks using three instruction-tuned LLMs, and our empirical results demonstrate that DP-KSA achieves a strong privacy-utility tradeoff.
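To make the keyword-selection step more concrete, the following is a minimal Python sketch of choosing the most frequent keywords from an ensemble of responses in a differentially private way. It is not the paper's procedure: the abstract says DP-KSA uses the propose-test-release paradigm, whereas this sketch swaps in a simpler, standard technique (top-k selection by adding Gumbel noise to counts, which is equivalent to applying the exponential mechanism k times). The function names, the fixed public `vocabulary`, the whitespace tokenizer, and the even split of `epsilon` across the k selections are all assumptions made for illustration. The sketch also assumes that each database document influences at most one response, so that adding or removing one document changes each keyword count by at most 1.

```python
import math
import random
from collections import Counter


def keyword_counts(responses, vocabulary):
    """For each vocabulary word, count how many responses contain it.

    Each response contributes at most 1 to each count, so a change to a
    single response changes every count by at most 1 (sensitivity 1).
    The vocabulary is assumed to be lowercase, fixed, and public.
    """
    vocab = set(vocabulary)
    counts = Counter({w: 0 for w in vocab})
    for text in responses:
        # Hypothetical tokenizer: split on whitespace, strip punctuation.
        tokens = {t.strip(".,;:!?()\"'").lower() for t in text.split()}
        for w in tokens & vocab:
            counts[w] += 1
    return counts


def gumbel(scale, rng):
    """Draw one sample from a Gumbel(0, scale) distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -scale * math.log(-math.log(u))


def dp_top_k_keywords(responses, vocabulary, k, epsilon, rng=None):
    """Select k keywords with total privacy budget epsilon.

    Adding Gumbel noise of scale 2 * sensitivity / (epsilon / k) to each
    count and keeping the k largest noisy counts matches k rounds of the
    exponential mechanism, each run with budget epsilon / k.
    """
    rng = rng or random.Random()
    counts = keyword_counts(responses, vocabulary)
    scale = 2.0 * k / epsilon  # sensitivity is assumed to be 1
    noisy = {w: counts[w] + gumbel(scale, rng) for w in sorted(counts)}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy ensemble: one LLM response per retrieved context (made-up text).
    responses = [
        "Aspirin reduces fever.",
        "Take aspirin for fever and pain.",
        "Aspirin helps with pain.",
        "Rest and fluids.",
    ]
    vocabulary = ["aspirin", "fever", "pain", "rest", "fluids"]
    print(dp_top_k_keywords(responses, vocabulary, k=2, epsilon=1.0))
```

With a small `epsilon` the noise dominates and the selected keywords are close to random; with a large `epsilon` the selection approaches the true most frequent keywords. The selected keywords would then be added to the prompt for the final output, as the abstract describes.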
Source: arXiv: 2602.14374