arXiv submission date: 2026-02-09
📄 Abstract - Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion

Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We show that this composition introduces a distinct security failure mode: a vector-retrieved "seed" chunk can pivot via entity links into sensitive graph neighborhoods, causing cross-tenant data leakage that does not occur in vector-only retrieval. We formalize this risk as Retrieval Pivot Risk (RPR) and introduce companion metrics Leakage@k, Amplification Factor, and Pivot Depth (PD) to quantify leakage magnitude and traversal structure. We present seven Retrieval Pivot Attacks that exploit the vector-to-graph boundary and show that adversarial injection is not required: naturally shared entities create cross-tenant pivot paths organically. Across a synthetic multi-tenant enterprise corpus and the Enron email corpus, the undefended hybrid pipeline exhibits high pivot risk (RPR up to 0.95) with multiple unauthorized items returned per query. Leakage consistently appears at PD=2, which we attribute to the bipartite chunk-entity topology and formalize as a proposition. We then show that enforcing authorization at a single location, the graph expansion boundary, eliminates measured leakage (RPR near 0) across both corpora, all attack variants, and label forgery rates up to 10 percent, with minimal overhead. Our results indicate the root cause is boundary enforcement, not inherently complex defenses: two individually secure retrieval components can compose into an insecure system unless authorization is re-checked at the transition point.
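The pivot mechanism described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy corpus, the tenant names, and the helpers `neighbors`, `expand`, and `leaked` are all hypothetical, and depth is assumed to be counted in graph hops (chunk to entity to chunk). The sketch shows a vector-retrieved seed reaching another tenant's chunk through a shared entity, and the same traversal with authorization re-checked at the expansion boundary.

```python
from collections import deque

# Hypothetical toy corpus: chunk id -> (owning tenant, entities mentioned).
CHUNKS = {
    "a1": ("tenant_a", {"Acme Corp"}),
    "a2": ("tenant_a", {"Acme Corp", "Project Kite"}),
    "b1": ("tenant_b", {"Acme Corp"}),
    "b2": ("tenant_b", {"Project Kite"}),
}


def neighbors(node):
    """Bipartite adjacency: chunks link to entities, entities link to chunks."""
    kind, name = node
    if kind == "chunk":
        return [("entity", e) for e in sorted(CHUNKS[name][1])]
    return [("chunk", c) for c in sorted(CHUNKS) if name in CHUNKS[c][1]]


def expand(seeds, tenant, max_depth=2, enforce=False):
    """Breadth-first graph expansion from vector seeds.

    Returns {chunk_id: depth at which the chunk was reached}.
    With enforce=True, chunks owned by another tenant are dropped
    at the expansion boundary and never traversed further.
    """
    seen = {("chunk", s) for s in seeds}
    queue = deque((("chunk", s), 0) for s in seeds)
    reached = {s: 0 for s in seeds}
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in neighbors(node):
            if nxt in seen:
                continue
            seen.add(nxt)
            kind, name = nxt
            # Authorization re-check at the vector-to-graph transition.
            if enforce and kind == "chunk" and CHUNKS[name][0] != tenant:
                continue
            if kind == "chunk":
                reached[name] = depth + 1
            queue.append((nxt, depth + 1))
    return reached


def leaked(reached, tenant):
    """Chunks in the result set that the querying tenant does not own."""
    return {c: d for c, d in reached.items() if CHUNKS[c][0] != tenant}


undefended = expand(["a1"], "tenant_a")
defended = expand(["a1"], "tenant_a", enforce=True)
print(leaked(undefended, "tenant_a"))  # {'b1': 2}
print(leaked(defended, "tenant_a"))    # {}
```

In this toy graph the only unauthorized chunk appears two hops from the seed, because every chunk-to-chunk path in a bipartite chunk-entity graph passes through an entity. No adversarial content is injected: the shared entity alone creates the cross-tenant path.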

Top-level tags: systems, natural language processing, llm
Detailed tags: retrieval-augmented generation, security, knowledge graphs, data leakage, multi-hop reasoning

Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion


1️⃣ One-sentence summary

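This paper finds that hybrid retrieval-augmented generation systems, which combine vector search with a knowledge graph, have a new security vulnerability: an ordinary chunk retrieved by vector search can act as a "pivot", following entity links in the knowledge graph and unintentionally reaching other users' sensitive data. The authors' simple fix, re-checking authorization at the graph expansion boundary, effectively closes this hole.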

Source: arXiv: 2602.08668