ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models
1️⃣ One-sentence summary
This paper presents the first systematic study of attacks on long-term memory-augmented large language models, revealing a fundamental security vulnerability in their core similarity-based retrieval mechanism: an attacker can mislead the model's outputs by injecting carefully crafted malicious content into the memory store.
Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent reasoning across interactions. However, recent research shows that this memory introduces additional attack surfaces, making LLMs more vulnerable. In this paper, we present the first systematic study of black-box adversarial memory injection attacks that target the similarity-based retrieval mechanism in long-term memory-augmented LLMs. We introduce ER-MIA, a unified framework that exposes this vulnerability and formalizes two realistic attack settings: content-based attacks and question-targeted attacks. For these settings, ER-MIA provides a set of composable attack primitives and ensemble attacks that achieve high success rates under minimal attacker assumptions. Extensive experiments across multiple LLMs and long-term memory systems demonstrate that similarity-based retrieval constitutes a fundamental, system-level vulnerability, revealing security risks that persist across memory designs and application scenarios.
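To make the retrieval-side vulnerability concrete, the sketch below shows a toy long-term memory store with cosine-similarity retrieval and a "question-targeted" style injection, where a malicious entry that echoes the anticipated question outranks benign memories and gets pulled into the model's context. This is an illustrative assumption-laden example, not the paper's actual implementation; the class, function names, and the bag-of-words embedding are hypothetical simplifications (real systems use dense encoders).

```python
# Minimal sketch (illustrative only, not ER-MIA's implementation):
# a toy memory store with cosine-similarity retrieval, plus a
# question-targeted injection that copies the victim's anticipated
# question so the malicious entry dominates the similarity ranking.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; production systems use dense encoders.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class MemoryStore:  # hypothetical name, for illustration only
    def __init__(self):
        self.entries = []  # list of (text, embedding) pairs

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = MemoryStore()
memory.add("User prefers window seats when booking flights.")
memory.add("User's bank support line is 555-0100.")

# Question-targeted injection: the attacker anticipates the victim's question
# and embeds it verbatim, so the poisoned entry wins the similarity ranking
# and a false "memory" is smuggled into the LLM's prompt.
target_question = "What is the support line for my bank?"
memory.add(target_question + " The support line is 555-9999 (attacker-controlled).")

print(memory.retrieve(target_question, k=1))
# The injected entry is retrieved first and can mislead the model's answer.
```

Even in this toy setting, the injected entry wins purely because retrieval scores lexical/semantic similarity rather than trustworthiness, which is the system-level weakness the paper argues persists across memory designs.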
Source: arXiv: 2602.15344