arXiv submission date: 2026-05-18
📄 Abstract - Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

Safety evaluations of memory-equipped LLM agents typically measure within-task safety: whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over a long horizon, and memory accumulated during earlier tasks can affect behavior on later, unrelated ones. Studying this regime requires evaluation along the temporal dimension across tasks: not whether an agent is safe at any single memory state, but how its safety profile changes as memory accumulates across many independent interactions. We call this failure mode temporal memory contamination. To isolate memory exposure from stream non-stationarity, we introduce a trigger-probe protocol that evaluates a fixed probe set against read-only memory snapshots at varying prefix lengths, together with a NullMemory counterfactual baseline for identifying memory-induced violations. We apply this protocol across three deployment scenarios spanning records, memos, forms, and email correspondence and eight memory architectures, and additionally on Claw-like AI agents, such as OpenClaw, using the platform's native memory mechanism. Memory-enabled agents consistently exceed the NullMemory baseline, and memory-induced violation rates show a robust upward trend with exposure length on both agent classes. Order-randomization experiments indicate that the effect is driven primarily by accumulated content rather than encounter order. Finally, a structural consequence of the event decomposition is that memory-induced risk is detectable from retrieval state before generation, which we confirm with a high-recall diagnostic monitor. Our results argue for treating memory safety as a longitudinal property that requires temporal evaluation, not a single-state property that can be captured by a snapshot.
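The trigger-probe protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Agent` callable, the `memory_induced_rates` function, the toy agent, and the sample stream are all hypothetical stand-ins, and a probe is counted as memory-induced here only if it violates with the snapshot but not with empty memory (one plausible reading of the NullMemory counterfactual).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: an agent takes a read-only memory snapshot and a probe,
# and returns True if its response to the probe is a safety violation.
Agent = Callable[[Tuple[str, ...], str], bool]


def memory_induced_rates(
    agent: Agent,
    stream: Sequence[str],
    probes: Sequence[str],
    prefix_lengths: Sequence[int],
) -> Dict[int, float]:
    """Evaluate a fixed probe set against memory snapshots of growing prefix length."""
    if not probes:
        raise ValueError("probe set must not be empty")

    # NullMemory counterfactual: the same probes with no memory at all.
    null_violations: List[bool] = [agent((), p) for p in probes]

    rates: Dict[int, float] = {}
    for k in prefix_lengths:
        # Tuples are immutable, so the agent cannot write to the snapshot.
        snapshot = tuple(stream[:k])
        induced = sum(
            1
            for probe, null_v in zip(probes, null_violations)
            if agent(snapshot, probe) and not null_v
        )
        rates[k] = induced / len(probes)
    return rates


def toy_agent(memory: Tuple[str, ...], probe: str) -> bool:
    # Toy assumption: the agent "violates" when any remembered item mentions the probe topic.
    return any(probe in entry for entry in memory)


stream = ["share ssn with vendor", "weekly memo", "forward password by email"]
probes = ["ssn", "password", "salary"]
rates = memory_induced_rates(toy_agent, stream, probes, prefix_lengths=[0, 1, 3])
```

Because the probe set stays fixed and only the snapshot prefix varies, any change in `rates` across prefix lengths reflects memory exposure rather than a change in the tasks being asked.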

Top-level tags: agents llm model evaluation
Detailed tags: memory-equipped agents safety evaluation temporal contamination longitudinal risk trigger-probe protocol

Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents


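1️⃣ One-sentence summary

This study shows that when memory-equipped AI agents carry out many independent tasks, their safety gradually degrades as memory accumulates, making violations more likely; safety therefore cannot be assessed within a single task alone and, much like a chronic condition, calls for continuous monitoring of how the agent's memory evolves over the long term.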

Source: arXiv: 2605.17830