MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

📄 Abstract - MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

Long-term memory promises LLM agents that grow more capable across sessions, maintaining an accurate, evolving understanding of the user that interaction forms. In practice, however, this memory is evaluated mostly through downstream behavior, such as later answers, personalization quality, or task success, which tests that understanding only indirectly and leaves the memory artifact itself largely unaudited. We argue that long-term memory should instead be evaluated as an auditable post-interaction artifact: after ordinary assistance, what structured user state can be reconstructed from the memory the agent leaves behind? We instantiate this view in MEMPROBE, a benchmark in which a memory-equipped agent assists simulated users, each carrying a hidden, taxonomy-anchored user-state bank, across a trajectory of leak-controlled tasks, after which that bank is reconstructed from the agent's resulting memory under both full-store and top-k access. Built on synthetic ground truth for efficient, scalable measurement, MEMPROBE spans 50 simulated users with 31 hidden dimensions each (1,550 recovery targets) and tests 5 representative memory systems. Testing state-of-the-art memory agents, we find that successful assistance and recoverable memory behave as distinct capabilities. Task completion nearly saturates, even for a memoryless baseline, while category-balanced recovery stays moderate (about 0.6) and drops further under top-k retrieval. MEMPROBE is the first benchmark to study memory recovery directly, reconstructing the user state a system retains and scoring it against ground truth. We see recovery as a concrete objective for future memory agents to optimize, and MEMPROBE as a step toward an environment where agents are trained to remember their users, growing more faithful the longer they know them.

MEMPROBE：通过隐藏用户状态恢复探测长期智能体记忆 / MEMPROBE: Probing Long-Term Agent Memory via Hidden User-State Recovery

1️⃣ 一句话总结

本文提出一种名为MEMPROBE的全新评估方法，不再仅通过任务完成率等间接指标衡量AI助手的长期记忆能力，而是直接检查助手使用后留下的记忆痕迹能否准确还原用户的隐藏信息（如偏好、身份等），并发现现有记忆系统在精确恢复用户状态方面表现有限，与任务表现之间存在显著差距。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要