LifeBench: A Benchmark for Long-Horizon Multi-Source Memory

📄 Abstract

Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily target declarative memory, specifically semantic and episodic types, where all information is explicitly presented in dialogues. In contrast, real-world actions are also governed by non-declarative memory, including habitual and procedural types, which must be inferred from diverse digital traces. To bridge this gap, we introduce LifeBench, which features densely connected, long-horizon event simulation. It pushes AI agents beyond simple recall, requiring the integration of declarative and non-declarative memory reasoning across diverse and temporally extended contexts. Building such a benchmark presents two key challenges: ensuring data quality and scalability. We maintain data quality by employing real-world priors, including anonymized social surveys, map APIs, and holiday-integrated calendars, thus enforcing fidelity, diversity, and behavioral rationality within the dataset. For scalability, we draw inspiration from cognitive science and structure events according to their partonomic hierarchy, enabling efficient parallel generation while maintaining global coherence. Results show that state-of-the-art memory systems reach just 55.2% accuracy, highlighting the inherent difficulty of long-horizon retrieval and multi-source integration in our benchmark. The dataset and data synthesis code are available at this https URL.

1️⃣ One-Sentence Summary

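This paper introduces LifeBench, a new benchmark that challenges AI agents by simulating densely connected, long-horizon events, requiring them not only to recall explicit knowledge but also to infer non-declarative memory, such as habitual and procedural memory, from diverse digital traces, and thereby evaluating more realistically their ability to integrate long-term, multi-source memory.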
