arXiv submission date: 2026-03-02
📄 Abstract - According to Me: Long-Term Personalized Referential Memory QA

Personalized AI assistants must recall and reason over long-term user memory, which naturally spans multiple modalities and sources such as images, videos, and emails. However, existing long-term memory benchmarks focus primarily on dialogue history and fail to capture realistic personalized references grounded in lived experience. We introduce ATM-Bench, the first benchmark for multimodal, multi-source personalized referential memory QA. ATM-Bench contains approximately four years of privacy-preserving personal memory data and human-annotated question-answer pairs with ground-truth memory evidence, including queries that require resolving personal references, reasoning over multiple pieces of evidence from multiple sources, and handling conflicting evidence. We propose Schema-Guided Memory (SGM) to structurally represent memory items originating from different sources. In experiments, we implement 5 state-of-the-art memory systems along with a standard RAG baseline and evaluate variants with different memory ingestion, retrieval, and answer generation techniques. We find poor performance (under 20% accuracy) on the ATM-Bench-Hard set, and that SGM improves performance over the Descriptive Memory commonly adopted in prior work. Code available at: this https URL

Top-level tags: llm agents benchmark
Detailed tags: personalized memory multimodal reasoning referential qa long-term memory retrieval-augmented generation

According to Me: Long-Term Personalized Referential Memory QA


1️⃣ One-sentence summary

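This paper introduces ATM-Bench, the first multimodal, multi-source benchmark for personalized long-term memory QA, and designs a structured memory representation to help AI assistants better understand and answer questions grounded in a user's personal life experiences.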

Source: arXiv: 2603.01990