KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

📄 Abstract

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at KnowMeBench.

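1️⃣ One-Sentence Summary

This paper introduces KnowMe-Bench, a new benchmark that uses real long-form autobiographical narratives to evaluate how deeply AI models understand a person. It finds that current retrieval-based systems mainly improve factual recall but still fall short on explaining temporal relationships and on higher-level reasoning, which indicates that future digital companions will need more advanced memory mechanisms.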
