arXiv submission date: 2026-05-28
📄 Abstract - Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

End-to-end agent-memory benchmarks report a single hit@k per retriever, confounding lexical leakage (uncontrolled query/gold/distractor entity overlap) with tag-mixing (preferences, services, tools averaged together). We propose entity-collision, a system-agnostic protocol that pins the BM25 floor by construction -- every distractor shares the answer's entity tokens -- and stratifies queries by discriminator tag, so any lift over BM25 is attributable to the embedder. Applied to an open-source agent-memory testbed across 5 tags x 3 embedders x 5 collision degrees with paired-bootstrap 95% CIs, the protocol reveals a two-axis pattern: a 256-d hash trigram helps only on closed-vocabulary lexical tags at deep collision; MiniLM-384 dominates both axes; and a 2.7x-parameter BGE-large does not uniformly improve on MiniLM -- it wins on intent-style queries but loses on lexical ones. Encoder capacity alone is not the binding constraint. The synthetic intent-tag null replicates on LongMemEval (n=500) as a single-session-preference recall cliff. Adaptive vector-weight routing on LoCoMo is a measured null: 11.7pp of oracle headroom exists, but no signal we tested recovers it. All 26 result tables and 37 reproduce scripts are version-controlled and verified by a public registry; the protocol is exercised on a deterministically governed memory testbed (event-sourced decision log, DAG-state-machine schema lifecycle) so every reported CI is reproducible byte-for-byte from the ingest stream.
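The abstract's core construction, in which every distractor shares the answer's entity tokens, can be illustrated with a minimal sketch. The function names, the sentence template, and the reading of "collision degree" as the number of colliding distractors are assumptions made for illustration and are not taken from the paper. The scorer is a crude token-overlap count, not real BM25 (which also applies IDF and length normalization).

```python
import random

def build_collision_set(entity, gold_detail, distractor_details, degree, seed=0):
    """Return (gold, distractors) where every distractor repeats the entity tokens.

    `degree` is the number of colliding distractors; this is a hypothetical
    reading of "collision degree", and the template is illustrative only.
    """
    rng = random.Random(seed)
    chosen = rng.sample(distractor_details, degree)
    gold = f"{entity} {gold_detail}"
    distractors = [f"{entity} {detail}" for detail in chosen]
    return gold, distractors

def entity_overlap(query, doc):
    """Count query tokens that also appear in the document (crude lexical score)."""
    doc_tokens = set(doc.lower().split())
    return sum(1 for tok in query.lower().split() if tok in doc_tokens)

gold, distractors = build_collision_set(
    entity="Acme Billing",
    gold_detail="prefers invoices sent monthly",
    distractor_details=[
        "uses webhook retries",
        "runs on port 8443",
        "owner is the finance team",
    ],
    degree=2,
)

# An entity-only query cannot separate the gold memory from its distractors:
# every candidate matches the same entity tokens, so the lexical scores tie.
query = "Acme Billing"
scores = [entity_overlap(query, doc) for doc in [gold] + distractors]
```

Because the entity tokens carry no discriminating signal in this setup, any ranking difference has to come from the remaining (discriminator) text, which is the part the protocol stratifies by tag.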

Top-level tags: agents system model evaluation
Detailed tags: agent memory retrieval evaluation benchmark protocol embedding comparison reproducibility

Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory


1️⃣ One-sentence summary

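This paper proposes a standardized test protocol called "entity-collision": every distractor contains the same entity tokens as the answer, and queries are stratified by type, so the real performance lift of each embedding model over a conventional BM25 retriever can be measured reliably. The experiments find that parameter count is not the decisive factor, and that current models behave differently on complex intent-style queries than on simple lexical queries.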
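The abstract reports lift with paired-bootstrap 95% CIs. A minimal sketch of a paired percentile bootstrap over per-query hit indicators follows; the function name, the resample count, and the percentile method are assumptions chosen for illustration, not the paper's reproduce scripts.

```python
import random

def paired_bootstrap_ci(hits_a, hits_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(hits_a - hits_b) over the same queries.

    hits_a and hits_b are per-query hit@k indicators (0 or 1) for two
    retrievers evaluated on an identical query list, hence "paired".
    """
    if len(hits_a) != len(hits_b):
        raise ValueError("paired inputs must have the same length")
    rng = random.Random(seed)
    n = len(hits_a)
    diffs = [a - b for a, b in zip(hits_a, hits_b)]
    means = []
    for _ in range(n_resamples):
        # Resample query indices with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi

# Toy data (made up for illustration): embedder vs. BM25 on eight queries.
embedder_hits = [1, 1, 1, 0, 1, 1, 0, 1]
bm25_hits = [0, 1, 0, 0, 1, 0, 0, 1]
lift, ci_low, ci_high = paired_bootstrap_ci(embedder_hits, bm25_hits)
```

Resampling the per-query differences, rather than each retriever's hits separately, keeps query difficulty matched across the two systems, which is what makes the interval a statement about lift rather than about two independent scores.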

Source: arXiv: 2605.29630