arXiv submission date: 2026-05-27
📄 Abstract - MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work, we study the new problem of error tracing and attribution in LLM memory systems. We propose a novel framework that transforms memory pipelines into executable memory evolution graphs, enabling fine-grained tracing of operational information flow. We then construct MemTraceBench, a benchmark collected from representative memory systems such as Long-Context, RAG, Mem0, and EverMemOS, to systematically study memory failure modes. We further introduce an automatic attribution method that iteratively traces operation subgraphs to pinpoint the root cause of any failed case. Our analysis reveals that memory failures are systematic, stemming from operation-level issues like information loss and retrieval misalignment. Crucially, we leverage these fine-grained attribution signals to guide downstream prompt optimization, establishing a closed-loop system that automatically corrects faults and boosts end-task performance by up to 7.62%. Code will be released at this https URL.
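The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of tracing a failure backward through a graph of memory operations. The names `OpNode` and `trace_root_cause`, the substring check used to detect a lost fact, and the example pipeline are all hypothetical illustrations, not the paper's method or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OpNode:
    """One memory operation: its name, what it produced, and the nodes it read from."""
    name: str
    output: str
    parents: List["OpNode"] = field(default_factory=list)


def trace_root_cause(failed: OpNode, fact: str) -> Optional[OpNode]:
    """Walk backward from a failed node and return the operation that dropped `fact`.

    Toy assumption: an operation is blamed when at least one of its inputs
    still contained the fact but its own output does not (information loss).
    """
    frontier = [failed]
    seen = set()
    while frontier:
        node = frontier.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if fact in node.output:
            continue  # the fact survived this operation; nothing to blame here
        if any(fact in parent.output for parent in node.parents):
            return node  # inputs had the fact, output lost it
        frontier.extend(node.parents)  # keep tracing further upstream
    return None  # the fact never entered the graph


# Hypothetical four-step pipeline: write -> summarize -> retrieve -> answer.
write = OpNode("write", "Alice moved to Berlin in 2021")
summarize = OpNode("summarize", "Alice moved abroad", [write])
retrieve = OpNode("retrieve", "Alice moved abroad", [summarize])
answer = OpNode("answer", "Unknown city", [retrieve])

culprit = trace_root_cause(answer, "Berlin")
print(culprit.name if culprit else "no root cause found")
```

In this toy pipeline the trace stops at the summarization step, since that is the first operation whose input mentions the fact while its output does not. A realistic system would need a semantic comparison rather than substring matching.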

Top-level tags: llm benchmark model evaluation
Detailed tags: memory systems error attribution information flow benchmark debugging

MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems


1️⃣ One-sentence summary

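This paper proposes a new framework that converts the memory pipeline of a large language model into an executable graph, so that it can act like a "detective" and automatically trace and pinpoint the root cause of memory errors; fixing these errors improves performance on long-horizon reasoning tasks by up to 7.62%.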

Source: arXiv: 2605.28732