Towards Self-Improving Error Diagnosis in Multi-Agent Systems

📄 Abstract - Towards Self-Improving Error Diagnosis in Multi-Agent Systems

Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. Existing diagnostic approaches often rely on expensive expert annotation or "LLM-as-a-judge" paradigms, which struggle to pinpoint decisive error steps within extended contexts. In this paper, we introduce ErrorProbe, a self-improving framework for semantic failure attribution that identifies responsible agents and the originating error step. The framework operates via a three-stage pipeline: (1) operationalizing the MAS failure taxonomy to detect local anomalies, (2) performing symptom-driven backward tracing to prune irrelevant context, and (3) employing a specialized multi-agent team (Strategist, Investigator, Arbiter) to validate error hypotheses through tool-grounded execution. Crucially, ErrorProbe maintains a verified episodic memory that updates only when error patterns are confirmed by executable evidence, without the need for annotation. Experiments across the TracerTraj and Who&When benchmarks demonstrate that ErrorProbe significantly outperforms baselines, particularly in step-level localization, while the verified memory enables robust cross-domain transfer without retraining.

1️⃣ One-Sentence Summary

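This paper proposes ErrorProbe, a self-improving framework that automatically locates the erroneous step and the responsible agent in multi-agent systems. It achieves precise diagnosis through a three-stage pipeline (anomaly detection, backward tracing with pruning, and tool-grounded verification) and uses a verified memory mechanism to keep improving performance without manual annotation.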
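
The idea of a memory that updates only on executable confirmation can be illustrated with a minimal Python sketch. This is not ErrorProbe's implementation: the names `ErrorHypothesis`, `VerifiedMemory`, `recompute_check`, and the toy `TRACE` are all hypothetical, and the "executable evidence" here is simply re-running an addition recorded in the trace.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ErrorHypothesis:
    agent: str    # agent suspected of causing the failure (hypothetical field)
    step: int     # index of the suspected originating step
    pattern: str  # short description of the suspected error pattern


@dataclass
class VerifiedMemory:
    entries: List[ErrorHypothesis] = field(default_factory=list)

    def update(
        self,
        hypothesis: ErrorHypothesis,
        check: Callable[[ErrorHypothesis], bool],
    ) -> bool:
        """Store the hypothesis only if the executable check confirms it."""
        confirmed = check(hypothesis)
        if confirmed:
            self.entries.append(hypothesis)
        return confirmed


# Toy interaction trace (assumed format): each step records an addition
# together with the result the agent claimed.
TRACE = [
    {"agent": "planner", "a": 2, "b": 3, "claimed_sum": 5},
    {"agent": "solver", "a": 4, "b": 4, "claimed_sum": 9},  # faulty step
]


def recompute_check(h: ErrorHypothesis) -> bool:
    """Re-execute the step and confirm the named agent's claim was wrong."""
    step = TRACE[h.step]
    return step["agent"] == h.agent and step["a"] + step["b"] != step["claimed_sum"]


memory = VerifiedMemory()
accepted = memory.update(ErrorHypothesis("solver", 1, "arithmetic slip"), recompute_check)
rejected = memory.update(ErrorHypothesis("planner", 0, "arithmetic slip"), recompute_check)
```

In this sketch the hypothesis blaming `solver` at step 1 is stored because recomputation contradicts the claimed sum, while the hypothesis blaming `planner` is discarded. No labels are consulted at any point, which mirrors the annotation-free update described in the abstract.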
