Executable Agentic Memory: A Structured Knowledge Graph for GUI Agents / Executable Agentic Memory for GUI Agent
1️⃣ One-Sentence Summary
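This paper proposes Executable Agentic Memory (EAM), which builds a structured knowledge graph so that a GUI agent plans by fast retrieval and automatic execution over the graph instead of reasoning screen by screen. This substantially improves efficiency and lowers cost on long-horizon tasks, and in evaluation EAM outperforms existing models (by up to 19.6% over UI-TARS-7B on AndroidWorld).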
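The core idea of building a graph of screens and actions once, then planning by lookup, can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the authors' implementation: the app is a hypothetical dictionary `TOY_APP` queried through a simulated `step` function, a screen's identity is just its name (a real system would need some screen signature), and exploration can act from any known state instead of physically navigating back to it. The hypothetical `build_kg` performs a DFS that explores each distinct state only once, `retrieve_plan` answers a task with a breadth-first shortest-path lookup in the stored graph, and `execute` replays the actions without re-interpreting each screen. Action-group mining and the value-guided search are omitted.

```python
from collections import deque

# Hypothetical toy app standing in for a real device: screen -> {action: next screen}.
TOY_APP = {
    "home": {"open_settings": "settings", "open_clock": "clock"},
    "settings": {"tap_network": "network", "back": "home"},
    "network": {"toggle_wifi": "wifi_off", "back": "settings"},
    "wifi_off": {"back": "settings"},
    "clock": {"tap_alarm": "alarm", "back": "home"},
    "alarm": {"add_alarm": "alarm_editor", "back": "clock"},
    "alarm_editor": {"save": "alarm"},
}


def step(state, action):
    """Perform one UI action in the simulated app and return the new screen."""
    return TOY_APP[state][action]


def build_kg(start):
    """Offline exploration: DFS that expands each distinct screen state only once."""
    kg = {}
    stack = [start]
    while stack:
        state = stack.pop()
        if state in kg:  # screen already explored; skip it
            continue
        kg[state] = {}
        for action in TOY_APP[state]:  # a real agent would enumerate UI elements here
            nxt = step(state, action)
            kg[state][action] = nxt  # store the observed transition as a KG edge
            if nxt not in kg:
                stack.append(nxt)
    return kg


def retrieve_plan(kg, start, goal):
    """Online planning by retrieval: BFS for the shortest action path in the KG."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for action, nxt in kg[state].items():
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    if goal not in parent:
        return None  # goal not covered by memory
    plan, node = [], goal
    while parent[node] is not None:
        node, action = parent[node]  # walk back to the predecessor state
        plan.append(action)
    return plan[::-1]


def execute(start, plan):
    """Replay the retrieved actions without re-reading each screen."""
    state = start
    for action in plan:
        state = step(state, action)
    return state


kg = build_kg("home")
plan = retrieve_plan(kg, "home", "alarm_editor")
print(plan)  # ['open_clock', 'tap_alarm', 'add_alarm']
print(execute("home", plan))  # alarm_editor
```

Returning `None` when the goal is missing marks the point where a real agent would have to fall back to step-wise reasoning or further exploration.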
Modern GUI agents typically rely on a model-centric, step-wise interaction paradigm in which an LLM must re-interpret the UI and re-decide its action at every screen, a process that is fragile in long-horizon tasks. In this paper, we propose Executable Agentic Memory (EAM), a structured Knowledge Graph (KG) that shifts GUI planning from free-form generation to a robust retrieval-and-execution process. Our approach includes a sample-efficient memory construction pipeline that uses state-aware DFS and action-group mining to compress multi-step routines. To ensure efficient planning, we introduce a value-guided graph search in which a lightweight Q-function model steers Monte Carlo Tree Search (MCTS) over the KG. We theoretically establish bias-consistency for the Q-model and derive sample complexity bounds for path recovery. Empirically, EAM outperforms state-of-the-art baselines such as UI-TARS-7B by up to $19.6\%$ on AndroidWorld, while cutting token cost by $6\times$ relative to GPT-4o. With a $2.8$s average latency, EAM enables reliable, fast, long-horizon GUI automation.
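The value-guided search, where a lightweight Q-function model steers MCTS over the KG, can be illustrated with a small sketch. It shows one plausible way to combine the two, not the paper's algorithm: the Q-model's scores (here a hypothetical hand-written table `Q_SCORES` standing in for a trained model, fixed to the single task "turn off Wi-Fi") serve as PUCT-style exploration priors and as leaf-value estimates in place of random rollouts, and the search keeps the shortest action path it finds that reaches the goal. `KG`, `Q_SCORES`, `q_model`, `mcts_plan`, and the parameters `c` and `max_depth` are all illustrative assumptions.

```python
import math

# Small hypothetical KG: screen state -> {action: next screen state}.
KG = {
    "home": {"open_settings": "settings", "open_clock": "clock"},
    "settings": {"tap_network": "network", "back": "home"},
    "network": {"toggle_wifi": "wifi_off", "back": "settings"},
    "wifi_off": {"back": "settings"},
    "clock": {"back": "home"},
}

# Hypothetical scores a trained Q-model might output for the task "turn off Wi-Fi";
# unseen (state, action) pairs get a low default.
Q_SCORES = {
    ("home", "open_settings"): 0.7,
    ("home", "open_clock"): 0.2,
    ("settings", "tap_network"): 0.8,
    ("network", "toggle_wifi"): 0.9,
}


def q_model(state, action):
    return Q_SCORES.get((state, action), 0.1)


class Node:
    def __init__(self, state, parent=None, action=None, prior=1.0):
        self.state, self.parent, self.action, self.prior = state, parent, action, prior
        self.depth = 0 if parent is None else parent.depth + 1
        self.children = {}
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def puct(parent, child, c):
    # Exploitation (mean backed-up value) plus a Q-weighted exploration bonus.
    return child.mean_value() + c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)


def expand(node):
    # Create all children at once; priors are the Q-scores normalized over actions.
    actions = KG[node.state]
    total = sum(q_model(node.state, a) for a in actions)
    for action, nxt in actions.items():
        node.children[action] = Node(nxt, node, action, q_model(node.state, action) / total)


def leaf_value(state):
    # Evaluate a new leaf with the Q-model instead of a random rollout.
    return max((q_model(state, a) for a in KG.get(state, {})), default=0.0)


def actions_to(node):
    path = []
    while node.parent is not None:
        path.append(node.action)
        node = node.parent
    return path[::-1]


def mcts_plan(start, goal, iterations=50, max_depth=6, c=1.0):
    root, best = Node(start), None
    for _ in range(iterations):
        node = root
        while node.children:  # selection by PUCT score
            parent = node
            node = max(parent.children.values(), key=lambda ch: puct(parent, ch, c))
        if node.state == goal:
            value = 1.0
            path = actions_to(node)
            if best is None or len(path) < len(best):
                best = path  # keep the shortest successful path
        elif node.depth >= max_depth or not KG.get(node.state):
            value = 0.0  # depth limit or dead end
        else:
            expand(node)
            value = leaf_value(node.state)
        while node is not None:  # backup
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return best


print(mcts_plan("home", "wifi_off"))  # ['open_settings', 'tap_network', 'toggle_wifi']
```

Because the Q-model already ranks `open_settings` above `open_clock`, this search reaches the goal on its fourth iteration. With a poorly calibrated Q-model, the exploration bonus still forces less-favoured branches to be tried as the parent's visit count grows, at the cost of more iterations.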
Source: arXiv: 2605.12294