When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

📄 Abstract

Multi-trajectory inference for tool-use LLM agents (generating multiple reasoning attempts and selecting among them) benefits from transferring knowledge across attempts so that later ones avoid the pitfalls of earlier ones. Existing cross-trajectory memory methods (trajectory-level reflection, atomic fact extraction, raw observation injection) are each evaluated under a single inference strategy on a single task, making it unclear whether reported gains reflect properties of the memory abstraction or of the inference method. We propose a unified framework that decomposes memory along two axes: the scope of transfer (within an expansion vs. across trajectories) and the abstraction of the transferred content. We evaluate four methods under three inference strategies (best-of-N, beam search, MCTS) on four tool-use benchmarks spanning SQL, knowledge-graph, and CLI environments, in a verifier-free setting that matches the deployment regime of practical agents. The experiment matrix identifies the inference method as a confound: the same memory method produces statistically distinct results under different inference strategies on the same examples. Reflection reaches significance only under MCTS (not under best-of-N); within-expansion injection (conditioning each candidate on prior siblings' outcomes) helps only diversity-starved beam search; and atomic fact extraction is accuracy-neutral but shortens trajectories by 19-26% on tasks with reusable environmental structure.


1️⃣ One-Sentence Summary

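Using a unified framework, this paper systematically analyzes how different memory methods (such as reflection and fact extraction) affect multi-trajectory inference for tool-use AI agents under different inference strategies (best-of-N, beam search, and Monte Carlo tree search). It finds that the inference strategy itself significantly confounds the measured performance of a memory method, and shows that reflection is effective only under Monte Carlo tree search, while fact extraction does not improve accuracy but does shorten task trajectories.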
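
To make the setting concrete, here is a minimal sketch of cross-trajectory memory under best-of-N with verifier-free selection. It is an illustration only, not the paper's implementation: the names `best_of_n_with_memory`, `toy_agent`, and `fact_memory` are hypothetical, the agent is a deterministic stand-in for an LLM, and majority voting is assumed as one possible verifier-free selection rule.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical types: an agent maps (task, memory notes) to (answer, trajectory steps).
Agent = Callable[[str, List[str]], Tuple[str, List[str]]]
# A memory writer turns a finished trajectory into notes for later attempts.
MemoryWriter = Callable[[str, List[str]], List[str]]


def best_of_n_with_memory(
    task: str, agent: Agent, write_memory: MemoryWriter, n: int = 4
) -> Tuple[str, List[str], List[str]]:
    """Run n attempts in sequence; each attempt sees notes from earlier ones.

    Selection is verifier-free (an assumption of this sketch):
    the final answer is chosen by majority vote over all attempts.
    """
    memory: List[str] = []
    answers: List[str] = []
    for _ in range(n):
        answer, steps = agent(task, list(memory))
        answers.append(answer)
        # Cross-trajectory transfer: notes extracted here reach later attempts.
        memory.extend(write_memory(answer, steps))
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, answers, memory


def toy_agent(task: str, notes: List[str]) -> Tuple[str, List[str]]:
    """Deterministic stand-in for an LLM agent on a SQL-like task."""
    if any("full_name" in note for note in notes):
        # A remembered schema fact lets the agent skip the failing query.
        return "42", ["SELECT COUNT(full_name) FROM users"]
    return "error", [
        "SELECT COUNT(name) FROM users",
        "no such column: name; columns: id, full_name",
    ]


def fact_memory(answer: str, steps: List[str]) -> List[str]:
    """Atomic-fact style memory: keep only schema observations."""
    return [step for step in steps if step.startswith("no such column")]


if __name__ == "__main__":
    winner, answers, memory = best_of_n_with_memory(
        "count users", toy_agent, fact_memory, n=3
    )
    print(winner, answers, memory)
```

In this toy run the first attempt fails and records one schema fact; later attempts reuse it and take a single step instead of two, which loosely mirrors the abstract's point that fact extraction can shorten trajectories on tasks with reusable environmental structure. Swapping `fact_memory` for a function that returns a free-text reflection, or changing the loop to beam search or MCTS, would be the kind of variation the abstract's experiment matrix covers.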
