SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

📄 Abstract

Research agents let models answer user queries by gathering information from the web with tools, which requires them to dynamically interleave internal reasoning with tool use. While such capabilities can in principle be learned via reinforcement learning with verifiable rewards (RLVR), we observe that agents often exhibit poor exploration behaviors, including premature termination and biased tool usage. As a result, RLVR alone yields limited improvements. We propose SynPlanResearch-R1, a framework that synthesizes tool-use trajectories encouraging deeper exploration and uses them to shape the agent's exploration during cold-start supervised fine-tuning, providing a strong initialization for subsequent RL. Across seven multi-hop and open-web benchmarks, SynPlanResearch-R1 improves performance over SOTA baselines by up to 6.0% on the Qwen3-8B backbone and up to 5.8% on the Qwen3-4B backbone. Further analyses of tool-use patterns and training dynamics, compared with those of the baselines, shed light on the factors underlying these gains. Our code is publicly available at this https URL.
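The two failure modes named above, premature termination and biased tool usage, can be quantified from logged trajectories. The sketch below is a hypothetical diagnostic, not the paper's own analysis: it assumes each trajectory is recorded as the ordered list of tool names called before the final answer, treats any trajectory with fewer than `min_calls` calls as "premature" (the threshold of 3 is an arbitrary assumption), and uses the Shannon entropy of the tool-usage distribution as a rough proxy for how balanced tool usage is. The names `exploration_stats`, `ExplorationStats`, `"search"`, and `"visit"` are all illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ExplorationStats:
    mean_calls: float             # average number of tool calls per trajectory
    early_stop_rate: float        # fraction of trajectories below min_calls
    tool_shares: dict[str, float]  # share of all calls made to each tool
    tool_entropy: float           # Shannon entropy of tool shares, in bits


def exploration_stats(trajectories: list[list[str]], min_calls: int = 3) -> ExplorationStats:
    """Summarize tool-use behavior over a batch of trajectories.

    Hypothetical representation: each trajectory is the ordered list of
    tool names the agent called before answering.
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    n = len(trajectories)
    mean_calls = sum(len(t) for t in trajectories) / n
    # Trajectories that stop before min_calls tool calls count as premature termination.
    early_stop_rate = sum(1 for t in trajectories if len(t) < min_calls) / n
    # Pool all calls to see whether one tool dominates (biased tool usage).
    counts = Counter(tool for t in trajectories for tool in t)
    total = sum(counts.values())
    shares = {tool: c / total for tool, c in counts.items()} if total else {}
    # Low entropy means usage is concentrated on few tools; 0 for a single tool.
    entropy = -sum(p * math.log2(p) for p in shares.values())
    return ExplorationStats(mean_calls, early_stop_rate, shares, entropy)


if __name__ == "__main__":
    trajs = [["search"], ["search", "search", "visit"], ["search", "visit", "visit", "search"]]
    print(exploration_stats(trajs))
```

With two tools, an entropy of 1 bit means perfectly balanced usage. Comparing these statistics before and after cold-start fine-tuning is one simple way to check whether an agent explores more deeply; the paper's actual tool-use analyses may use different measures.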


1️⃣ One-Sentence Summary

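This paper proposes SynPlanResearch-R1, a framework that synthesizes tool-use trajectories to guide research agents toward deeper and more thorough exploration, substantially improving their performance on multiple complex web-search tasks.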
