arXiv submission date: 2026-04-20
📄 Abstract - LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent

Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, while depending on real-world search during RL training introduces instability and prohibitive cost. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, it enables a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). On common benchmarks such as GAIA and Xbench, LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0% respectively, demonstrating that scalable RL training is a key enabler for deep research agents.
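To make the "lite virtual world" idea concrete, here is a minimal sketch of what a simulated search environment for RL rollouts might look like. All names here (`LiteSearchEnv`, `CORPUS`, the keyword-matching retrieval, the toy reward) are illustrative assumptions and not taken from the paper; a real system would use a proper retriever over a cached document store and a learned or rubric-based reward.

```python
# Hypothetical sketch: an offline mock search tool that stands in for a live
# web-search API during RL training, avoiding real-world cost and instability.

CORPUS = {
    "gaia benchmark": "GAIA is a benchmark for general AI assistants.",
    "xbench": "Xbench evaluates deep-research agents on realistic tasks.",
    "agentic rl": "Agentic RL trains LLM agents with tool use via reinforcement learning.",
}

class LiteSearchEnv:
    """Deterministic mock search tool for stable, cheap RL rollouts."""

    def __init__(self, corpus):
        self.corpus = corpus

    def search(self, query: str) -> str:
        # Naive substring match replaces a real search backend.
        q = query.lower()
        hits = [doc for key, doc in self.corpus.items() if key in q or q in key]
        return hits[0] if hits else "No results found."

    def rollout_reward(self, query: str, gold_answer: str) -> float:
        # Toy outcome reward: 1.0 if the retrieved snippet contains the
        # expected answer substring, else 0.0.
        return 1.0 if gold_answer.lower() in self.search(query).lower() else 0.0

env = LiteSearchEnv(CORPUS)
print(env.search("what is the GAIA benchmark?"))
print(env.rollout_reward("agentic rl", "reinforcement learning"))
```

Because the environment is deterministic and local, rollouts are reproducible and essentially free, which is the property the abstract attributes to training in a virtual world rather than against live search.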

Top tags: reinforcement learning · agents · llm
Detailed tags: research agent · scalable training · benchmark · virtual world · deep research

LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent


1️⃣ One-sentence summary

This paper proposes the LiteResearcher framework, which builds a lightweight virtual world that simulates the real search environment to address the unrealistic data, high cost, and instability of training deep research agents with reinforcement learning, enabling a model of only 4B parameters to outperform large open-source and commercial models on multiple benchmarks.

Source: arXiv: 2604.17931