arXiv submission date: 2026-05-18
📄 Abstract - EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, which reduces their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which often relies on 5 times more, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to +15% on BFCLv3, +8.6% on MCP-Atlas, and +6% on conversational benchmarks including $\tau^2$-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.

Top-level tags: agents, reinforcement learning, llm
Detailed tags: tool use, environment synthesis, agentic rl, trajectory generation, benchmark

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL


1️⃣ One-Sentence Summary

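This paper proposes EnvFactory, a fully automated framework that autonomously builds executable tool environments from authentic resources and synthesizes natural multi-turn dialogue trajectories with implicit intents. This allows tool-using reinforcement learning agents to be trained efficiently without costly human annotation or error-prone simulators, and it significantly improves model performance on multiple benchmarks.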

Source: arXiv: 2605.18703