arXiv submission date: 2025-12-22
📄 Abstract - GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators

Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving paradigm: the simulator acts as a dynamic curriculum policy, continuously generating tasks specifically tailored to the agent's "zone of proximal development". This process is guided by a simple but effective α-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to **+40.3%** over 7B baselines and matches or exceeds the average performance of larger models. Compared to Gemini 2.5 Pro-based offline data augmentation, GenEnv achieves better performance while using 3.3× less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
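The abstract describes the α-Curriculum Reward only at a high level: it rewards the simulator for generating tasks aligned with the agent's current capabilities. A minimal sketch of one plausible reading, assuming the simulator is scored by how close a task's empirical solve rate sits to a target difficulty α (the function name and exact functional form are illustrative, not from the paper):

```python
def alpha_curriculum_reward(solve_rate: float, alpha: float = 0.5) -> float:
    """Hypothetical difficulty-alignment reward for the environment simulator.

    Assumption: "aligning task difficulty with the agent's current
    capabilities" means rewarding tasks whose empirical solve rate is
    close to a target level alpha. The linear form below is illustrative.
    """
    return 1.0 - abs(solve_rate - alpha)

# Tasks the agent always solves (rate 1.0) or never solves (rate 0.0)
# earn the simulator a low reward; tasks near the target difficulty
# earn a high reward, steering generation toward the agent's
# "zone of proximal development".
```

Under this reading, the simulator's incentive peaks exactly where the agent's learning signal is richest, rather than on trivially easy or hopelessly hard tasks.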

Top-level tags: llm agents model training
Detailed tags: co-evolution training framework difficulty alignment data generation adaptive curriculum

GenEnv: A Difficulty-Aligned Co-Evolution Training Framework for LLM Agents / GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators


1️⃣ One-Sentence Summary

This paper proposes GenEnv, a novel training framework that places an agent and a scalable environment simulator in a difficulty-aligned co-evolutionary game, dynamically generating training data matched to the agent's current capabilities, and thereby improving the agent's ability to solve tasks in complex open environments efficiently and at low cost.


2️⃣ Key Contributions

1. A data-evolution paradigm

2. Difficulty-aligned co-evolution

3. An environment difficulty-alignment reward mechanism

4. Training on a dynamically evolving dual data pool

5. Theoretical analysis of the α-Curriculum reward

6. A theoretical guarantee of reward-ranking consistency
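Contribution 4 names a dynamically evolving dual data pool but this summary gives no details. A minimal sketch of one way such a loop could look, assuming solved tasks feed a replay pool while unsolved ones feed a frontier pool that informs the next curriculum round (all names and the pooling rule are illustrative, not from the paper):

```python
def co_evolution_step(agent_solve, simulator_generate,
                      solved_pool, frontier_pool, n_tasks=8):
    """One hypothetical co-evolution round with two data pools.

    Assumption: tasks the agent solves are kept for replay/supervision,
    while tasks it fails on form a "frontier" pool the simulator can
    use to calibrate the next batch of generated tasks.
    """
    for task in simulator_generate(n_tasks):
        success = agent_solve(task)
        # Route each task to the pool matching the agent's outcome.
        (solved_pool if success else frontier_pool).append(task)
    return solved_pool, frontier_pool
```

The point of the sketch is the routing: both pools evolve every round, so the training distribution tracks the agent's current capability boundary instead of a fixed dataset.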


3️⃣ Main Results and Value

Result highlights

Practical value


4️⃣ Glossary

Source: arXiv: 2512.19682