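ClawGym: A Scalable Framework for Building Effective Claw Agents
1️⃣ One-Sentence Summary
This paper presents ClawGym, a framework that automatically generates large-scale, verifiable training data (13.5K tasks), trains AI agents with supervised fine-tuning and lightweight reinforcement learning, and builds a 200-instance benchmark, systematically addressing the lack of standardized workflows and evaluation methods for developing personal digital assistants that operate on local files, tools, and persistent workspaces.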
Claw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task this http URL. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources will be released soon at this https URL.
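To make the idea of verifiable tasks over isolated workspaces more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical `Task` record that carries seed files and a rule-based check on the final workspace state, and it runs each rollout in its own temporary directory so that parallel rollouts cannot interfere with each other. All names here (`Task`, `toy_agent`, `run_rollout`, `run_parallel`, `make_task`) are illustrative assumptions. The abstract describes hybrid verification, whereas this sketch shows only a simple rule-based check.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical task record: instruction, seed files, and a rule-based check."""
    task_id: str
    instruction: str
    seed_files: Dict[str, str]          # relative path -> initial file content
    verify: Callable[[Path], bool]      # inspects the final workspace state


def toy_agent(instruction: str, workspace: Path) -> None:
    """Stand-in for a real agent: uppercases notes.txt into out.txt."""
    text = (workspace / "notes.txt").read_text()
    (workspace / "out.txt").write_text(text.upper())


def run_rollout(task: Task) -> bool:
    """Run one task in its own temporary workspace and verify the result."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        # Materialize the mock workspace from the task's seed files.
        for rel_path, content in task.seed_files.items():
            target = workspace / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        toy_agent(task.instruction, workspace)
        return task.verify(workspace)


def run_parallel(tasks: List[Task], max_workers: int = 4) -> Dict[str, bool]:
    """Execute rollouts concurrently; each task sees only its own workspace."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_rollout, tasks))
    return {task.task_id: ok for task, ok in zip(tasks, results)}


def make_task(i: int) -> Task:
    """Build an illustrative task whose check compares out.txt to an expected string."""
    content = f"hello {i}"
    expected = content.upper()
    return Task(
        task_id=f"task-{i}",
        instruction="Uppercase notes.txt into out.txt",
        seed_files={"notes.txt": content},
        verify=lambda ws, expected=expected: (ws / "out.txt").read_text() == expected,
    )


if __name__ == "__main__":
    print(run_parallel([make_task(i) for i in range(3)]))
```

In a training setup, the boolean returned by `verify` could serve as a per-task success signal. How ClawGym actually computes rewards or isolates tasks is not specified in the abstract, so this mapping is an assumption.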
Source: arXiv:2604.26904