Structured Distillation of Web Agent Capabilities Enables Generalization
1️⃣ One-sentence summary
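This paper proposes Agent-as-Annotators, a structured framework that uses a frontier large language model as a 'teacher' to automatically generate high-quality web-interaction trajectories and then trains a smaller, locally deployable 'student' model on that data. The student outperforms several well-known closed-source models on WebArena and generalizes well to web environments it never saw during training.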
Frontier LLMs can navigate complex websites, but their cost and reliance on third-party APIs make local deployment impractical. We introduce Agent-as-Annotators, a framework that structures synthetic trajectory generation for web agents by analogy to human annotation roles, replacing the Task Designer, Annotator, and Supervisor with modular LLM components. Using Gemini 3 Pro as teacher, we generate 3,000 trajectories across six web environments and fine-tune a 9B-parameter student with pure supervised learning on the 2,322 that pass quality filtering. The resulting model achieves 41.5% on WebArena, surpassing closed-source models such as Claude 3.5 Sonnet (36.0%) and GPT-4o (31.5%) under the same evaluation protocol, and nearly doubling the previous best open-weight result (Go-Browse, 21.7%). Capabilities transfer to unseen environments, with an 18.2 percentage point gain on WorkArena L1 (an enterprise platform never seen during training) and consistent improvements across three additional benchmarks. Ablations confirm that each pipeline component contributes meaningfully, with Judge filtering, evaluation hints, and reasoning traces each accounting for measurable gains. These results demonstrate that structured trajectory synthesis from a single frontier teacher is sufficient to produce competitive, locally deployable web agents. Project page: this https URL
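The abstract names the pipeline roles but not their interfaces, so the following is only a minimal, hypothetical Python sketch of the data flow it describes. A stubbed teacher callable plays every role. The Judge is assumed to be the LLM counterpart of the human Supervisor, and every class, prompt format, and action string (`TaskDesigner`, `Annotator`, `Judge`, `build_sft_dataset`, `fake_teacher`, `"click [1]"`) is an illustrative assumption, not the authors' code. In this flow, tasks are proposed per environment, annotated step by step with reasoning traces, filtered by the Judge, and flattened into supervised fine-tuning examples. Evaluation hints and real browser execution are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: any callable mapping a prompt string to a reply.
TeacherLLM = Callable[[str], str]


@dataclass
class Step:
    reasoning: str  # reasoning trace, kept as part of the training target
    action: str     # browser action in an illustrative format, e.g. "click [1]"


@dataclass
class Trajectory:
    env: str
    task: str
    steps: List[Step] = field(default_factory=list)


class TaskDesigner:
    """Task Designer role: proposes tasks for one web environment."""

    def __init__(self, llm: TeacherLLM):
        self.llm = llm

    def propose(self, env: str, n: int) -> List[str]:
        return [self.llm(f"DESIGN env={env} i={i}") for i in range(n)]


class Annotator:
    """Annotator role: performs a task, recording reasoning and action per step."""

    def __init__(self, llm: TeacherLLM, max_steps: int = 3):
        self.llm = llm
        self.max_steps = max_steps

    def annotate(self, env: str, task: str) -> Trajectory:
        traj = Trajectory(env, task)
        for t in range(self.max_steps):
            reasoning = self.llm(f"THINK task={task} t={t}")
            action = self.llm(f"ACT task={task} t={t}")
            traj.steps.append(Step(reasoning, action))
            if action == "stop":  # the teacher declares the task finished
                break
        return traj


class Judge:
    """Supervisor role (assumed to be the Judge): accepts or rejects a trajectory."""

    def __init__(self, llm: TeacherLLM):
        self.llm = llm

    def passes(self, traj: Trajectory) -> bool:
        actions = " ; ".join(s.action for s in traj.steps)
        return self.llm(f"JUDGE task={traj.task} actions={actions}") == "PASS"


def build_sft_dataset(llm: TeacherLLM, envs: List[str], per_env: int) -> List[Dict[str, str]]:
    """Runs designer -> annotator -> judge and keeps only accepted trajectories."""
    designer, annotator, judge = TaskDesigner(llm), Annotator(llm), Judge(llm)
    examples: List[Dict[str, str]] = []
    for env in envs:
        for task in designer.propose(env, per_env):
            traj = annotator.annotate(env, task)
            if not judge.passes(traj):
                continue  # Judge filtering: rejected trajectories never reach training
            for i, step in enumerate(traj.steps):
                # One supervised example per step: reasoning trace plus action as target.
                examples.append({"prompt": f"[{env}] {task} step={i}",
                                 "target": f"{step.reasoning}\n{step.action}"})
    return examples


def fake_teacher(prompt: str) -> str:
    """Deterministic stub standing in for the teacher model (hypothetical)."""
    kind = prompt.split()[0]
    if kind == "DESIGN":
        return "task" + prompt.rsplit("=", 1)[1]
    if kind == "THINK":
        return "Open the page that lists the requested item."
    if kind == "ACT":
        return "stop" if prompt.endswith("t=1") else "click [1]"
    # JUDGE: accept even-numbered tasks only, to exercise the filter.
    index = int(prompt.split("task=task")[1].split()[0])
    return "PASS" if index % 2 == 0 else "FAIL"


data = build_sft_dataset(fake_teacher, ["shop", "forum"], per_env=3)
print(len(data))  # 2 envs x 2 accepted tasks x 2 steps each = 8
```

In the reported run, 2,322 of the 3,000 teacher trajectories passed filtering, a retention rate of 2322 / 3000 = 77.4%. In this toy run, the stub Judge keeps 2 of 3 tasks per environment by construction. Keeping each role behind its own small class mirrors the modular framing in the abstract: any single component (for example the Judge) can be swapped or ablated without touching the others.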
Source: arXiv:2604.07776