arXiv submission date: 2025-12-23
📄 Abstract - Step-DeepResearch Technical Report

As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-ended research, which requires robust skills in intent recognition, long-horizon decision-making, and cross-source verification. To address this, we introduce Step-DeepResearch, a cost-effective, end-to-end agent. We propose a Data Synthesis Strategy Based on Atomic Capabilities to reinforce planning and report writing, combined with a progressive training path from agentic mid-training to SFT and RL. Enhanced by a Checklist-style Judger, this approach significantly improves robustness. Furthermore, to bridge the evaluation gap in the Chinese domain, we establish ADR-Bench for realistic deep research scenarios. Experimental results show that Step-DeepResearch (32B) scores 61.4% on Scale AI Research Rubrics. On ADR-Bench, it significantly outperforms comparable models and rivals SOTA closed-source models like OpenAI and Gemini DeepResearch. These findings prove that refined training enables medium-sized models to achieve expert-level capabilities at industry-leading cost-efficiency.

Top-level tags: llm agents model training
Detailed tags: research agent data synthesis progressive training chinese benchmark cost-effective ai

Step-DeepResearch: A Cost-Effective End-to-End Deep Research Agent Model / Step-DeepResearch Technical Report


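1️⃣ One-Sentence Summary

This paper presents Step-DeepResearch, a cost-effective end-to-end deep research agent model built with an atomic-capability-based data synthesis strategy and a progressive training paradigm; at a medium parameter scale (32B) it reaches expert-level research capability comparable to top closed-source models, and the paper also introduces ADR-Bench, a deep research benchmark for realistic Chinese-language scenarios.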


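2️⃣ Key Innovations

1. A data synthesis strategy based on atomic capabilities

2. A progressive training paradigm with a checklist-style judger for reward design

3. ADR-Bench, a Chinese-language deep research benchmark

4. An end-to-end framework design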
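As an illustration of item 2, here is a minimal sketch of how a checklist-style reward could be computed. It is not the paper's implementation: the `ChecklistItem` structure, the weights, and the simple string checks are all hypothetical stand-ins (a real judger would more plausibly use a model to decide each item). It only shows the general idea of scoring a report as the weighted fraction of checklist items it satisfies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    """One yes/no criterion for a report (hypothetical structure, not from the paper)."""
    description: str
    check: Callable[[str], bool]  # stand-in for a judge; assumed, a real one may be a model
    weight: float = 1.0


def checklist_reward(report: str, items: List[ChecklistItem]) -> float:
    """Return the weighted fraction of checklist items the report satisfies, in [0, 1]."""
    total = sum(item.weight for item in items)
    if total == 0:
        return 0.0  # an empty checklist carries no signal
    passed = sum(item.weight for item in items if item.check(report))
    return passed / total


# Toy checklist with illustrative criteria only.
items = [
    ChecklistItem("cites at least one source", lambda r: "http" in r),
    ChecklistItem("has a conclusion section", lambda r: "Conclusion" in r),
    ChecklistItem("mentions the year 2024", lambda r: "2024" in r, weight=2.0),
]
report = "Findings: see http://example.com\nConclusion: more data is needed."
score = checklist_reward(report, items)
print(score)
```

In this toy run the report passes the first two items (weight 1 each) and fails the third (weight 2), so the reward is 2 / 4 = 0.5. A scalar of this kind could serve as an RL reward signal, though how Step-DeepResearch actually aggregates its checklist is not described in this summary.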


3️⃣ Main Results and Value

Result highlights

Practical value


4️⃣ Glossary

Source: arXiv: 2512.20491