Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation
1️⃣ One-Sentence Summary
This paper proposes a new method that combines human preferences with reinforcement learning to automatically generate fine-grained, query-specific rubrics, enabling more effective training and evaluation of AI-generated DeepResearch reports and bringing their performance close to that of leading closed-source models.
Training and evaluating DeepResearch-generated reports remain challenging due to the lack of verifiable reward signals, so rubric-based evaluation has become common practice. However, existing approaches either rely on coarse, pre-defined rubrics that lack sufficient granularity, or depend on manually constructed query-specific rubrics that are costly and difficult to scale. In this paper, we propose a pipeline for training human-preference-aligned, query-specific rubric generators tailored to DeepResearch report generation. We first construct a dataset of DeepResearch-style queries annotated with human preferences over paired reports, and train rubric generators via reinforcement learning with a hybrid reward that combines human preference supervision with LLM-based rubric evaluation. To better handle long-horizon reasoning, we further introduce a Multi-agent Markov-state (MaMs) workflow for report generation. We empirically show that our rubric generators deliver supervision that is more discriminative and better aligned with human judgment than existing rubric design strategies. Moreover, when integrated into the MaMs training framework, DeepResearch systems equipped with our rubric generators consistently outperform all open-source baselines on DeepResearch Bench and achieve performance comparable to that of leading closed-source models.
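The hybrid reward described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration — the function names, the binary preference-agreement term, and the weighting `alpha` are not from the paper; the sketch only shows the general idea of mixing human-preference supervision with an LLM judge's score of the generated rubric.

```python
def preference_agreement(rubric_scores_a, rubric_scores_b, human_prefers_a):
    """Return 1.0 if scoring the paired reports under the generated rubric
    reproduces the human preference between report A and report B, else 0.0.
    (Hypothetical formulation; the paper's exact term may differ.)"""
    rubric_prefers_a = sum(rubric_scores_a) > sum(rubric_scores_b)
    return 1.0 if rubric_prefers_a == human_prefers_a else 0.0

def hybrid_reward(rubric_scores_a, rubric_scores_b, human_prefers_a,
                  llm_rubric_quality, alpha=0.5):
    """Weighted mix of human-preference supervision and an LLM judge's
    quality score for the rubric itself (assumed to lie in [0, 1]).
    The weight alpha is an illustrative choice, not the paper's value."""
    pref = preference_agreement(rubric_scores_a, rubric_scores_b,
                                human_prefers_a)
    return alpha * pref + (1 - alpha) * llm_rubric_quality

# Example: the rubric ranks report A above report B, matching the human
# label, and the LLM judge rates the rubric 0.8.
reward = hybrid_reward([0.9, 0.7], [0.4, 0.5], human_prefers_a=True,
                       llm_rubric_quality=0.8)
```

In a real RL loop this scalar would serve as the reward for the rubric-generator policy; here it simply demonstrates how the two supervision signals combine.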
Source: arXiv: 2602.03619