Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration
1️⃣ One-Sentence Summary
This paper proposes a new framework called Search-R2, in which an "Actor" generates initial reasoning steps and a "Meta-Refiner" diagnoses and repairs flawed ones. Combined with a fine-grained reward design, this collaboration addresses the inefficiency that sparse rewards cause in agentic search-based reasoning, yielding higher accuracy across a range of question-answering tasks.
Search-integrated reasoning enables language agents to transcend static parametric knowledge by actively querying external sources. However, training these agents via reinforcement learning is hindered by the multi-scale credit assignment problem: existing methods typically rely on sparse, trajectory-level rewards that fail to distinguish between high-quality reasoning and fortuitous guesses, leading to redundant or misleading search behaviors. To address this, we propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention, with both components jointly optimized during training. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories, and a Meta-Refiner, which selectively diagnoses and repairs flawed steps via a 'cut-and-regenerate' mechanism. To provide fine-grained supervision, we introduce a hybrid reward design that couples outcome correctness with a dense process reward quantifying the information density of retrieved evidence. Theoretically, we formalize the Actor-Refiner interaction as a smoothed mixture policy, proving that selective correction yields strict performance gains over strong baselines. Extensive experiments across various general and multi-hop QA datasets demonstrate that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales, achieving superior reasoning accuracy with minimal overhead.
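To make the described control flow concrete, below is a minimal sketch (not the authors' code) of one rollout under the Actor/Meta-Refiner collaboration with the hybrid reward. All interfaces here (`actor`, `refiner`, `retriever`, the `step` attributes) are hypothetical placeholders, and `information_density` and the weighting `alpha` are illustrative stand-ins for the paper's process-reward definition, which is not given in this summary.

```python
# Minimal sketch, assuming hypothetical Actor/Refiner/Retriever interfaces.
# The information_density proxy and alpha weighting are illustrative assumptions.

def information_density(evidence: str, question: str) -> float:
    """Illustrative proxy: fraction of question terms covered by retrieved evidence."""
    terms = set(question.lower().split())
    hits = sum(1 for t in terms if t in evidence.lower())
    return hits / max(len(terms), 1)

def answer_matches(pred: str, gold: str) -> bool:
    """Illustrative exact-match outcome check."""
    return pred.strip().lower() == gold.strip().lower()

def run_episode(question, gold_answer, actor, refiner, retriever,
                alpha=0.5, max_steps=8):
    """Roll out one trajectory: the Actor proposes each step, the Meta-Refiner
    selectively cuts and regenerates flawed steps, and the return couples
    outcome correctness with a dense process reward on retrieved evidence."""
    trajectory, process_reward = [], 0.0
    for _ in range(max_steps):
        step = actor.generate_step(question, trajectory)   # initial reasoning/search step
        if refiner.is_flawed(step, trajectory):            # selective diagnosis
            step = refiner.regenerate(step, trajectory)    # 'cut-and-regenerate'
        if step.is_search:
            step.evidence = retriever.search(step.query)
            process_reward += information_density(step.evidence, question)
        trajectory.append(step)
        if step.is_final:
            break
    outcome_reward = 1.0 if answer_matches(trajectory[-1].answer, gold_answer) else 0.0
    return trajectory, outcome_reward + alpha * process_reward  # hybrid reward
```

In this reading, the Actor and Meta-Refiner together act as the smoothed mixture policy the abstract refers to: most steps pass through unchanged, and only flawed steps are replaced by refined ones.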
Source: arXiv: 2602.03647