Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents
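1️⃣ One-sentence summary
This paper proposes EvalAct, a method in which the agent scores its own retrieval immediately after each search step, combined with a new training algorithm that optimizes intermediate reasoning steps, significantly improving the accuracy and reliability of retrieval-augmented agents on complex multi-step question answering.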
Retrieval-augmented agents can query external evidence, yet their reliability in multi-step reasoning remains limited: noisy retrieval may derail multi-hop question answering, while outcome-only reinforcement learning provides credit signals that are too coarse to optimize intermediate steps. We propose EvalAct (Evaluate-as-Action), which converts implicit retrieval quality assessment into an explicit action and enforces a coupled Search-to-Evaluate protocol so that each retrieval is immediately followed by a structured evaluation score, yielding process signals aligned with the interaction trajectory. To leverage these signals, we introduce Process-Calibrated Advantage Rescaling (PCAR), a GRPO-based optimization method that rescales advantages at the segment level according to evaluation scores, emphasizing reliable segments while updating uncertain ones conservatively. Experiments on seven open-domain QA benchmarks show that EvalAct achieves the best average accuracy, with the largest gains on multi-hop tasks, and ablations verify that the explicit evaluation loop drives the primary improvements while PCAR provides consistent additional benefits.
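The abstract does not give the PCAR formula, so the following is only a rough sketch of what segment-level advantage rescaling on top of GRPO-style group advantages could look like. Everything specific here is an assumption made for illustration: the function names, the 0 to 10 score range, and the linear weight between `floor` and `ceil` are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize outcome rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rescale_segments(advantage, scores, max_score=10.0, floor=0.5, ceil=1.5):
    """Scale one trajectory's advantage separately for each search/evaluate segment.

    ASSUMPTION: the self-evaluation score is mapped linearly to a weight in
    [floor, ceil]; the paper's actual calibration rule may differ.
    """
    scaled = []
    for s in scores:
        # Clamp the normalized score to [0, 1] before mapping it to a weight.
        confidence = min(max(s / max_score, 0.0), 1.0)
        weight = floor + (ceil - floor) * confidence
        scaled.append(advantage * weight)
    return scaled


def pcar_like_advantages(rewards, segment_scores):
    """Return per-segment advantages for every trajectory in the group."""
    advantages = group_advantages(rewards)
    return [rescale_segments(a, s) for a, s in zip(advantages, segment_scores)]


if __name__ == "__main__":
    # Two trajectories: the first answered correctly, the second did not.
    rewards = [1.0, 0.0]
    # Self-evaluation scores emitted after each retrieval (hypothetical values).
    segment_scores = [[10, 0], [5]]
    for traj in pcar_like_advantages(rewards, segment_scores):
        print([round(a, 3) for a in traj])
```

In this sketch a high-scoring segment receives a larger share of the trajectory's advantage and a low-scoring one a smaller share, which is one simple way to read "emphasizing reliable segments while updating uncertain ones conservatively".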
Source: arXiv:2603.09203