Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
1️⃣ One-Sentence Summary
This paper proposes a new method called MATTRL that lets multiple AI experts solve problems together at test time by discussing and sharing experiences, significantly improving reasoning accuracy on complex tasks such as medicine and math, all without time-consuming model training.
Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce **Multi-Agent Test-Time Reinforcement Learning (MATTRL)**, a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, which is then reinjected into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect outcomes. MATTRL offers a stable, effective, and efficient path to distribution-shift-robust multi-agent reasoning without tuning.
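The abstract's deliberation loop (multi-expert discussion, turn-level experience pool, consensus) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the `Expert` class, `deliberate` function, and majority-vote consensus are all hypothetical stand-ins, where real experts would be LLM specialists and experiences would be retrieved textual records.

```python
from collections import Counter

class Expert:
    """Toy specialist: answers a question, optionally conditioned on
    retrieved experiences. (A stand-in for an LLM expert agent.)"""
    def __init__(self, name, answer_fn):
        self.name = name
        self.answer_fn = answer_fn

    def answer(self, question, experiences):
        return self.answer_fn(question, experiences)

def deliberate(experts, question, experience_pool, turns=2):
    """Multi-turn discussion: each turn, every expert answers with access to
    the shared experience pool, and each (expert, answer) pair is logged as a
    turn-level experience that later turns can retrieve."""
    transcript = []
    for _ in range(turns):
        for expert in experts:
            # Retrieve experiences relevant to this question (exact match here;
            # the paper would use a richer retrieval scheme).
            relevant = [e for e in experience_pool if e["question"] == question]
            ans = expert.answer(question, relevant)
            transcript.append((expert.name, ans))
            experience_pool.append(
                {"question": question, "by": expert.name, "answer": ans}
            )
    # Consensus for final decision-making: simple majority vote over all turns.
    votes = Counter(ans for _, ans in transcript)
    return votes.most_common(1)[0][0]

# Usage: three specialists deliberate; the majority answer wins.
experts = [
    Expert("medicine", lambda q, e: "B"),
    Expert("math", lambda q, e: "B"),
    Expert("education", lambda q, e: "A"),
]
pool = []
final = deliberate(experts, "Q1", pool, turns=2)
print(final)          # majority answer across turns
print(len(pool))      # turn-level experiences logged: 3 experts x 2 turns
```

Note that no model weights are updated anywhere in the loop: all "learning" happens through the growing experience pool, which is the sense in which the method is test-time and tuning-free.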
Source: arXiv: 2601.09667