MARTI-MARS²: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation
1️⃣ One-Sentence Summary
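This paper proposes MARTI-MARS², a new framework that uses reinforcement learning to let multiple AI agents work as a team, learning from and correcting one another. This significantly improves performance on complex code generation tasks, and the authors find that policy diversity among the agents is key to raising overall capability.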
While the complex reasoning capability of Large Language Models (LLMs) has attracted significant attention, single-agent systems often encounter inherent performance ceilings in complex tasks such as code generation. Multi-agent collaboration offers a promising avenue to transcend these boundaries. However, existing frameworks typically rely on prompt-based test-time interactions or on multi-role configurations trained with homogeneous parameters, which limits their error-correction capabilities and strategic diversity. In this paper, we propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS²), which integrates policy learning with multi-agent tree search by formulating the multi-agent collaborative exploration process as a dynamic and learnable environment. By allowing agents to iteratively explore and refine within this environment, the framework supports an evolution from parameter-sharing homogeneous multi-role training to heterogeneous multi-agent training, breaking through single-agent capability limits. We also introduce an efficient inference strategy, MARTI-MARS²-T+, to fully exploit the scaling potential of multi-agent collaboration at test time. We conduct extensive experiments across varied model scales (8B, 14B, and 32B) on challenging code generation benchmarks. Using two collaborating 32B models, MARTI-MARS² achieves 77.7%, outperforming strong baselines such as GPT-5.1. Furthermore, MARTI-MARS² reveals a novel scaling law: shifting from single-agent to homogeneous multi-role and ultimately to heterogeneous multi-agent paradigms progressively yields higher RL performance ceilings, more robust test-time scaling (TTS) capabilities, and greater policy diversity, suggesting that policy diversity is critical for scaling intelligence via multi-agent reinforcement learning.
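The abstract does not spell out the search procedure, so the following is only a toy illustration of the general idea of collaborative tree search over candidate programs, not the MARTI-MARS² algorithm. It assumes a best-first tree whose nodes are candidate programs, a reward equal to the fraction of unit tests passed, and two hand-written "agents" (`drafter` and `fixer`) standing in for LLM policies. All names (`Agent`, `TESTS`, `score`, `tree_search`, `drafter`, `fixer`) are hypothetical, and no reinforcement learning is modeled. The sketch shows how agents with complementary skills can reach a solution that neither reaches alone.

```python
import heapq
import itertools
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interface: given a parent program (None at the root),
# return candidate programs. In the paper these would be trained LLM policies.
Agent = Callable[[Optional[str]], List[str]]

# Hypothetical toy task: define solve(a, b) returning a + b; (args, expected).
TESTS: List[Tuple[Tuple[int, int], int]] = [((1, 2), 3), ((0, 0), 0), ((-4, 5), 1)]


def score(code: str) -> float:
    """Reward = fraction of unit tests passed (0.0 if the code fails to load)."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        solve = namespace["solve"]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in TESTS:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(TESTS)


def tree_search(agents: List[Agent], budget: int) -> Tuple[float, str]:
    """Illustrative best-first search (not the paper's algorithm):
    every agent expands every popped node; returns (best_score, best_code)."""
    tie = itertools.count()  # tie-breaker so the heap never compares code strings
    frontier: List[Tuple[float, int, str]] = []
    best: Tuple[float, str] = (-1.0, "")

    def add(code: str) -> None:
        nonlocal best
        s = score(code)
        heapq.heappush(frontier, (-s, next(tie), code))  # negate for a max-heap
        if s > best[0]:
            best = (s, code)

    for agent in agents:  # root: each agent drafts from scratch
        for code in agent(None):
            add(code)
    for _ in range(budget):
        if not frontier or best[0] == 1.0:  # stop when exhausted or all tests pass
            break
        _, _, parent = heapq.heappop(frontier)
        for agent in agents:  # heterogeneous agents refine the same node
            for child in agent(parent):
                add(child)
    return best


def drafter(parent: Optional[str]) -> List[str]:
    """Hypothetical agent A: writes a buggy first draft but never refines."""
    return ["def solve(a, b):\n    return a - b\n"] if parent is None else []


def fixer(parent: Optional[str]) -> List[str]:
    """Hypothetical agent B: cannot draft, but repairs a subtraction bug."""
    if parent is not None and " - " in parent:
        return [parent.replace(" - ", " + ")]
    return []


print(tree_search([drafter], budget=5))         # single agent stalls at 1/3
print(tree_search([drafter, fixer], budget=5))  # the pair reaches 1.0
```

Expanding each node with every agent, rather than rotating agents, is a simplifying choice here: it guarantees that a refinement-only agent always gets to act on the other agent's drafts.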
Source: arXiv:2602.07848