CooperBench: Why Coding Agents Cannot be Your Teammates Yet
1️⃣ One-Sentence Summary
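Using a collaborative coding benchmark called CooperBench, this paper finds that today's most advanced AI coding agents perform poorly on tasks that require teamwork: their success rate is on average 30% lower than when each task is done individually, mainly because they lack the social intelligence needed to communicate effectively, keep commitments, and coordinate plans.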
Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. Our analysis reveals three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication. Through large-scale simulation, we also observe rare but interesting emergent coordination behavior including role division, resource division, and negotiation. Our research presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence.
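To make the task structure above concrete (two agents, two features that are independently implementable but can collide), here is a minimal Python sketch under stated assumptions. It is not CooperBench's evaluation harness: the `FeaturePatch` type, the line-range representation of edits, and the `joint_success` criterion are hypothetical illustrations. Each agent's feature is modeled as the half-open line ranges it edits in one shared file plus whether its tests pass; the pair counts as a success only if the edits do not overlap and both features' tests pass.

```python
from dataclasses import dataclass


# Hypothetical model (not CooperBench's actual harness): one agent's feature,
# summarized by the line ranges it edits in a shared file and its test result.
@dataclass
class FeaturePatch:
    agent: str
    edited_ranges: list[tuple[int, int]]  # half-open [start, end) line ranges
    tests_pass: bool


def ranges_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    # Two half-open ranges overlap when each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def patches_conflict(p: FeaturePatch, q: FeaturePatch) -> bool:
    # The patches conflict if any edited range of one overlaps any range of the other.
    return any(ranges_overlap(r, s) for r in p.edited_ranges for s in q.edited_ranges)


def joint_success(p: FeaturePatch, q: FeaturePatch) -> bool:
    # Assumed criterion: edits must not collide AND both features' tests must pass.
    return not patches_conflict(p, q) and p.tests_pass and q.tests_pass


a = FeaturePatch("agent_1", [(10, 20)], tests_pass=True)
b = FeaturePatch("agent_2", [(15, 30)], tests_pass=True)  # uncoordinated: touches lines 15-19 too
c = FeaturePatch("agent_2", [(40, 55)], tests_pass=True)  # coordinated: disjoint region

print(joint_success(a, b))  # → False
print(joint_success(a, c))  # → True
```

The model is deliberately simple. Real text merges (for example git's three-way merge) can also conflict on adjacent rather than strictly overlapping lines, and a textually clean merge can still break tests through semantic interference, so a realistic evaluation would presumably run the tests on the merged code rather than on each patch separately.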
Source: arXiv:2601.13295