Abstract - Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows
Multi-agent systems built from teams of large language models (LLMs) are increasingly deployed for collaborative scientific reasoning and problem-solving. These systems require agents to coordinate under shared constraints, such as GPUs or credit balances, where cooperative behavior matters. Behavioral economics provides a rich toolkit of games that isolate distinct cooperation mechanisms, yet it remains unknown whether a model's behavior in these stylized settings predicts its performance in realistic collaborative tasks. Here, we benchmark 35 open-weight LLMs across six behavioral economics games and show that game-derived cooperative profiles robustly predict downstream performance in AI-for-Science tasks, where teams of LLM agents collaboratively analyze data, build models, and produce scientific reports under shared budget constraints. Models that coordinate effectively in games and invest in multiplicative team production (rather than pursuing greedy strategies) produce better scientific reports across three outcomes: accuracy, quality, and completion. These associations hold after controlling for multiple factors, indicating that cooperative disposition is a distinct, measurable property of LLMs not reducible to general ability. Our behavioral games framework thus offers a fast and inexpensive diagnostic for screening cooperative fitness before costly multi-agent deployment.
Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows
1️⃣ One-Sentence Summary
By having 35 open-weight large language models play six behavioral economics games, this paper finds that the cooperative profiles the models exhibit in these games effectively predict their team performance on collaborative scientific tasks: models that cooperate better and invest in team output produce more accurate, higher-quality, and more complete scientific reports. This provides a low-cost way to quickly screen AI models for team collaboration.