Reasonably reasoning AI agents can avoid game-theoretic failures in zero-shot, provably
1️⃣ One-sentence summary
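This study proves that AI agents with basic reasoning ability and no additional training can, in repeated interactions, observe their opponents' behavior, adjust their own strategies, and naturally move toward stable Nash equilibrium play, so that stable cooperation or competition is achievable in many real-world strategic settings without enforced alignment.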
AI agents are increasingly deployed in interactive economic environments characterized by repeated AI-AI interactions. Despite AI agents' advanced capabilities, empirical studies reveal that such interactions often fail to stably induce a strategic equilibrium, such as a Nash equilibrium. Post-training methods have been proposed to induce a strategic equilibrium; however, it remains impractical to uniformly apply an alignment method across diverse, independently developed AI models in strategic settings. In this paper, we provide theoretical and empirical evidence that off-the-shelf reasoning AI agents can achieve Nash-like play zero-shot, without explicit post-training. Specifically, we prove that "reasonably reasoning" agents, i.e., agents capable of forming beliefs about others' strategies from previous observation and learning to best respond to these beliefs, eventually behave along almost every realized play path in a way that is weakly close to a Nash equilibrium of the continuation game. In addition, we relax the common-knowledge payoff assumption by allowing stage payoffs to be unknown and by having each agent observe only its own privately realized stochastic payoffs, and we show that we can still achieve the same on-path Nash convergence guarantee. We then empirically validate the proposed theories by simulating five game scenarios, ranging from a repeated prisoner's dilemma game to stylized repeated marketing promotion games. Our findings suggest that AI agents naturally exhibit such reasoning patterns and therefore attain stable equilibrium behaviors intrinsically, obviating the need for universal alignment procedures in many real-world strategic interactions.
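The "form beliefs from previous observation, then best respond" loop described in the abstract can be illustrated with a small sketch. This is not the paper's construction: it is a plain fictitious-play simulation, assuming myopic agents, a known payoff table, and illustrative prisoner's dilemma payoffs (3, 0, 5, 1) chosen here for the example. It checks a stage-game Nash condition only, whereas the paper's guarantee concerns the continuation game. All names (`PAYOFF`, `best_response`, `simulate`, `is_stage_nash`) are hypothetical.

```python
from collections import Counter

ACTIONS = ("C", "D")  # cooperate, defect

# PAYOFF[my_action][their_action] is my stage payoff.
# Illustrative numbers only (an assumption, not taken from the paper).
PAYOFF = {
    "C": {"C": 3, "D": 0},
    "D": {"C": 5, "D": 1},
}


def best_response(opponent_counts):
    """Return the action with the highest expected payoff under the
    empirical belief implied by the observed opponent action counts."""
    total = sum(opponent_counts[a] for a in ACTIONS)

    def expected(my_action):
        return sum(
            opponent_counts[a] / total * PAYOFF[my_action][a] for a in ACTIONS
        )

    return max(ACTIONS, key=expected)


def simulate(rounds=20):
    """Run two belief-based agents against each other and return the play path."""
    # Each agent starts from a uniform pseudo-count prior over the opponent's actions.
    beliefs = [Counter({a: 1 for a in ACTIONS}) for _ in range(2)]
    history = []
    for _ in range(rounds):
        a0 = best_response(beliefs[0])
        a1 = best_response(beliefs[1])
        history.append((a0, a1))
        # Each agent updates its belief with the opponent's observed action.
        beliefs[0][a1] += 1
        beliefs[1][a0] += 1
    return history


def is_stage_nash(a0, a1):
    """True if neither player gains from a unilateral deviation (symmetric game)."""
    ok0 = all(PAYOFF[a0][a1] >= PAYOFF[d][a1] for d in ACTIONS)
    ok1 = all(PAYOFF[a1][a0] >= PAYOFF[d][a0] for d in ACTIONS)
    return ok0 and ok1


history = simulate()
print(history[-1], is_stage_nash(*history[-1]))
```

Because defection is a dominant action under these assumed payoffs, the belief update never changes the best response, so the sketch only shows the mechanics of the loop. Richer games, unknown stochastic payoffs, or forward-looking agents would need the machinery the paper develops.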
Source: arXiv: 2603.18563