Equilibrium Selection in Multi-Agent Policy Gradients via Opponent-Aware Basin Entry

📄 Abstract

Multi-agent policy-gradient methods have been shown to converge locally near stable Nash equilibria. Local convergence, however, does not determine which equilibrium is reached. We study this question through basin-entry probability with respect to a target set of equilibria selected by an external criterion, such as payoff dominance. For finite-unroll Meta-MAPG, we show that the update decomposes into ordinary policy gradient plus own-learning and peer-learning corrections, with controlled sampling noise and finite-unroll bias. We identify the peer-learning correction as the main equilibrium-selection mechanism: under a local alignment condition, it increases the probability of entering the certified attraction region of the target stable-Nash set relative to ordinary policy gradient. Because a persistent correction may shift the zero-update points of the original game, we anneal the correction after basin entry, which recovers ordinary policy-gradient dynamics and inherits their local stable-Nash convergence guarantees. Experiments in Stag Hunt, iterated Prisoner's Dilemma, and preliminary neural-policy coordination environments support this basin-entry view, showing increased entry into cooperative basins under peer-aware updates.
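The basin-entry idea can be illustrated with a toy sketch. The code below is not the paper's Meta-MAPG update: it assumes a one-shot Stag Hunt with hypothetical payoffs (stag/stag pays 4, hare always pays 3, a lone stag hunter gets 0), sigmoid policies with exact gradients (so no sampling noise), and a first-order opponent-lookahead term as a stand-in for the peer-learning correction. All function names, the payoff values, the step size, and the geometric decay used for annealing are assumptions made for illustration. Basin entry is approximated by the fraction of grid initializations that end inside the region where both agents play stag with probability above 3/4, which is forward-invariant under the plain gradient in this toy game.

```python
import numpy as np

# Assumed payoffs: V_i = 4*p*q + 3*(1 - p_i), so stag is a best response
# only if the peer plays stag with probability above 3/4.
STAG_THRESHOLD = 0.75
GRID = np.linspace(0.1, 0.9, 9)  # initial stag probabilities for each agent


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def plain_gradient(p, q):
    """Exact dV_i/dtheta_i for sigmoid policies p = sigmoid(theta_1), q = sigmoid(theta_2)."""
    g1 = (4.0 * q - 3.0) * p * (1.0 - p)
    g2 = (4.0 * p - 3.0) * q * (1.0 - q)
    return np.array([g1, g2])


def peer_correction(p, q):
    """Stand-in peer-learning term: (dV_i/dtheta_j) * d/dtheta_i (dV_j/dtheta_j).

    It measures how an agent's own parameter changes the peer's next
    gradient step, weighted by how much that step matters to the agent.
    """
    c1 = (4.0 * p * q * (1.0 - q)) * (4.0 * p * (1.0 - p) * q * (1.0 - q))
    c2 = (4.0 * q * p * (1.0 - p)) * (4.0 * q * (1.0 - q) * p * (1.0 - p))
    return np.array([c1, c2])


def run(p0, q0, peer_weight=1.0, lr=1.0, steps=300, decay=0.5):
    """Return True if the joint policy ends inside the stag/stag region."""
    start = np.array([p0, q0])
    theta = np.log(start / (1.0 - start))  # logits of the initial probabilities
    weight = peer_weight
    for _ in range(steps):
        p, q = sigmoid(theta)
        if min(p, q) > STAG_THRESHOLD:
            weight *= decay  # anneal the correction once inside the region
        theta = theta + lr * (plain_gradient(p, q) + weight * peer_correction(p, q))
    p, q = sigmoid(theta)
    return bool(min(p, q) > STAG_THRESHOLD)


def entry_rate(peer_weight):
    """Fraction of grid initializations that end in the stag/stag region."""
    hits = [run(p0, q0, peer_weight) for p0 in GRID for q0 in GRID]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    print("plain policy gradient:", entry_rate(0.0))
    print("with peer correction: ", entry_rate(1.0))
```

In this toy game the stand-in correction is non-negative for both agents, so it only ever pushes toward stag; for example, a start at (0.7, 0.7) drifts to hare under the plain gradient but crosses into the stag region when the correction is active. That sign property is specific to these assumed payoffs and plays the role of the abstract's local alignment condition; it should not be read as a general guarantee.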


1️⃣ One-Sentence Summary

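This paper proposes an opponent-aware update mechanism that makes multi-agent systems more likely to enter a more cooperative, higher-payoff equilibrium during policy-gradient training, and that reverts to the standard algorithm after entry so that local convergence guarantees are preserved.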
