arXiv submission date: 2026-05-13
📄 Abstract - Offline Two-Player Zero-Sum Markov Games with KL Regularization

We study the problem of learning Nash equilibria in offline two-player zero-sum Markov games. While existing approaches often rely on explicit pessimism to address distribution shift, we show that KL regularization alone suffices to stabilize learning and guarantee convergence. We first introduce Regularized Offline Sequential Equilibrium (ROSE), a theoretical framework that achieves a fast $\widetilde{\mathcal{O}}(1/n)$ convergence rate under \textit{unilateral concentrability}, improving over the standard $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rates in unregularized settings. We then propose Sequential Offline Self-play Mirror Descent (SOS-MD), a practical model-free algorithm based on least-squares value estimation and iterative self-play updates. We prove that the last iterate of SOS-MD attains the same $\widetilde{\mathcal{O}}(1/n)$ statistical rate up to a vanishing optimization error of order $\widetilde{\mathcal{O}}(1/\sqrt{T})$ in the number of self-play iterations $T$.

Top-level tags: reinforcement learning machine learning theory
Detailed tags: markov games nash equilibria kl regularization offline learning convergence rates

Offline Two-Player Zero-Sum Markov Games with KL Regularization


1️⃣ One-sentence summary

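This paper shows that in offline two-player zero-sum Markov games, KL regularization alone is enough to counter the instability caused by distribution shift, without explicit pessimism, and it introduces ROSE (a theoretical framework) and SOS-MD (a practical model-free algorithm), which improve the statistical rate for learning Nash equilibria from the standard $\widetilde{\mathcal{O}}(1/\sqrt{n})$ to $\widetilde{\mathcal{O}}(1/n)$.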
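To make the idea of KL-regularized self-play concrete, here is a minimal sketch on a single-state zero-sum matrix game. It is not the paper's SOS-MD: it assumes a known payoff matrix rather than offline data, skips least-squares value estimation, and uses uniform reference policies. The function name `kl_regularized_self_play` and the parameters `beta` (regularization strength) and `eta` (step size) are hypothetical, chosen only for illustration. The update is a standard mirror-descent step with a KL pull toward the reference policy.

```python
import math


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_regularized_self_play(A, beta=1.0, eta=0.1, steps=2000):
    """Simultaneous mirror-descent self-play on a KL-regularized matrix game.

    A[i][j] is the payoff to the row (max) player. Both reference
    policies are assumed uniform; this is an illustrative assumption.
    """
    n_rows, n_cols = len(A), len(A[0])
    mu = [1.0 / n_rows] * n_rows  # reference policy, row player
    nu = [1.0 / n_cols] * n_cols  # reference policy, column player
    x, y = mu[:], nu[:]
    power = 1.0 / (1.0 + eta * beta)
    for _ in range(steps):
        # Expected payoff of each action against the opponent's current policy.
        gx = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        gy = [sum(A[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        # Multiplicative update, damped toward the reference policy.
        x_new = normalize([
            (x[i] * mu[i] ** (eta * beta) * math.exp(eta * gx[i])) ** power
            for i in range(n_rows)
        ])
        y_new = normalize([
            (y[j] * nu[j] ** (eta * beta) * math.exp(-eta * gy[j])) ** power
            for j in range(n_cols)
        ])
        x, y = x_new, y_new
    return x, y


if __name__ == "__main__":
    payoff = [[1.0, -1.0], [-1.0, 0.5]]
    row_policy, col_policy = kl_regularized_self_play(payoff)
    print(row_policy, col_policy)
```

At a fixed point of this update, each policy is a softmax response around its reference policy: the row policy is proportional to `mu[i] * exp(gx[i] / beta)` and the column policy to `nu[j] * exp(-gy[j] / beta)`. That characterizes the equilibrium of the regularized game in this toy setting.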

Source: arXiv: 2605.13025