Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

📄 Abstract

Riichi Mahjong is a multi-player, imperfect-information game characterized by stochasticity and high-dimensional state spaces. These attributes present a unique combination of challenges that mirror complex real-world decision-making problems in reinforcement learning. While prior research has relied heavily on supervised learning from human play logs to pre-train the policy, algorithms capable of learning tabula rasa (from scratch) offer greater potential for general applicability, as evidenced by the AlphaZero lineage. To facilitate such research, we introduce Mahjax, a fully vectorized Riichi Mahjong environment implemented in JAX to enable large-scale rollout parallelization on Graphics Processing Units (GPUs). We also provide a high-quality visualization tool to streamline debugging and interaction with trained agents. Experimental results demonstrate that Mahjax achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under the no-red and red rules, respectively. Furthermore, we validate the environment's utility for reinforcement learning by showing that agents can be trained effectively to improve their rank against baseline policies.

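1️⃣ One-sentence summary

This paper presents Mahjax, a fully vectorized Riichi Mahjong simulation environment built on JAX that runs games in massive parallel on GPUs at up to 2 million steps per second, allowing reinforcement learning algorithms to train agents from scratch without relying on human data, and that gives researchers an intuitive visualization tool for debugging.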
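
To make "fully vectorized" concrete, here is a minimal sketch of how a pure-function environment can be batched across many parallel rollouts in JAX. It uses a toy environment, not Mahjong and not Mahjax's actual API: the names `reset`, `step`, `rollout`, and the constants are all hypothetical, chosen only for illustration. It covers batching on a single device and does not attempt the multi-GPU setup reported in the abstract.

```python
import jax
import jax.numpy as jnp

NUM_ENVS = 1024   # hypothetical number of parallel environments
NUM_ACTIONS = 3   # toy action count, unrelated to Mahjong's action space
HORIZON = 32      # steps per rollout


def reset(key):
    """Toy state (assumption): an integer position and a step counter."""
    pos = jax.random.randint(key, (), 0, 10)
    return {"pos": pos, "t": jnp.int32(0)}


def step(state, action):
    """Pure, branch-free transition, so it can be vmapped and jitted."""
    pos = state["pos"] + action - 1
    reward = jnp.where(pos % 5 == 0, 1.0, 0.0)
    return {"pos": pos, "t": state["t"] + 1}, reward


def rollout(key):
    """Run one environment for HORIZON steps under a uniform random policy."""
    reset_key, policy_key = jax.random.split(key)
    state = reset(reset_key)

    def body(state, k):
        action = jax.random.randint(k, (), 0, NUM_ACTIONS)
        return step(state, action)

    step_keys = jax.random.split(policy_key, HORIZON)
    final_state, rewards = jax.lax.scan(body, state, step_keys)
    return final_state, rewards.sum()


# vmap batches the single-environment rollout; jit compiles it for the accelerator.
batched_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(0), NUM_ENVS)
final_states, returns = batched_rollout(keys)
```

Because `reset` and `step` are pure functions over arrays, the same code serves one environment or thousands; only the leading batch dimension of `keys` changes.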
