One More Time: Revisiting Neural Quantum States from a Reinforcement Learning Perspective
1️⃣ One-sentence summary
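This paper recasts variational energy minimization for quantum many-body systems as a policy-gradient optimization problem in reinforcement learning. It proposes PWO, a stable and scalable trust-region algorithm that avoids matrix inversion while remaining computationally efficient, and demonstrates its advantages on a range of spin systems and on a model with 1.5 billion parameters.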
Neural quantum states (NQS) provide a flexible and scalable framework for approximating quantum many-body wavefunctions. Among NQS parameterizations, autoregressive models are especially attractive because they enable exact, independent sampling from the Born distribution, avoiding the autocorrelation and mixing issues of Markov chain methods. Yet their optimization remains comparatively underexplored: Adam is a scalable method but ignores function space geometry, while stochastic reconfiguration is principled but costly and numerically fragile in large models. To address this gap, we show that variational energy minimization can be viewed as an advantage policy-gradient problem over the Born distribution, motivating trust-region optimization for NQS training. We introduce Proximal Wavefunction Optimization (PWO), a principled trust-region algorithm that clips probability-ratio changes in the amplitude channel and phase increments in the phase channel. PWO avoids explicit matrix inversion, reuses samples across multiple updates, and combines the scalability of first-order optimization with theoretical guarantees. Across Ising and frustrated $J_1$-$J_2$ one- and two-dimensional spin systems, PWO improves stability and wall-clock convergence over Adam, minSR, and SPRING. Finally, we fine-tune a $1.5$B-parameter RWKV-7 model, demonstrating NQS optimization at a scale over three orders of magnitude beyond prior work.
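To make the policy-gradient reading concrete, here is a minimal sketch of a PPO-style clipped surrogate applied to the amplitude channel only. It is not the paper's PWO objective, whose exact form the abstract does not give. The sketch assumes a real, positive wavefunction, so the Born probability is the only trainable quantity, treats the centered local energy as the advantage, and omits the phase channel entirely. The function name `clipped_energy_surrogate`, the argument names, and the default `eps=0.2` are hypothetical choices made for illustration.

```python
import numpy as np


def clipped_energy_surrogate(logp_new, logp_old, local_energy, eps=0.2):
    """PPO-style clipped surrogate for energy minimization (illustrative only).

    logp_new:     log Born probabilities of the samples under the updated model
    logp_old:     log Born probabilities under the model that drew the samples
    local_energy: real local energies E_loc(s) of the same samples
    eps:          clipping half-width for the probability ratio (assumed value)
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    e_loc = np.asarray(local_energy, dtype=float)

    # Advantage: how far each sample's local energy sits from the batch mean.
    # Negative values mark samples that lower the energy.
    advantage = e_loc - e_loc.mean()

    # Probability ratio between the updated and the sampling distribution.
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)

    # The energy is minimized, so the pessimistic bound is the larger of the
    # two terms (PPO, which maximizes reward, takes the smaller one).
    return np.mean(np.maximum(ratio * advantage, clipped * advantage))
```

Because the surrogate depends on the sampling-time model only through `logp_old`, the same batch can be reused for several gradient steps. Once a sample's ratio leaves the interval `[1 - eps, 1 + eps]` in the favorable direction, its term saturates and stops contributing gradient, which is what gives the update its trust-region character.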
Source: arXiv: 2607.02292