Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
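1️⃣ One-sentence summary
This work integrates AI-driven deep potential models into GROMACS, a mainstream molecular dynamics package, and runs them efficiently in parallel on multi-GPU systems, making large-scale molecular simulation with near-quantum accuracy feasible.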
GROMACS is a de facto standard for classical molecular dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge: embedding neural-network inference into multi-GPU simulations while retaining high performance. In this work, we integrate the machine-learned interatomic potential (MLIP) framework DeePMD-kit into GROMACS, enabling domain-decomposed, GPU-accelerated inference across multi-node systems. We extend the GROMACS NNPot interface with a DeePMD backend, and we introduce a domain decomposition layer decoupled from the main simulation. Inference is executed concurrently on all processes, with two MPI collectives used each step: one to broadcast coordinates and one to aggregate and redistribute forces. We train an in-house DPA-1 model (1.6 M parameters) on a dataset of solvated protein fragments. We validate the implementation on a small protein system, then benchmark the GROMACS-DeePMD integration with a 15,668-atom protein on up to 32 NVIDIA A100 and AMD MI250x GPUs. Strong-scaling efficiency reaches 66% at 16 devices and 40% at 32; weak-scaling efficiency is 80% up to 16 devices and reaches 48% (MI250x) and 40% (A100) at 32 devices. Profiling with the ROCm System profiler shows that >90% of the wall time is spent in DeePMD inference, while MPI collectives contribute <10%, primarily because they act as a global synchronization point. The principal bottlenecks are the irreducible ghost-atom cost set by the cutoff radius, confirmed by a simple throughput model, and load imbalance across ranks. These results demonstrate that production MD with near-ab-initio fidelity is feasible at scale in GROMACS.
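The per-step pattern described in the abstract (share coordinates, run inference on each subdomain concurrently, then sum forces) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: plain NumPy stands in for MPI, a toy repulsive pair potential stands in for DeePMD inference, and the slab decomposition and the names `rank_forces` and `simulate_step` are hypothetical, not taken from the GROMACS or DeePMD-kit code. It shows only why summing per-rank force contributions recovers the single-domain result: each rank differentiates the energy of the atoms it owns, which also produces forces on neighbouring (ghost) atoms owned by other ranks.

```python
import numpy as np


def rank_forces(coords, owned_mask, cutoff):
    """Toy stand-in for per-rank inference (NOT DeePMD).

    Each owned atom i carries the energy sum_j 0.5 * (cutoff - r_ij)**2
    over neighbours j within the cutoff. The returned array holds the
    forces from the owned atoms' energies on ALL atoms, including ghost
    atoms that belong to other ranks.
    """
    forces = np.zeros_like(coords)
    for i in np.flatnonzero(owned_mask):
        delta = coords[i] - coords                  # vectors from every atom j to atom i
        dist = np.linalg.norm(delta, axis=1)
        near = (dist < cutoff) & (dist > 0.0)       # neighbours inside the cutoff, excluding i
        unit = delta[near] / dist[near][:, None]
        pair = 0.5 * (cutoff - dist[near])[:, None] * unit
        forces[i] += pair.sum(axis=0)               # repulsion acting on the owned atom
        forces[near] -= pair                        # equal and opposite force on neighbours
    return forces


def simulate_step(coords, n_ranks, cutoff):
    """Emulate one step: share coordinates, infer per rank, sum the forces."""
    # Hypothetical decomposition: equal-width slabs along x.
    x = coords[:, 0]
    edges = np.linspace(x.min(), x.max(), n_ranks + 1)
    rank_of = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_ranks - 1)

    # "Broadcast": every rank receives its own copy of all coordinates.
    per_rank = [rank_forces(coords.copy(), rank_of == r, cutoff) for r in range(n_ranks)]

    # "Aggregate": element-wise sum over ranks, standing in for a sum reduction.
    return np.sum(per_rank, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0.0, 5.0, size=(60, 3))
    single = simulate_step(coords, n_ranks=1, cutoff=1.2)
    split = simulate_step(coords, n_ranks=4, cutoff=1.2)
    print("decomposed forces match single-domain forces:", np.allclose(single, split))
```

In this toy every rank scans all atoms for neighbours; a real domain decomposition restricts each rank to its local atoms plus a ghost shell one cutoff radius thick, which is the ghost-atom cost the abstract identifies as a bottleneck.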
Source: arXiv: 2604.07276