Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data
1️⃣ One-Sentence Summary
This paper develops a large-scale materials discovery workflow based on graph neural networks. By jointly training on more than 500 million atomistic structures covering 85 elements, it builds a foundation model that can rapidly screen billions of candidate materials and be fine-tuned for a variety of downstream tasks, greatly accelerating exploration of chemical design spaces that are out of reach for traditional computational methods.
We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds and compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and demonstrate strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.
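The multi-task setup described above (a shared message-passing trunk with one output head per dataset, so each dataset's targets and fidelity stay separate) can be sketched minimally as follows. This is an illustrative toy, not HydraGNN's actual API: the dimensions, dataset names, and the random-projection "backbone" standing in for the graph convolutions are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8  # hypothetical embedding width, not from the paper

# Stand-in for the shared message-passing trunk: a fixed projection
# followed by mean pooling over atoms to get one graph embedding.
W_shared = rng.normal(size=(4, EMBED_DIM))

def backbone(node_features: np.ndarray) -> np.ndarray:
    """Map per-atom features (n_atoms, 4) to one graph embedding (EMBED_DIM,)."""
    return np.tanh(node_features @ W_shared).mean(axis=0)

# One output head per dataset: each dataset's targets are predicted by its
# own head, so datasets with different labels/fidelities can be trained jointly.
heads = {
    "dataset_A": rng.normal(size=(EMBED_DIM, 1)),  # e.g. a scalar-energy head
    "dataset_B": rng.normal(size=(EMBED_DIM, 3)),  # e.g. a 3-component head
}

def predict(node_features: np.ndarray, dataset: str) -> np.ndarray:
    """Route a structure through the shared trunk, then its dataset's head."""
    return backbone(node_features) @ heads[dataset]

x = rng.normal(size=(5, 4))           # toy structure: 5 atoms, 4 features each
print(predict(x, "dataset_A").shape)  # scalar target -> shape (1,)
print(predict(x, "dataset_B").shape)  # vector target -> shape (3,)
```

During joint training, only the head matching a batch's source dataset would receive that batch's loss gradient, while the shared trunk learns from all datasets.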
Source: arXiv: 2604.15380