arXiv submission date: 2026-04-15
📄 Abstract - Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data

We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish seamless strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.
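The abstract's central architectural idea is a shared backbone with one output head per dataset, so that 16 datasets with different units, fidelity levels, and label distributions can be trained jointly. The following is a minimal, framework-free sketch of that routing pattern; the encoder, head functions, and dataset names are all illustrative stand-ins, not the paper's HydraGNN implementation.

```python
# Sketch of multi-task prediction with per-dataset heads: a shared encoder
# produces one embedding per structure, and each dataset owns the head that
# maps that embedding to its own target. All names/numbers are hypothetical.

def shared_encoder(atom_features):
    """Stand-in for the shared message-passing backbone (e.g. a PaiNN-style
    GNN in the paper); here it just averages the per-atom features."""
    return sum(atom_features) / len(atom_features)

def make_head(scale, bias):
    """Each dataset head converts the shared embedding to that dataset's
    target scale (its own units and label distribution)."""
    return lambda h: scale * h + bias

# One head per training dataset; keys are hypothetical dataset names.
heads = {
    "dataset_a": make_head(2.0, 0.5),
    "dataset_b": make_head(-1.0, 3.0),
}

def predict(atom_features, dataset):
    """Route a structure through the shared encoder, then through the head
    belonging to the dataset it came from."""
    h = shared_encoder(atom_features)
    return heads[dataset](h)

# Same structure, same shared embedding (h = 2.0), different per-dataset outputs.
print(predict([1.0, 2.0, 3.0], "dataset_a"))  # 2.0 * 2.0 + 0.5 = 4.5
print(predict([1.0, 2.0, 3.0], "dataset_b"))  # -1.0 * 2.0 + 3.0 = 1.0
```

The design choice the sketch illustrates: only the heads see dataset-specific labels, so the shared encoder is forced to learn representations that transfer across all 16 sources, which is what makes downstream fine-tuning on data-scarce tasks possible.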

Top-level tags: model training, systems, machine learning
Detailed tags: graph neural networks, materials discovery, multi-task learning, exascale computing, hyperparameter optimization

Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data


1️⃣ One-Sentence Summary

This paper develops a large-scale materials-discovery workflow based on graph neural networks. By jointly training on more than 500 million atomistic structures covering 85 elements, it builds a foundation model that can rapidly screen billions of candidate materials and be fine-tuned for a variety of downstream tasks, dramatically accelerating exploration of chemical design spaces that are out of reach for traditional computational methods.

Source: arXiv: 2604.15380