Scaling Laws of Global Weather Models
1️⃣ One-Sentence Summary
By analyzing the training behavior of data-driven weather models, this paper finds that, unlike language models, weather models gain more predictive performance from increasing model width and extending training time than from simply stacking depth, offering key guidance for the design of future weather models.
Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across a range of models, we find that Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under fixed compute budgets, allocating resources to longer training durations yields greater performance gains than increasing model size. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
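As a rough illustration of the kind of fit behind these scaling claims (a minimal sketch, not the paper's actual fitting pipeline; dataset sizes and the coefficient below are made up), a data-scaling law of the form $L(D) = a \cdot D^{-\alpha}$ can be estimated by linear regression in log-log space. Note that the reported Aurora behavior (10x more data reduces loss by up to 3.2x) corresponds to an exponent $\alpha = \log_{10} 3.2 \approx 0.51$:

```python
import numpy as np

def fit_power_law(D, L):
    """Fit L = a * D**(-alpha) by least squares in log-log space.

    Taking logs gives log L = log a - alpha * log D, a straight line,
    so an ordinary linear fit recovers both constants.
    Returns (a, alpha).
    """
    logD, logL = np.log(D), np.log(L)
    slope, intercept = np.polyfit(logD, logL, 1)
    return np.exp(intercept), -slope

# Synthetic losses following the reported Aurora behavior:
# a 10x increase in dataset size cuts validation loss by 3.2x,
# i.e. alpha = log10(3.2) ~= 0.505. Sizes and a=5.0 are illustrative.
alpha_true = np.log10(3.2)
D = np.array([1e6, 1e7, 1e8, 1e9])   # hypothetical dataset sizes
L = 5.0 * D ** (-alpha_true)         # noiseless power law

a, alpha = fit_power_law(D, L)
print(round(alpha, 3))  # recovers ~0.505
```

With real (noisy) validation losses the same log-log regression applies; only the residual scatter around the fitted line changes.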
Source: arXiv: 2602.22962