arXiv submission date: 2026-03-16
📄 Abstract - MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers

While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints, particularly regarding memory footprint and backpropagation complexity, which existing inference-focused tools fail to capture. This paper introduces MONET, a framework designed to model the training of neural networks on heterogeneous dataflow accelerators. MONET builds upon Stream, an experimentally verified framework that models the inference of neural networks on heterogeneous dataflow accelerators with layer fusion. Using MONET, we explore the design space of ResNet-18 and a small GPT-2, demonstrating the framework's capability to model training workflows and find better hardware architectures. We then further examine problems that become more complex in neural network training due to the larger design space, such as determining the best layer-fusion configuration. Additionally, we use our framework to find interesting trade-offs in activation checkpointing, with the help of a genetic algorithm. Our findings highlight the importance of a holistic approach to hardware-software co-design for scalable and efficient deep learning deployment.
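To make the activation-checkpointing trade-off concrete, here is a minimal sketch of how a genetic algorithm can search over checkpointing plans. All numbers and names (`ACT_MB`, `RECOMP_GF`, the memory budget, the GA hyperparameters) are invented for illustration; MONET's actual cost models and search procedure are not shown in the abstract.

```python
import random

# Hypothetical per-layer activation sizes (MB) and recompute costs (GFLOPs).
# A real framework like MONET would derive these from hardware cost models.
ACT_MB = [4, 8, 8, 16, 16, 32, 32, 64]
RECOMP_GF = [1, 2, 2, 4, 4, 8, 8, 16]

def evaluate(mask, mem_budget_mb=80):
    """Score a plan: mask[i] == 1 stores layer i's activation, 0 recomputes it.
    Lower is better: recompute cost plus a heavy penalty for exceeding memory."""
    mem = sum(a for a, keep in zip(ACT_MB, mask) if keep)
    recompute = sum(r for r, keep in zip(RECOMP_GF, mask) if not keep)
    penalty = max(0, mem - mem_budget_mb) * 100
    return recompute + penalty

def ga(pop_size=30, gens=50, seed=0):
    """Tiny elitist GA: keep the best half, refill with crossover + mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in ACT_MB] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=evaluate)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(ACT_MB))   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # bit-flip mutation
                i = rng.randrange(len(child))
                child[i] ^= 1
            children.append(child)
        pop = survivors + children
    best = min(pop, key=evaluate)
    return best, evaluate(best)

best_plan, best_score = ga()
```

Storing everything overflows the budget and is penalized, while recomputing everything wastes compute; the GA explores the space between these extremes, which is the trade-off the abstract refers to.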

Top-level tags: systems, model training, machine learning
Detailed tags: hardware-software co-design, training optimization, neural network training, dataflow accelerators, design space exploration

MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers


1️⃣ One-sentence summary

This paper introduces a new framework, MONET, dedicated to modeling and optimizing neural network training on heterogeneous hardware. It addresses the gap left by existing tools, which focus only on inference and overlook the unique challenges of training, and demonstrates through case studies how it can find more efficient hardware architectures and configurations.

Source: arXiv 2603.15002