arXiv submission date: 2026-05-18
📄 Abstract - Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration

Accelerating stochastic gradient methods with classical momentum schemes, such as Polyak's heavy ball, has proven highly successful in training large-scale machine learning models, particularly when combined with the hardware acceleration of large mini-batch computations. Yet, the effect of classical momentum on stochastic mini-batch optimization has been poorly understood theoretically, with prior works requiring strong noise assumptions and extremely large mini-batches. In this work, we develop a general theory of stochastic momentum acceleration for optimizing over quadratics in the interpolation regime, a popular abstraction for studying deep learning dynamics which also includes classical methods such as randomized Kaczmarz and coordinate descent. Our framework encompasses both heavy ball and Nesterov-style momentum, allows for arbitrary mini-batch sizes, and makes minimal assumptions on the stochastic noise. In particular, we show that acceleration from classical momentum is directly proportional to the gradient mini-batch size (up to a natural saturation point), thereby enabling perfect parallelization of mini-batch computations. Our theory also provides a simple choice for the momentum parameter, which is shown to be effective empirically.
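The setting in the abstract can be illustrated with a minimal sketch: mini-batch SGD with Polyak's heavy-ball momentum applied to a consistent least-squares problem, where an exact solution exists (the interpolation regime). The function name `minibatch_heavy_ball`, the problem sizes, and the values of `lr` and `beta` are illustrative assumptions chosen for a stable demonstration. They are not the momentum parameter choice or the rates derived in the paper.

```python
import numpy as np


def minibatch_heavy_ball(A, b, batch_size, lr, beta, n_steps, seed=0):
    """Mini-batch SGD with heavy-ball momentum on f(x) = ||Ax - b||^2 / (2n).

    Hypothetical helper for illustration; lr and beta are hand-picked,
    not the parameter choice proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_prev = x.copy()
    for _ in range(n_steps):
        # Sample a mini-batch of rows without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        A_s, b_s = A[idx], b[idx]
        # Mini-batch estimate of the gradient of f at x.
        grad = A_s.T @ (A_s @ x - b_s) / batch_size
        # Heavy-ball update: gradient step plus momentum term.
        x_next = x - lr * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


# Consistent linear system: b = A @ x_star, so every row can be fit exactly
# and the stochastic gradient noise vanishes at the solution.
rng = np.random.default_rng(42)
n, d = 200, 20
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

x_hat = minibatch_heavy_ball(A, b, batch_size=10, lr=0.05, beta=0.5, n_steps=500)
initial_error = np.linalg.norm(x_star)  # error of the zero starting point
final_error = np.linalg.norm(x_hat - x_star)
print(f"relative error: {final_error / initial_error:.2e}")
```

Rerunning with different `batch_size` values, and with `lr` and `beta` retuned for each, is one way to explore how the convergence speed changes with the mini-batch size. This toy setup does not by itself reproduce the paper's results.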

Top-level tag: machine learning theory
Detailed tags: stochastic gradient descent, momentum acceleration, mini-batch optimization, parallelization

Perfect Parallelization in Mini-Batch SGD with Classical Momentum Acceleration


1️⃣ One-sentence summary

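This paper develops a general theory showing that, for quadratic optimization problems in the interpolation regime, the acceleration from classical momentum (such as Polyak's heavy ball and Nesterov-style momentum) is directly proportional to the gradient mini-batch size, up to a natural saturation point. This enables perfect parallelization of mini-batch computations, and the theory also yields a simple, empirically effective choice of the momentum parameter.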

Source: arXiv: 2605.18609