arXiv submission date: 2026-02-18
📄 Abstract - The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

We study the implicit bias of momentum-based optimizers on homogeneous models. We first extend existing results on the implicit bias of steepest descent in homogeneous models to normalized steepest descent with an optional learning rate schedule. We then show that for smooth homogeneous models, momentum steepest descent algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are approximate steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms too have a bias towards KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the $\ell_\infty$ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid norm. Our experiments corroborate the theory and show that the identity of the margin maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and momentum-based optimizers in linear models.
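To fix notation, the display below is a minimal sketch of the two objects the abstract refers to, written in the standard form used in the steepest-descent implicit-bias literature: a normalized steepest descent step with respect to a norm $\|\cdot\|$, and the margin maximization problem whose KKT points the trajectories are biased toward. The exact normalization, momentum handling, and constraint scaling in the paper may differ.

$$\theta_{t+1} \;=\; \theta_t - \eta_t\,\Delta_t, \qquad \Delta_t \;\in\; \arg\max_{\|\Delta\|\le 1}\ \langle \nabla L(\theta_t),\, \Delta\rangle,$$

$$\min_{\theta}\ \|\theta\| \quad \text{subject to} \quad y_i\, f(\theta; x_i) \,\ge\, 1 \quad \text{for all training examples } i,$$

where $\|\cdot\|$ is the norm the optimizer performs steepest descent with respect to (spectral for Muon, $\ell_2$ for MomentumGD, $\ell_\infty$ for Signum and Adam).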

Top tags: theory, model training, machine learning
Detailed tags: implicit bias, optimization, homogeneous networks, momentum, margin maximization

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks


1️⃣ One-Sentence Summary

Through theoretical analysis and experiments, this paper shows that when training neural networks with a particular (homogeneous) structure, different momentum-based optimizers (such as Adam and Muon) implicitly steer the model toward solutions that are "optimal" in different geometric senses (each maximizes the margin with respect to a different norm), which in turn affects the final model's performance.
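To illustrate how the choice of optimizer determines the geometry, here is a minimal NumPy sketch of two of the update rules the paper analyzes, written for a single weight matrix. The momentum convention (interpolation with coefficient `beta`) and the use of an exact SVD in place of Muon's Newton–Schulz orthogonalization are simplifying assumptions for clarity, not the paper's exact formulation.

```python
import numpy as np

def signum_step(W, m, grad, lr, beta=0.9):
    """Signum: sign of the momentum buffer, i.e. a steepest-descent-style
    step in the l_inf norm (every entry moves by exactly lr)."""
    m = beta * m + (1 - beta) * grad          # momentum buffer
    return W - lr * np.sign(m), m

def muon_step(W, m, grad, lr, beta=0.9):
    """Muon-style step: orthogonalize the momentum of a weight matrix,
    i.e. a steepest-descent-style step in the spectral norm. An exact SVD
    is used here instead of a Newton-Schulz iteration."""
    m = beta * m + (1 - beta) * grad          # momentum buffer
    U, _, Vt = np.linalg.svd(m, full_matrices=False)
    return W - lr * (U @ Vt), m
```

The abstract's claim about Adam (without the stability constant) concerns the analogous elementwise update direction $m_t/\sqrt{v_t}$, which likewise maximizes the $\ell_\infty$ margin.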

Source: arXiv: 2602.16340