Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
1️⃣ One-Sentence Summary
This paper proves that, in homogeneous neural networks (such as common networks with ReLU activations), stochastic gradient descent can use a slower learning-rate decay (such as 1/√t) without sacrificing generalization performance, which matches practical training far better than the decay classical theory requires.
Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes the iteration. This rigid stepsize decay potentially impedes optimization and may not align with practical scenarios. In this paper, we derive generalization bounds under the homogeneous neural network regime, proving that this regime permits a slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results in several directions, e.g., to non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
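The contrast between the two schedules can be sketched with a toy SGD run. This is only an illustration of the decay rates $\mathcal{O}(1/t)$ versus $\Omega(1/\sqrt{t})$ on a simple noisy quadratic, not a reproduction of the paper's setting (homogeneous networks) or its stability bounds; the objective, base stepsize, and noise level are all made up for the sketch.

```python
import math
import random

def sgd(schedule, steps=2000, seed=0):
    """Run SGD on a noisy quadratic f(w) = w^2.

    `schedule(t)` gives the stepsize at iteration t (1-indexed).
    The quadratic stands in for a generic loss purely to show how
    the two decay rates behave; it is not the paper's model class.
    """
    rng = random.Random(seed)
    w = 5.0
    for t in range(1, steps + 1):
        grad = 2.0 * w + rng.gauss(0.0, 0.1)  # noisy gradient of w^2
        w -= schedule(t) * grad
    return w

# Classical stability analyses ask for eta_t = O(1/t) ...
fast_decay = lambda t: 0.5 / t
# ... while the paper shows Omega(1/sqrt(t)) suffices for
# homogeneous networks under mild assumptions.
slow_decay = lambda t: 0.5 / math.sqrt(t)

w_fast = sgd(fast_decay)
w_slow = sgd(slow_decay)
```

Both runs converge on this toy problem, but the $1/\sqrt{t}$ schedule keeps the stepsize orders of magnitude larger at late iterations (e.g. $\eta_{2000} \approx 0.011$ versus $\approx 0.00025$), which is the practical appeal of the slower decay.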
Source: arXiv:2602.22936