Convex and Non-convex Federated Learning with Stale Stochastic Gradients: Diminishing Step Size is All You Need
1️⃣ One-sentence summary
This paper shows that in distributed federated learning, where participants' gradient information may be delayed, biased, or stale, a pre-chosen diminishing step-size schedule matches the performance of more complex delay-adaptive step-size methods, for both nonconvex and strongly convex objectives.
We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of the agents' local cost functions. Each agent is allowed to transmit stochastic estimates of its local gradient that are potentially biased and delayed. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for nonconvex and strongly convex objectives.
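To make the delayed-gradient setting concrete, here is a minimal sketch (my own illustration, not the paper's algorithm or experiments): SGD on a toy 1-D strongly convex objective $f(x) = \tfrac{1}{2}(x-3)^2$, where each stochastic gradient is delivered after a random delay of up to `max_delay` steps, and the step size $\eta_t = c/(t+1)$ is fixed in advance rather than adapted to the observed delays. The function name, objective, and all parameters are hypothetical.

```python
import heapq
import random

def delayed_sgd(steps=2000, max_delay=5, c=1.0, noise=0.1, seed=0):
    """SGD with stale stochastic gradients and a pre-chosen diminishing
    step size eta_t = c / (t + 1). Toy objective: f(x) = 0.5 * (x - 3)**2,
    so the exact gradient at x is (x - 3); we add Gaussian noise to make
    it stochastic. Gradients are computed at the current iterate but
    applied only after a random delay, modeling stale agent updates."""
    rng = random.Random(seed)
    x = 0.0
    pending = []  # min-heap of (step_at_which_gradient_arrives, gradient)
    for t in range(steps):
        # Stochastic gradient evaluated at the CURRENT iterate; by the
        # time it is applied, the iterate may have moved on (staleness).
        g = (x - 3.0) + rng.gauss(0.0, noise)
        heapq.heappush(pending, (t + rng.randint(0, max_delay), g))
        # Apply every gradient whose delay has elapsed, using the
        # diminishing step size of the step at which it is applied --
        # no delay-dependent correction of any kind.
        while pending and pending[0][0] <= t:
            _, stale_g = heapq.heappop(pending)
            x -= (c / (t + 1)) * stale_g
    return x
```

Running `delayed_sgd()` returns an iterate close to the minimizer $x^\star = 3$, illustrating the paper's claim qualitatively: the pre-chosen $c/(t+1)$ schedule absorbs bounded staleness without any delay-adaptive tuning.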
Source: arXiv:2603.02639