arXiv submission date: 2026-02-24
📄 Abstract - On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes

We study stochastic gradient descent (SGD) for composite optimization problems with $N$ sequential operators subject to perturbations in both the forward and backward passes. Unlike classical analyses that treat gradient noise as additive and localized, perturbations to intermediate outputs and gradients cascade through the computational graph, compounding geometrically with the number of operators. We present the first comprehensive theoretical analysis of this setting. Specifically, we characterize how forward and backward perturbations propagate and amplify within a single gradient step, derive convergence guarantees for both general non-convex objectives and functions satisfying the Polyak--Łojasiewicz condition, and identify conditions under which perturbations do not deteriorate the asymptotic convergence order. As a byproduct, our analysis furnishes a theoretical explanation for the gradient spiking phenomenon widely observed in deep learning, precisely characterizing the conditions under which training recovers from spikes or diverges. Experiments on logistic regression with convex and non-convex regularization validate our theories, illustrating the predicted spike behavior and the asymmetric sensitivity to forward versus backward perturbations.
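To make the setting concrete, here is a minimal illustrative sketch (not the paper's actual experimental code) of an SGD step for logistic regression in which noise is injected into both the forward pass (the intermediate pre-activation output) and the backward pass (the upstream gradient signal). All names and noise scales below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic-regression data; sizes and names are illustrative only
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perturbed_sgd_step(w, lr, fwd_noise=0.0, bwd_noise=0.0, batch=32):
    """One SGD step with perturbations injected into both the
    forward activations and the backward gradient signal."""
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    # Forward pass: perturb the intermediate output before the loss
    z = Xb @ w + fwd_noise * rng.normal(size=batch)
    p = sigmoid(z)
    # Backward pass: perturb the upstream gradient before it
    # propagates back through the linear operator
    g_out = (p - yb) + bwd_noise * rng.normal(size=batch)
    grad = Xb.T @ g_out / batch
    return w - lr * grad

w = np.zeros(d)
for t in range(500):
    w = perturbed_sgd_step(w, lr=0.5, fwd_noise=0.05, bwd_noise=0.05)

# Despite mild perturbations in both passes, the training loss
# should still fall well below its initial value of ln 2
p_all = sigmoid(X @ w)
loss = -np.mean(y * np.log(p_all + 1e-12)
                + (1 - y) * np.log(1 - p_all + 1e-12))
print(round(float(loss), 3))
```

With a deeper chain of operators, the perturbations injected at each layer would compound through the composition, which is the geometric-amplification effect the analysis quantifies.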

Top-level tags: machine learning theory, model training
Detailed tags: stochastic gradient descent, optimization theory, perturbation analysis, convergence analysis, non-convex optimization

On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes


1️⃣ One-sentence summary

This paper gives the first systematic analysis of the convergence of stochastic gradient descent when perturbations are present simultaneously in the forward and backward passes of deep learning training; it explains the gradient spiking phenomenon commonly seen during training and identifies conditions under which the perturbations do not degrade the final convergence rate.

Source: arXiv:2602.20646