Accelerating Single-Pass SGD for Generalized Linear Prediction
1️⃣ One-Sentence Summary
This paper proposes the first algorithm that, in a streaming-data setting, successfully incorporates momentum via a novel data-dependent proximal method to accelerate the training of generalized linear models, and it proves that momentum acceleration is more effective than variance-reduction methods.
We study generalized linear prediction in a streaming setting, where each iteration uses only one fresh data point for a gradient-level update. While momentum is well established in deterministic optimization, a fundamental open question is whether it can accelerate such single-pass, non-quadratic stochastic optimization. We propose the first algorithm that successfully incorporates momentum via a novel data-dependent proximal method, achieving dual-momentum acceleration. Our derived excess risk bound decomposes into three components: an improved optimization error, a minimax-optimal statistical error, and a higher-order model-misspecification error. The proof handles misspecification via a fine-grained stationary analysis of the inner updates, while localizing the statistical error through a two-phase outer-loop analysis. As a result, we resolve the open problem posed by Jain et al. [2018a] and demonstrate that momentum acceleration is more effective than variance reduction for generalized linear prediction in the streaming setting.
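To make the streaming setup concrete, the following is a minimal sketch of single-pass SGD with heavy-ball momentum on a generalized linear model (logistic regression), where every data point is consumed exactly once. This is an illustration of the setting only, not the paper's data-dependent proximal algorithm; the function name, step size, and momentum coefficient are all hypothetical choices.

```python
import numpy as np

def single_pass_momentum_sgd(stream, dim, lr=0.05, beta=0.9):
    """Single-pass SGD with heavy-ball momentum for logistic regression.

    Each (x, y) pair from `stream` is used exactly once, matching the
    streaming setting where one fresh sample drives each update.
    NOTE: plain heavy-ball momentum shown here is a stand-in; the paper's
    method uses a data-dependent proximal update instead.
    """
    w = np.zeros(dim)
    v = np.zeros(dim)              # momentum buffer
    for x, y in stream:            # one fresh data point per iteration
        pred = 1.0 / (1.0 + np.exp(-x @ w))   # GLM mean (sigmoid link)
        grad = (pred - y) * x                 # logistic-loss gradient
        v = beta * v + grad                   # heavy-ball momentum
        w = w - lr * v
    return w

# Toy stream: labels drawn from a known linear predictor w_star.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])
stream = []
for _ in range(5000):
    x = rng.normal(size=3)
    p = 1.0 / (1.0 + np.exp(-x @ w_star))
    stream.append((x, float(rng.random() < p)))

w_hat = single_pass_momentum_sgd(stream, dim=3)
```

With enough samples, the single-pass iterate `w_hat` aligns closely in direction with the true predictor `w_star`, even though each sample is touched only once.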
Source: arXiv:2603.01951