Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data
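1️⃣ One-sentence summary
This paper proposes a counterintuitive approach to handling missing data: by deliberately creating additional missing values and applying Richardson extrapolation, it cancels the gradient bias in stochastic gradient descent, significantly improving the optimization accuracy and estimation quality of parametric models trained on incomplete data.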
Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
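The "add missingness, then extrapolate" idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: a squared loss for a linear model, zero imputation, entries missing completely at random with a single known rate `p` (the abstract allows a vector of rates), and a second level `p_high` chosen by the user. The helper names `thin_mask`, `imputed_gradient`, and `richardson_gradient` are hypothetical. If the population gradient behaves like $G(p) = G(0) + b\,p + O(p^2)$, the standard two-point Richardson combination $\frac{p_{\text{high}}\,G(p) - p\,G(p_{\text{high}})}{p_{\text{high}} - p}$ cancels the term linear in $p$.

```python
import numpy as np


def thin_mask(observed, p, p_high, rng):
    """Hide extra entries so the overall missing rate rises from p to p_high.

    Assumes each entry is currently missing independently with probability p.
    An observed entry is dropped with probability q chosen so that
    1 - (1 - p) * (1 - q) = p_high.
    """
    q = (p_high - p) / (1.0 - p)
    keep = rng.random(observed.shape) >= q
    return observed & keep


def imputed_gradient(theta, x, y, observed):
    """Squared-loss gradient of a linear model after zero imputation.

    Works for one sample (x of shape (d,)) or a batch (x of shape (n, d)).
    Zero imputation is an illustrative choice, not the paper's requirement.
    """
    x_imp = np.where(observed, x, 0.0)
    resid = np.asarray(x_imp @ theta - y)
    return resid[..., None] * x_imp


def richardson_gradient(theta, x, y, observed, p, p_high, rng):
    """Combine gradients at two missingness levels to cancel the O(p) bias."""
    g_low = imputed_gradient(theta, x, y, observed)
    thinned = thin_mask(observed, p, p_high, rng)
    g_high = imputed_gradient(theta, x, y, thinned)
    # Two-point Richardson extrapolation toward missingness level 0.
    return (p_high * g_low - p * g_high) / (p_high - p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, p, p_high = 50_000, 3, 0.1, 0.2
    cov = np.full((d, d), 0.5) + 0.5 * np.eye(d)  # correlated covariates
    x = rng.multivariate_normal(np.zeros(d), cov, size=n)
    theta = np.ones(d)
    y = x @ theta  # noiseless, so the complete-data gradient at theta is 0
    observed = rng.random((n, d)) >= p

    plain = imputed_gradient(theta, x, y, observed).mean(axis=0)
    debiased = richardson_gradient(
        theta, x, y, observed, p, p_high, rng
    ).mean(axis=0)
    print("mean plain gradient:    ", plain)
    print("mean Richardson gradient:", debiased)
```

In this toy setting the complete-data gradient at `theta` is zero, so the two printed averages are direct bias estimates. The extrapolated gradient uses weights of magnitude greater than one, so it typically trades some extra variance for the reduced bias.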
Source: arXiv: 2605.19641