Grokking in Linear Models for Logistic Regression
1️⃣ One-sentence summary
This paper finds that even the simplest linear logistic regression model can exhibit "grokking" on certain test data (e.g., data concentrated near the decision boundary, or adversarial data): the model only learns to generalize suddenly, late in training. It further shows that the phenomenon arises from the implicit bias of gradient descent together with asymmetries in the data distribution, rather than being unique to deep neural networks.
Grokking, the phenomenon of delayed generalization, is often attributed to the depth and compositional structure of deep neural networks. We study grokking in one of the simplest possible settings: the learning of a linear model with logistic loss for binary classification on data that are linearly (and max-margin) separable about the origin. We investigate three testing regimes: (1) test data drawn from the same distribution as the training data, in which case grokking is not observed; (2) test data concentrated around the margin, in which case grokking is observed; and (3) adversarial test data generated via projected gradient descent (PGD) attacks, in which case grokking is also observed. We theoretically show that the implicit bias of gradient descent induces a three-phase learning process (population-dominated, support-vector-dominated unlearning, and support-vector-dominated generalization) during which delayed generalization can arise. Our analysis further relates the emergence of grokking to asymmetries in the data, both in the number of examples per class and in the distribution of support vectors across classes, and yields a characterization of the grokking time. We experimentally validate our theory by planting different distributions of population points and support vectors, and by analyzing accuracy curves and hyperplane dynamics. Overall, our results demonstrate that grokking does not require depth or representation learning, and can emerge even in linear models through the dynamics of the bias term.
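As a rough illustration of the setup described in the abstract, the sketch below trains a two-dimensional linear classifier (weights plus a bias term) with full-batch gradient descent on the logistic loss, using linearly separable training data with imbalanced classes, and tracks accuracy on a test set concentrated near the margin (testing regime 2). The data-planting scheme, class sizes, learning rate, and step counts here are illustrative assumptions, not the paper's configuration; it is only a minimal sandbox for checking whether margin-test accuracy lags behind training accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set (an assumption, not the paper's exact planting scheme):
# 2D binary classification, linearly separable about the origin,
# with an imbalanced number of examples per class.
def make_train(n_pos=200, n_neg=20):
    X_pos = rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(n_pos, 2))
    X_neg = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(n_neg, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return X, y

# Test regime (2) from the abstract: points concentrated near the margin.
def make_margin_test(n=200, offset=0.3):
    X_pos = rng.normal(loc=[+offset, 0.0], scale=0.05, size=(n, 2))
    X_neg = rng.normal(loc=[-offset, 0.0], scale=0.05, size=(n, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    return X, y

def accuracy(w, b, X, y):
    return float(np.mean(np.sign(X @ w + b) == y))

X, y = make_train()
Xt, yt = make_margin_test()

w, b = np.zeros(2), 0.0
lr = 0.1
for step in range(1, 200_001):
    # Full-batch gradient of the logistic loss  mean(log(1 + exp(-y * (Xw + b)))).
    margins = y * (X @ w + b)
    coef = -y / (1.0 + np.exp(margins))   # d(loss_i)/d(margin_i), per example
    w -= lr * (X.T @ coef) / len(y)
    b -= lr * coef.mean()
    if step % 20_000 == 0:
        print(f"step {step:7d}  train acc {accuracy(w, b, X, y):.3f}  "
              f"margin-test acc {accuracy(w, b, Xt, yt):.3f}  b {b:+.3f}")
```

With imbalanced classes, the bias term is initially pulled toward the majority class, which can misclassify near-margin test points even while training accuracy is already perfect; as training continues and the direction of (w, b) drifts toward the max-margin solution, margin-test accuracy can recover much later, which is the kind of delayed generalization the paper analyzes.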
Source: arXiv:2602.08302