Understanding Grokking in Finite-Dimensional Algebras / Grokking Finite-Dimensional Algebra
1️⃣ One-sentence summary
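This paper extends grokking, the sudden shift during neural network training from memorization to understanding, from learning group operations to the broader class of finite-dimensional algebras. It shows how algebraic properties (such as commutativity and associativity) and properties of the algebra's structure tensor affect when grokking occurs and how well the model generalizes.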
This paper investigates the grokking phenomenon, the sudden transition from a long memorization phase to generalization observed during neural network training, in the context of learning multiplication in finite-dimensional algebras (FDA). While prior work on grokking has focused mainly on group operations, we extend the analysis to more general algebraic structures, including non-associative, non-commutative, and non-unital algebras. We show that learning group operations is a special case of learning FDA, and that learning multiplication in FDA amounts to learning a bilinear product specified by the algebra's structure tensor. For algebras over the reals, we connect the learning problem to matrix factorization with an implicit low-rank bias, and for algebras over finite fields, we show that grokking emerges naturally as models must learn discrete representations of algebraic elements. This leads us to experimentally investigate the following core questions: (i) how do algebraic properties such as commutativity, associativity, and unitality influence both the emergence and timing of grokking, (ii) how structural properties of the structure tensor of the FDA, such as sparsity and rank, influence generalization, and (iii) to what extent generalization correlates with the model learning latent embeddings aligned with the algebra's representation. Our work provides a unified framework for grokking across algebraic structures and new insights into how mathematical structure governs neural network generalization dynamics.
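The abstract's central reduction, "multiplication in an FDA is a bilinear product given by its structure tensor", can be made concrete with a small sketch. It assumes an n-dimensional algebra over the finite field F_p with basis e_0..e_{n-1}, encoded by a tensor T where e_i * e_j = sum_k T[i][j][k] e_k. All function names here are hypothetical illustrations, not the paper's code. The group algebra of the cyclic group Z_n shows how a group operation (modular addition) appears as a special case with a one-hot structure tensor. The `subtraction_algebra` example and the `make_dataset` task format are likewise assumptions chosen only to illustrate a non-commutative, non-associative, non-unital case and one possible input/output encoding.

```python
from itertools import product

# Hypothetical helpers (not from the paper): an n-dimensional algebra over F_p
# is encoded by its structure tensor T[i][j][k], meaning
# e_i * e_j = sum_k T[i][j][k] * e_k, with coefficients reduced mod p.

def multiply(T, x, y, p):
    """Bilinear product of coefficient vectors x and y over F_p."""
    n = len(T)
    z = [0] * n
    for i, j, k in product(range(n), repeat=3):
        z[k] = (z[k] + x[i] * y[j] * T[i][j][k]) % p
    return z

def basis(n, i):
    """Coefficient vector of the basis element e_i."""
    v = [0] * n
    v[i] = 1
    return v

def cyclic_group_algebra(n):
    """Group algebra of Z_n: e_i * e_j = e_{(i + j) mod n} (a group operation as an FDA)."""
    T = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        T[i][j][(i + j) % n] = 1
    return T

def subtraction_algebra(n):
    """Illustrative example (assumption): e_i * e_j = e_{(i - j) mod n}."""
    T = [[[0] * n for _ in range(n)] for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        T[i][j][(i - j) % n] = 1
    return T

# By bilinearity, each property only needs to be checked on basis elements.

def is_commutative(T, p):
    n = len(T)
    return all(T[i][j][k] % p == T[j][i][k] % p
               for i, j, k in product(range(n), repeat=3))

def is_associative(T, p):
    n = len(T)
    for i, j, l in product(range(n), repeat=3):
        a, b, c = basis(n, i), basis(n, j), basis(n, l)
        if multiply(T, multiply(T, a, b, p), c, p) != multiply(T, a, multiply(T, b, c, p), p):
            return False
    return True

def is_unit(T, u, p):
    """True if u * e_i == e_i * u == e_i for every basis element e_i."""
    n = len(T)
    return all(multiply(T, u, basis(n, i), p) == basis(n, i)
               and multiply(T, basis(n, i), u, p) == basis(n, i)
               for i in range(n))

def make_dataset(T, p):
    """All (x, y, x*y) triples over F_p^n; an assumed task format, not the paper's."""
    n = len(T)
    elems = [list(v) for v in product(range(p), repeat=n)]
    return [(x, y, multiply(T, x, y, p)) for x in elems for y in elems]

p = 7
G = cyclic_group_algebra(5)
print(multiply(G, basis(5, 2), basis(5, 4), p))  # [0, 1, 0, 0, 0]
print(is_commutative(G, p), is_associative(G, p), is_unit(G, basis(5, 0), p))  # True True True
S = subtraction_algebra(5)
print(is_commutative(S, p), is_associative(S, p), is_unit(S, basis(5, 0), p))  # False False False
```

Because the product is bilinear, checking commutativity, associativity, or a candidate unit on basis elements is enough to decide it for the whole algebra. The brute-force `make_dataset` enumerates p^n × p^n pairs, so it is only practical for very small n and p; a grokking experiment would then train on a random subset of these triples and evaluate on the rest.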
Source: arXiv: 2602.19533