A unified theory of feature learning in RNNs and DNNs
1️⃣ One-sentence summary
This paper develops a unified theoretical framework that explains the similarities and differences between recurrent and deep neural networks in feature learning, revealing how weight sharing gives RNNs better generalization on sequential tasks.
Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit with the distinct functional properties these networks exhibit? To address this question, we here develop a unified mean-field theory for RNNs and DNNs in terms of representational kernels, describing fully trained networks in the feature learning ($\mu$P) regime. This theory casts training as Bayesian inference over sequences and patterns, directly revealing the functional implications induced by the RNNs' weight sharing. In DNN-typical tasks, we identify a phase transition when the learning signal overcomes the noise due to randomness in the weights: below this threshold, RNNs and DNNs behave identically; above it, only RNNs develop correlated representations across timesteps. For sequential tasks, the RNNs' weight sharing furthermore induces an inductive bias that aids generalization by interpolating unsupervised time steps. Overall, our theory offers a way to connect architectural structure to functional biases.
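To make the unrolling argument concrete, here is a minimal NumPy sketch (all dimensions and variable names are hypothetical illustrations, not from the paper) showing that an RNN stepped through time computes exactly the same function as a deep feedforward network whose per-layer weights are all tied to the single recurrent matrix; untying those layer weights would recover an ordinary DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
n_in, n_hid, T = 3, 5, 4  # input dim, hidden dim, sequence length

W_in = rng.standard_normal((n_hid, n_in)) / np.sqrt(n_in)
W_rec = rng.standard_normal((n_hid, n_hid)) / np.sqrt(n_hid)

x = rng.standard_normal((T, n_in))  # one input sequence

def rnn(x):
    # RNN: the same recurrent matrix W_rec is applied at every time step.
    h = np.zeros(n_hid)
    for t in range(T):
        h = np.tanh(W_rec @ h + W_in @ x[t])
    return h

def unrolled_dnn(x):
    # "Unrolled" network: one layer per time step. Weight sharing means
    # every layer uses the same matrix; replacing this list with T
    # independent matrices would give an ordinary (untied) DNN.
    W_layers = [W_rec] * T
    h = np.zeros(n_hid)
    for t, W in enumerate(W_layers):
        h = np.tanh(W @ h + W_in @ x[t])
    return h

assert np.allclose(rnn(x), unrolled_dnn(x))  # identical computations
```

The assertion passes because the two functions perform the same sequence of operations; the only architectural difference is whether the layers share one weight matrix, which is exactly the structural distinction the paper's theory builds on.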
Source: arXiv: 2602.15593