arXiv submission date: 2026-06-28
📄 Abstract - Anti-Collapse Dynamics and the Emergence of Multi-Time-Scale Learning in Recurrent Neural Networks

Long-range learning is hard for recurrent networks trained with stochastic gradient descent, because the influence of a past input fades with the lag $\ell$, and if it fades too fast the dependence cannot be learned from finite data. This fade is captured by an envelope $f(\ell)$. An exponential fade makes the data needed to learn a lag-$\ell$ dependence grow exponentially, putting long horizons out of reach; a power-law fade keeps the cost polynomial. We show that the asymptotic decay class of $f(\ell)$ is not fixed by the architecture. Instead, it emerges from the coupling between the state dynamics and parameter dynamics, settling into either a collapsed regime (fast, exponential forgetting) or an extended, anti-collapsed regime (slow, power-law forgetting). The intuition is a competition within these coupled dynamics. Training drives the network's effective time scales toward short ones, while rare, heavy-tailed fluctuations of the learning dynamics push a few of them to very long values. The extended regime survives only when these heavy-tailed pushes are strong enough to balance the pull. We make this mathematically precise with a coarse-grained stochastic process and prove exactly when the extended regime exists. A single exponent, the spectral exponent $\beta$, then governs both the spread of time scales and how slowly the network forgets. Realizing the regime in practice needs one more ingredient: the joint action of the architecture and the optimizer must be able to hold such a broad spread. A network whose capacity to generate broad time-scale spectra is severely constrained still collapses, even when supplied with strong heavy-tailed forcing. Heavy-tailed fluctuations thus act not as noise to be suppressed, but as the mechanism that sustains long-range learning.
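
The contrast between the two decay classes in the abstract can be illustrated with a small numerical sketch. It is not the paper's coarse-grained process. It only assumes, for illustration, that the envelope $f(\ell)$ is an average of exponential modes $e^{-\lambda \ell}$ whose decay rates $\lambda$ follow the density $\beta \lambda^{\beta-1}$ on $(0, 1]$. Under that assumption the average behaves like $\Gamma(\beta+1)\,\ell^{-\beta}$ at large lag, while a single mode stays exponential. The function names, the rate density, and the use of `beta` as the mixture exponent are all hypothetical choices here, and `beta` may not coincide with the paper's spectral exponent.

```python
import math


def envelope_single(lag, tau):
    """Envelope of one mode with time scale tau: pure exponential decay."""
    return math.exp(-lag / tau)


def envelope_mixture(lag, beta, n_modes=20000):
    """Average of exp(-rate * lag) over many modes.

    Assumption (not from the paper): decay rates have density
    beta * rate**(beta - 1) on (0, 1], sampled at midpoint quantiles.
    This crude grid is adequate for the beta and lags used below.
    """
    total = 0.0
    for k in range(n_modes):
        u = (k + 0.5) / n_modes        # midpoint quantile in (0, 1)
        rate = u ** (1.0 / beta)       # inverse CDF, since CDF(rate) = rate**beta
        total += math.exp(-rate * lag)
    return total / n_modes


def log_log_slope(f, lag_a, lag_b):
    """Slope of log f versus log lag between two lags."""
    return (math.log(f(lag_b)) - math.log(f(lag_a))) / (
        math.log(lag_b) - math.log(lag_a)
    )


if __name__ == "__main__":
    beta = 0.5  # hypothetical value chosen for the demonstration
    slope_mix = log_log_slope(lambda l: envelope_mixture(l, beta), 100.0, 1000.0)
    slope_one = log_log_slope(lambda l: envelope_single(l, 10.0), 100.0, 1000.0)
    print("mixture log-log slope:", slope_mix)
    print("single-mode log-log slope:", slope_one)
```

With `beta = 0.5`, the mixture's log-log slope between lags 100 and 1000 comes out close to $-\beta$, which is the signature of a power law. The single-mode slope is far steeper and keeps steepening as the lag grows, which is the signature of exponential forgetting.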

Top-level tag: machine learning theory
Detailed tags: recurrent neural networks, long-range learning, anti-collapse dynamics, spectral exponent, heavy-tailed fluctuations

Anti-Collapse Dynamics and the Emergence of Multi-Time-Scale Learning in Recurrent Neural Networks


1️⃣ One-sentence summary

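This paper finds that recurrent neural networks settle into one of two sharply different learning regimes during training: a common "collapsed" regime, in which the network forgets past information quickly and so cannot learn long-range dependencies, and a rarer "anti-collapsed" regime, in which it sustains slow, power-law forgetting and can therefore handle tasks with long time spans; reaching the latter depends on rare "heavy-tailed" fluctuations during training that offset the contraction of time scales caused by parameter updates.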

Source: arXiv: 2606.29519