Deep Delta Learning
1️⃣ One-Sentence Summary
This paper proposes a new network architecture called Deep Delta Learning, which dynamically modulates the residual connection through a learnable "Delta Operator". The network not only trains stably but also models complex feature transformations more flexibly, moving beyond the purely additive updates of conventional residual networks.
2️⃣ Abstract

The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
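To unpack the spectral claim: assuming the natural Householder-style parameterization implied by the abstract, with $\mathbf{k}(\mathbf{X})$ unit-norm, the Delta Operator reads

$$
\mathbf{T}(\mathbf{X}) \;=\; \mathbf{I} \;-\; \beta(\mathbf{X})\,\mathbf{k}(\mathbf{X})\,\mathbf{k}(\mathbf{X})^{\top},
$$

which has eigenvalue $1-\beta$ along $\mathbf{k}$ and eigenvalue $1$ on the orthogonal complement. The gate therefore sweeps the shortcut through three regimes: $\beta=0$ recovers the identity mapping, $\beta=1$ gives an orthogonal projection that erases the state's component along $\mathbf{k}$, and $\beta=2$ gives a reflection that negates it.

The sketch below illustrates one block under this reading, in PyTorch. All names (`DeltaBlock`, `k_proj`, `v_proj`, `b_proj`) and the choice of gate range $\beta \in (0, 2)$ are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeltaBlock(nn.Module):
    """Gated delta-rule residual update: y = (I - beta k k^T) x + beta v k.

    Illustrative sketch; the projections and the gate range are assumptions.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.k_proj = nn.Linear(dim, dim)  # direction k(x) for erase/write
        self.v_proj = nn.Linear(dim, 1)    # scalar content v(x) written along k
        self.b_proj = nn.Linear(dim, 1)    # pre-activation for the gate beta(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k = F.normalize(self.k_proj(x), dim=-1)     # unit-norm direction
        beta = 2.0 * torch.sigmoid(self.b_proj(x))  # beta in (0, 2): identity -> projection -> reflection
        v = self.v_proj(x)                          # new component to write along k
        old = (x * k).sum(dim=-1, keepdim=True)     # current component k^T x
        # Delta rule: one gate scales both the erasure of the old component
        # and the write of the new one ("synchronous rank-1 injection").
        return x + beta * (v - old) * k


# Quick shape check.
block = DeltaBlock(dim=64)
y = block(torch.randn(8, 64))
print(y.shape)  # torch.Size([8, 64])
```

Note that the return line expands to $(\mathbf{I}-\beta\mathbf{k}\mathbf{k}^{\top})\mathbf{x} + \beta v\,\mathbf{k}$: a single scalar $\beta$ simultaneously sets how much of the old state is erased along $\mathbf{k}$ and how strongly the new content $v$ is written there, acting as the dynamic step size described in the abstract.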
Source: arXiv:2601.00417