Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
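1️⃣ One-sentence summary
This paper proposes Ouroboros, a method in which a lightweight Controller network lets the shared weight block of a recursive transformer adapt to the current input at every recurrence step, substantially lowering training loss while adding only a small number of trainable parameters.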
Recursive transformers reuse a shared weight block across multiple depth steps, trading parameters for compute. A core limitation: every step applies the same transformation, preventing the model from composing distinct operations across depth. We present Ouroboros, a system that attaches a compact Controller hypernetwork to a recursive transformer block. The Controller observes the current hidden state, produces a per-step diagonal modulation vector, and applies it to frozen SVD-initialized LoRA bases, making each recurrence step input-dependent. We combine this with gated recurrence (bias-initialized to 88% retention) and per-step LayerNorm for stable deep iteration. On Qwen2.5-3B split into a Prelude/Recurrent/Coda architecture (17 of 36 layers retained), Ouroboros reduces training loss by 43.4% over the unmodified 17-layer baseline, recovering 51.3% of the performance gap caused by layer removal. The full system adds only 9.2M trainable parameters (Controller, gate, and per-step norms) yet outperforms equivalently sized static per-step LoRA by 1.44 loss points at depth 1 and remains ahead across all tested depths (1, 4, 8, 16) and ranks (8, 32, 64). We also find that gated recurrence is essential: without it, recursive layer application makes the model strictly worse. These gains are measured on the training distribution; on held-out text, the Controller does not yet improve over the baseline, a limitation we attribute to frozen downstream layers and discuss in detail. Code: this https URL
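The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dimensions, the single-linear-map Controller `C`, the `tanh` candidate update, the scalar gate, and the parameter-free layer norm are all assumptions made for illustration, and the LoRA bases are random here rather than SVD-initialized. The gate bias of 2.0 is likewise an assumption, chosen because a sigmoid gate with that bias retains roughly 88% of the previous state, matching the retention figure quoted above.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_norm(v, eps=1e-5):
    """Parameter-free layer norm (the paper uses per-step learned norms)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes: hidden width d, LoRA rank r.
d, r = 8, 2
rng = random.Random(0)
W = rand_matrix(d, d, rng)  # frozen shared block weight (stand-in for a layer)
A = rand_matrix(r, d, rng)  # frozen LoRA down-projection (random here, SVD-initialized in the paper)
B = rand_matrix(d, r, rng)  # frozen LoRA up-projection
C = rand_matrix(r, d, rng)  # trainable Controller, reduced to one linear map (assumption)
GATE_BIAS = 2.0             # assumed value; sigmoid(2.0) is about 0.88

def modulation(h):
    """Controller: map the current hidden state to a length-r diagonal modulation vector."""
    return matvec(C, layer_norm(h))

def recurrent_step(h):
    """One recurrence step: y = W x + B diag(m(h)) A x, then a gated update of h."""
    x = layer_norm(h)
    m = modulation(h)
    low = matvec(A, x)
    delta = matvec(B, [mi * li for mi, li in zip(m, low)])
    base = matvec(W, x)
    candidate = [math.tanh(b + dl) for b, dl in zip(base, delta)]
    g = sigmoid(GATE_BIAS)  # fraction of the previous state that is retained
    return [g * hi + (1.0 - g) * ci for hi, ci in zip(h, candidate)]

def run(h, depth):
    """Apply the same shared block `depth` times."""
    for _ in range(depth):
        h = recurrent_step(h)
    return h
```

Because `modulation` depends on the hidden state, the effective weight `W + B diag(m) A` differs from step to step and from input to input, even though `W`, `A`, and `B` never change. Calling `run(h0, 4)` applies the shared block four times, which corresponds to one of the depths tested in the abstract.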
Source: arXiv: 2604.02051