
arXiv submission date: 2026-03-10
📄 Abstract - Routing without Forgetting

Continual learning in transformers is commonly addressed through parameter-efficient adaptation: prompts, adapters, or LoRA modules are specialized per task while the backbone remains frozen. Although effective in controlled multi-epoch settings, these approaches rely on gradual gradient-based specialization and struggle in Online Continual Learning (OCL), where data arrive as a non-stationary stream and each sample may be observed only once. We recast continual learning in transformers as a routing problem: under strict online constraints, the model must dynamically select the appropriate representational subspace for each input without explicit task identifiers or repeated optimization. We thus introduce Routing without Forgetting (RwF), a transformer architecture augmented with energy-based associative retrieval layers inspired by Modern Hopfield Networks. Instead of storing or merging task-specific prompts, RwF generates dynamic prompts through single-step associative retrieval over the transformer token embeddings at each layer. Retrieval corresponds to the closed-form minimization of a strictly convex free-energy functional, enabling input-conditioned routing within each forward pass, independently of iterative gradient refinement. Across challenging class-incremental benchmarks, RwF improves over existing prompt-based methods. On Split-ImageNet-R and Split-ImageNet-S, RwF outperforms prior prompt-based approaches by a large margin, even in few-shot learning regimes. These results indicate that embedding energy-based associative routing directly within the transformer backbone provides a principled and effective foundation for OCL.
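The retrieval step described above is, per the abstract, the closed-form minimizer of a strictly convex free-energy functional in the style of Modern Hopfield Networks, where a single softmax-weighted readout over stored patterns converges in one step. Below is a minimal sketch of that single-step retrieval; the memory matrix, query source, and inverse temperature `beta` are illustrative placeholders, not RwF's actual parameterization.

```python
import numpy as np

def hopfield_retrieve(queries, memory, beta=1.0):
    """Single-step Modern Hopfield retrieval.

    queries: (n, d) token embeddings acting as queries.
    memory:  (m, d) stored patterns (e.g. a learned prompt memory).
    Returns (n, d): for each query, the softmax-weighted convex
    combination of stored patterns -- the closed-form minimizer of
    the associated convex free energy, reached in one update.
    """
    scores = beta * queries @ memory.T            # (n, m) similarities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over memory slots
    return attn @ memory                          # retrieved patterns

rng = np.random.default_rng(0)
memory = rng.standard_normal((8, 16))                       # 8 patterns, dim 16
queries = memory[:3] + 0.1 * rng.standard_normal((3, 16))   # noisy probes
retrieved = hopfield_retrieve(queries, memory, beta=4.0)
```

With a sufficiently large `beta`, each noisy query snaps back toward its nearest stored pattern in a single step, which is what lets the routing happen inside one forward pass without iterative refinement.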

Top-level tags: machine learning, model training, theory
Detailed tags: continual learning, online learning, transformers, hopfield networks, parameter-efficient adaptation

Routing without Forgetting


1️⃣ One-sentence summary

This paper proposes a new method called "Routing without Forgetting," which adds energy-based associative retrieval layers to a Transformer so that, even when each data sample is seen only once, the model can dynamically select the most suitable processing path for each input, effectively mitigating catastrophic forgetting in online continual learning.

Source: arXiv:2603.09576