Dynamic Linear Attention
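1️⃣ One-Sentence Summary
This paper proposes DLA, a dynamic memory modeling framework that adaptively adjusts the boundaries between memory states according to token importance and merges low-information states within a fixed-size cache, significantly improving the accuracy and efficiency of linear attention on long texts.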
The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory in a multi-state manner. However, existing multi-state linear attention methods rely on fixed state merging policies that cannot adapt to dynamically varying token importance, irreversibly obscuring critical tokens and causing severe error accumulation over long sequences. To address this limitation, we propose DLA, a dynamic memory modeling framework for multi-state linear attention. DLA introduces (i) Information-Aware Dynamic State Merging, which adaptively determines state boundaries based on token-level information variation, preserving high-resolution representations around semantic transitions while aggressively summarizing stable regions, and (ii) Capacity-Bounded Memory Modeling, which maintains a fixed-size, chronologically ordered state cache by selectively merging adjacent low-information states to control memory growth with minimal information loss. We pre-train DLA on two different linear attention models and evaluate on 16 datasets across three categories. Experimental results demonstrate the superiority of DLA over state-of-the-art methods.
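The two components described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify how information variation is measured or how states are combined, so the sketch assumes (hypothetically) that variation is one minus the cosine similarity of consecutive keys, that a state is the additive sum of key-value outer products as in ordinary linear attention, and that the adjacent pair with the lowest combined score is merged by summation. The names `StateCache`, `variation`, `capacity`, and `threshold` are illustrative only.

```python
import math


def outer(k, v):
    """Outer product k^T v as a nested list (one linear-attention update)."""
    return [[ki * vj for vj in v] for ki in k]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def variation(k_prev, k):
    """Assumed information-variation score: 1 - cosine similarity of keys."""
    dot = sum(a * b for a, b in zip(k_prev, k))
    norm = math.sqrt(sum(a * a for a in k_prev)) * math.sqrt(sum(b * b for b in k))
    return 1.0 - dot / norm if norm > 0 else 1.0


class StateCache:
    """Fixed-size, chronologically ordered list of memory states (a sketch)."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity      # maximum number of states kept
        self.threshold = threshold    # variation above this opens a new state
        self.states = []              # oldest state first
        self.prev_key = None

    def update(self, k, v):
        # The first token always opens a state; later tokens are scored
        # against the previous key.
        info = 1.0 if self.prev_key is None else variation(self.prev_key, k)
        self.prev_key = k
        if not self.states or info > self.threshold:
            # Large variation: start a new state at this boundary.
            self.states.append({"S": outer(k, v), "info": info, "count": 1})
        else:
            # Stable region: fold the token into the current state.
            cur = self.states[-1]
            cur["S"] = add(cur["S"], outer(k, v))
            cur["info"] = max(cur["info"], info)
            cur["count"] += 1
        if len(self.states) > self.capacity:
            self._merge_lowest_pair()

    def _merge_lowest_pair(self):
        # Pick the adjacent pair with the lowest combined score (assumed rule);
        # ties resolve to the oldest pair. Order of the remaining states is kept.
        i = min(
            range(len(self.states) - 1),
            key=lambda j: self.states[j]["info"] + self.states[j + 1]["info"],
        )
        a, b = self.states[i], self.states[i + 1]
        merged = {
            "S": add(a["S"], b["S"]),
            "info": max(a["info"], b["info"]),
            "count": a["count"] + b["count"],
        }
        self.states[i:i + 2] = [merged]


# Usage: six tokens with 2-dimensional keys and 1-dimensional values.
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [-1, 0]]
values = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
cache = StateCache(capacity=2, threshold=0.5)
for k, v in zip(keys, values):
    cache.update(k, v)
print([s["count"] for s in cache.states])
```

In this toy run the cache never holds more than `capacity` states, and every token is accounted for in exactly one state. Merging by summation is the simplest choice that keeps the total memory additive; the paper's actual merging rule and scoring function may differ.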
Source: arXiv: 2606.10650