Abstract - Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
Linear-time attention and State Space Models (SSMs) promise to solve the quadratic cost bottleneck of softmax attention in long-context language models. We introduce Error-Free Linear Attention (EFLA), a numerically stable, fully parallel, and generalized formulation of the delta rule. Specifically, we formulate the online learning update as a continuous-time dynamical system and prove that its exact solution is not only attainable but also computable in linear time with full parallelism. By leveraging the rank-1 structure of the dynamics matrix, we directly derive the exact closed-form solution, which effectively corresponds to an infinite-order Runge-Kutta method. This attention mechanism is theoretically free from error accumulation, perfectly capturing the continuous dynamics while preserving linear-time complexity. Through an extensive suite of experiments, we show that EFLA enables robust performance in noisy environments, achieving lower language-modeling perplexity and better downstream benchmark performance than DeltaNet without introducing additional parameters. Our work provides a new theoretical foundation for building high-fidelity, scalable linear-time attention models.
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
1️⃣ One-Sentence Summary
This paper proposes a new linear attention mechanism called EFLA. By modeling the online learning process as a continuous-time dynamical system and cleverly exploiting the structure of its dynamics matrix, it achieves, for the first time, a fully exact solution with no error accumulation while preserving linear computational complexity, yielding significant gains in long-context modeling both theoretically and empirically.
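The key mathematical ingredient is that when the dynamics matrix is rank-1, the matrix exponential (the exact solution of the linear ODE, i.e. the limit of infinitely many Runge-Kutta corrections) collapses to a closed form. A minimal sketch of that identity, assuming delta-rule-style dynamics dS/dt = a·kkᵀ·S with key vector `k` and scalar `a` (names chosen here for illustration, not taken from the paper's code):

```python
# For a rank-1 matrix k k^T we have (k k^T)^n = (k^T k)^(n-1) * k k^T,
# so the Taylor series of exp(a * k k^T) sums in closed form:
#   exp(a * k k^T) = I + ((e^{a * k.k} - 1) / (k.k)) * k k^T
import math

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(A, c):
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_rank1(k, a):
    """Exact exp(a * k k^T) via the rank-1 closed form (assumes k.k != 0)."""
    kk = sum(x * x for x in k)                # k^T k
    coeff = (math.exp(a * kk) - 1.0) / kk
    return mat_add(identity(len(k)), mat_scale(outer(k, k), coeff))

def expm_series(A, terms=60):
    """Reference: truncated Taylor series of the matrix exponential."""
    n = len(A)
    result = identity(n)
    term = identity(n)
    for i in range(1, terms):
        term = [[sum(term[r][m] * A[m][c] for m in range(n)) / i
                 for c in range(n)] for r in range(n)]
        result = mat_add(result, term)
    return result

k = [0.6, -0.8, 0.1]
a = -0.5                                      # plays the role of -beta * t
A = mat_scale(outer(k, k), a)
exact = expm_rank1(k, a)
series = expm_series(A)
err = max(abs(x - y) for rx, ry in zip(exact, series) for x, y in zip(rx, ry))
print(err < 1e-12)
```

Because the closed form is evaluated in O(d²) per step rather than via any discretized integration, no truncation error is introduced at any step, which is the sense in which the solution is "error-free" while staying linear in sequence length.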