MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
1️⃣ One-Sentence Summary
This paper proposes MHLA, a new linear attention mechanism that splits the input into multiple independent "heads" along the token dimension and computes attention within each head separately. This preserves computational efficiency while effectively addressing the loss of expressive power in conventional linear attention, and it delivers notable performance gains across image classification, natural language processing, image generation, and video generation.
While the Transformer architecture dominates many fields, its quadratic self-attention complexity hinders its use in large-scale applications. Linear attention offers an efficient alternative, but its direct application often degrades performance, with existing fixes typically re-introducing computational overhead through extra modules (e.g., depthwise separable convolution) that defeat the original purpose. In this work, we identify a key failure mode in these methods: global context collapse, where the model loses representational diversity. To address this, we propose Multi-Head Linear Attention (MHLA), which preserves this diversity by computing attention within divided heads along the token dimension. We prove that MHLA maintains linear complexity while recovering much of the expressive power of softmax attention, and verify its effectiveness across multiple domains, achieving a 3.6% improvement on ImageNet classification, a 6.3% gain on NLP, a 12.6% improvement on image generation, and a 41% enhancement on video generation under the same time complexity.
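To make the idea of "heads along the token dimension" concrete, below is a minimal PyTorch sketch. It assumes MHLA partitions the sequence into contiguous token chunks and runs kernelized linear attention (with an ELU+1 feature map) independently within each chunk, so each chunk keeps its own key-value summary instead of collapsing into a single global context. The function name `mhla_sketch`, the chunking scheme, the feature map, and the `num_token_heads` parameter are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mhla_sketch(q, k, v, num_token_heads=4, eps=1e-6):
    """Illustrative token-level multi-head linear attention (non-causal).

    q, k, v: (batch, seq_len, dim). The sequence is split into
    `num_token_heads` contiguous chunks ("heads" along the token
    dimension), and linear attention is computed independently in
    each chunk, so representational diversity across chunks is kept.
    """
    b, n, d = q.shape
    assert n % num_token_heads == 0, "pad the sequence so it divides evenly"
    c = n // num_token_heads  # tokens per chunk

    # Non-negative feature map, a common choice for linear attention.
    q = F.elu(q) + 1.0
    k = F.elu(k) + 1.0

    # Reshape to (batch, token_heads, chunk_len, dim).
    q = q.view(b, num_token_heads, c, d)
    k = k.view(b, num_token_heads, c, d)
    v = v.view(b, num_token_heads, c, d)

    # Per-chunk key-value summary: (batch, token_heads, dim, dim).
    kv = torch.einsum("bhcd,bhce->bhde", k, v)
    # Per-chunk normalizer: (batch, token_heads, dim).
    k_sum = k.sum(dim=2)

    num = torch.einsum("bhcd,bhde->bhce", q, kv)
    den = torch.einsum("bhcd,bhd->bhc", q, k_sum).unsqueeze(-1) + eps
    out = num / den  # (batch, token_heads, chunk_len, dim)

    return out.reshape(b, n, d)


# Usage: cost stays linear in sequence length, but each chunk carries
# its own context summary rather than one shared global one.
x = torch.randn(2, 64, 32)
y = mhla_sketch(x, x, x, num_token_heads=4)
print(y.shape)  # torch.Size([2, 64, 32])
```

Because each chunk only forms a dim-by-dim key-value matrix, the total cost remains linear in sequence length; how the paper lets chunks exchange information (if at all) is defined in the original work, not in this sketch.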
Source: arXiv: 2601.07832