arXiv submission date: 2026-02-04
📄 Abstract - SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration

Visual AutoRegressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction paradigm. However, mainstream VAR paradigms attend to all tokens across historical scales at each autoregressive step. As the resolution of the next scale grows, the computational complexity of attention increases quartically with resolution, causing substantial latency. Prior acceleration methods often skip high-resolution scales, which speeds up inference but discards high-frequency details and harms image quality. To address these problems, we present SparVAR, a training-free acceleration framework that exploits three properties of VAR attention: (i) strong attention sinks, (ii) cross-scale activation similarity, and (iii) pronounced locality. Specifically, we dynamically predict the sparse attention pattern of later high-resolution scales from a sparse decision scale, and construct scale self-similar sparse attention via an efficient index-mapping mechanism, enabling high-efficiency sparse attention computation at large scales. Furthermore, we propose cross-scale local sparse attention and implement an efficient block-wise sparse kernel, which achieves $\mathbf{> 5\times}$ faster forward speed than FlashAttention. Extensive experiments demonstrate that SparVAR can reduce the generation time of an 8B model producing $1024\times1024$ high-resolution images to around 1 second, without skipping the last scales. Compared with the VAR baseline accelerated by FlashAttention, our method achieves a $\mathbf{1.57\times}$ speed-up while preserving almost all high-frequency details. When combined with existing scale-skipping strategies, SparVAR attains up to a $\mathbf{2.28\times}$ acceleration while maintaining competitive visual generation quality. Code is available at this https URL.
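The abstract only describes the mechanism at a high level. Below is a minimal, dense-emulated PyTorch sketch of the general recipe it suggests: predict a block-sparse attention pattern at a small decision scale (while always keeping attention-sink blocks), index-map that pattern to a larger scale under the self-similarity assumption, and apply it as a block mask. All function names and hyperparameters here (`block_mask_from_decision_scale`, `upsample_block_mask`, `block_sparse_attention`, `block`, `keep_ratio`, `sink_blocks`, `scale_factor`) are illustrative assumptions, not the paper's API, and the dense masking merely emulates what the paper's efficient block-wise kernel would skip outright.

```python
import torch
import torch.nn.functional as F

def block_mask_from_decision_scale(q, k, block=64, keep_ratio=0.1, sink_blocks=1):
    """Estimate a block-level sparse attention mask at a small 'decision' scale.

    q, k: (heads, n, d) queries/keys at the decision scale.
    Returns a bool mask of shape (heads, n_blocks, n_blocks) marking which
    key blocks each query block attends to.
    """
    h, n, d = q.shape
    nb = (n + block - 1) // block
    # Pool queries/keys per block and score block pairs with pooled dot products.
    qb = F.adaptive_avg_pool1d(q.transpose(1, 2), nb).transpose(1, 2)  # (h, nb, d)
    kb = F.adaptive_avg_pool1d(k.transpose(1, 2), nb).transpose(1, 2)
    scores = qb @ kb.transpose(1, 2) / d ** 0.5                        # (h, nb, nb)
    # Keep the top-scoring key blocks per query block.
    k_keep = max(1, int(keep_ratio * nb))
    topk = scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros(h, nb, nb, dtype=torch.bool, device=q.device)
    mask.scatter_(-1, topk, True)
    mask[..., :sink_blocks] = True  # always attend to attention-sink blocks
    return mask

def upsample_block_mask(mask, scale_factor):
    """Index-map a block mask from the decision scale to a higher scale.

    Under the self-similarity assumption, each block entry is repeated along
    both axes. A faithful implementation would map indices 2D-aware (tokens
    are a raster-ordered grid); this 1D repeat is a simplification.
    """
    s = scale_factor
    return mask.repeat_interleave(s, dim=-2).repeat_interleave(s, dim=-1)

def block_sparse_attention(q, k, v, block_mask, block=64):
    """Reference (dense-emulated) block-sparse attention.

    Expands the block mask to token level and masks scores; a real kernel
    would never materialize or compute the masked blocks.
    """
    h, n, d = q.shape
    token_mask = block_mask.repeat_interleave(block, -2)
    token_mask = token_mask.repeat_interleave(block, -1)[:, :n, :n]
    scores = q @ k.transpose(1, 2) / d ** 0.5
    scores = scores.masked_fill(~token_mask, float("-inf"))
    return scores.softmax(-1) @ v

# Hypothetical usage: decide the pattern at a 32x32 scale (1024 tokens),
# then reuse it at a 64x64 scale (4096 tokens, 4x the blocks).
h, d = 16, 64
q_lo, k_lo = torch.randn(h, 1024, d), torch.randn(h, 1024, d)
mask_lo = block_mask_from_decision_scale(q_lo, k_lo, block=64)   # (h, 16, 16)
mask_hi = upsample_block_mask(mask_lo, scale_factor=4)           # (h, 64, 64)
q_hi, k_hi, v_hi = (torch.randn(h, 4096, d) for _ in range(3))
out = block_sparse_attention(q_hi, k_hi, v_hi, mask_hi, block=64)
```

Because the expensive pattern search runs only once at the cheap decision scale, the per-step cost at large scales reduces to the kept blocks, which is where the paper's reported speed-up over FlashAttention would come from.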

Top-level tags: computer vision, model training, model evaluation
Detailed tags: sparse attention, autoregressive models, inference acceleration, visual generation, high-resolution images

SparVAR: Exploring Sparsity in Visual AutoRegressive Modeling for Training-Free Acceleration


1️⃣ One-Sentence Summary

This paper proposes SparVAR, a training-free acceleration framework that exploits the sparse structure of attention in visual autoregressive models to significantly speed up high-resolution image generation, without skipping high-resolution scales and while preserving image quality.

Source: arXiv:2602.04361