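AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation
1️⃣ One-Sentence Summary
This paper proposes AdaCluster, a training-free adaptive clustering method that applies separate similarity-preserving clustering algorithms to query vectors and key vectors and dynamically adjusts the number of clusters and the selection of critical clusters, speeding up inference of existing video diffusion models by 1.67x to 4.31x while preserving generation quality.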
Video diffusion transformers (DiTs) suffer from prohibitive inference latency due to the quadratic complexity of attention. Existing sparse attention methods either overlook semantic similarity or fail to adapt to heterogeneous token distributions across layers, which degrades model performance. We propose AdaCluster, a training-free adaptive clustering framework that accelerates DiT generation while preserving accuracy. AdaCluster applies an angle-similarity-preserving clustering method to query vectors for higher compression, and designs a Euclidean-similarity-preserving clustering method for keys, covering cluster number assignment, threshold-wise adaptive clustering, and efficient critical cluster selection. Experiments on CogVideoX-2B, HunyuanVideo, and Wan-2.1 on one A40 GPU demonstrate 1.67-4.31x speedup with negligible quality degradation.
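To make the general idea concrete, the following is a minimal NumPy sketch of clustered sparse attention, not the authors' implementation. It assumes spherical k-means as a stand-in for angle-preserving query clustering, plain k-means as a stand-in for Euclidean key clustering, and a simple top-scoring centroid rule for critical cluster selection. The function names (`cluster`, `select_key_clusters`, `sparse_attention`) and the fixed cluster counts `n_q`, `n_k`, and `top` are hypothetical; the adaptive cluster number assignment and threshold-wise clustering described in the abstract are not modeled here.

```python
import numpy as np

def cluster(x, k, iters=10, angular=False):
    """Plain k-means; with angular=True it runs on unit vectors (spherical k-means)."""
    if angular:
        pts = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
    else:
        pts = x
    # Deterministic init (an assumption): evenly spaced points as starting centroids.
    centroids = pts[np.linspace(0, len(pts) - 1, k).astype(int)].copy()
    for _ in range(iters):
        if angular:
            # Highest cosine similarity corresponds to the smallest angle.
            labels = np.argmax(pts @ centroids.T, axis=1)
        else:
            d2 = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            labels = np.argmin(d2, axis=1)
        for c in range(k):
            members = pts[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                m = members.mean(axis=0)
                centroids[c] = m / max(np.linalg.norm(m), 1e-12) if angular else m
    return labels, centroids

def select_key_clusters(q_centroids, k_centroids, top):
    """For each query cluster, pick the `top` key clusters by centroid dot product."""
    scores = q_centroids @ k_centroids.T
    return np.argsort(-scores, axis=1)[:, :top]

def sparse_attention(q, k, v, n_q=4, n_k=8, top=2):
    """Each query attends only to keys in the clusters chosen for its query cluster."""
    q_labels, q_cent = cluster(q, n_q, angular=True)
    k_labels, k_cent = cluster(k, n_k, angular=False)
    chosen = select_key_clusters(q_cent, k_cent, top)
    out = np.zeros((q.shape[0], v.shape[1]))
    for qc in range(n_q):
        q_idx = np.where(q_labels == qc)[0]
        k_idx = np.where(np.isin(k_labels, chosen[qc]))[0]
        if len(q_idx) == 0 or len(k_idx) == 0:
            continue  # rows stay zero if no keys were selected
        s = q[q_idx] @ k[k_idx].T / np.sqrt(q.shape[1])
        s -= s.max(axis=1, keepdims=True)  # numerically stable softmax
        w = np.exp(s)
        w /= w.sum(axis=1, keepdims=True)
        out[q_idx] = w @ v[k_idx]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((32, 16))
k = rng.standard_normal((64, 16))
v = rng.standard_normal((64, 16))
out = sparse_attention(q, k, v, n_q=4, n_k=8, top=2)
```

In this sketch, setting `top` equal to `n_k` selects every key cluster, so the result coincides with dense softmax attention; smaller values of `top` trade accuracy for fewer query-key products.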
Source: arXiv: 2604.18348