SVD-Prune: Training-Free Token Pruning for Efficient Vision-Language Models
1️⃣ One-Sentence Summary
This paper proposes a new method called SVD-Prune that, without any additional training, automatically sifts out the most critical pieces of information in an image, much like a sieve. This lets vision-language models greatly reduce computation and memory cost while maintaining strong performance, with especially notable gains on information-rich images.
Vision-Language Models (VLMs) have revolutionized multimodal learning by jointly processing visual and textual information. Yet they face significant challenges due to the high computational and memory demands of processing long sequences of vision tokens. Many existing methods rely on local heuristics, such as attention scores or token norms. However, these criteria suffer from positional bias and information dispersion, limiting their ability to preserve essential content at high pruning ratios and leading to performance degradation on visually detailed images. To address these issues, we propose SVD-Prune, a training-free, plug-and-play token pruning method based on Singular Value Decomposition. It decomposes the vision token feature matrix and selects the top-K tokens using statistical leverage scores, ensuring that only the tokens contributing most to the dominant global variance are preserved. Experiments show that SVD-Prune consistently outperforms prior pruning methods under extreme vision token budgets, maintaining strong performance even with 32 and 16 vision tokens.
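The selection rule described above (SVD of the token matrix, then keep the tokens with the highest statistical leverage scores) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `svd_prune` and the subspace rank `r` are assumptions, and the token matrix here is random data standing in for real vision features.

```python
import numpy as np

def svd_prune(tokens, k, r=8):
    """Keep the k tokens with the highest statistical leverage scores.

    tokens : (N, D) vision token feature matrix.
    k      : number of tokens to retain.
    r      : rank of the dominant subspace (hypothetical choice; the
             paper may derive this differently).
    """
    # Thin SVD: tokens = U @ diag(S) @ Vt, with U of shape (N, min(N, D)).
    U, S, Vt = np.linalg.svd(tokens, full_matrices=False)
    # Leverage score of token i: squared norm of row i of the top-r left
    # singular vectors, i.e. its contribution to the dominant global variance.
    leverage = np.sum(U[:, :r] ** 2, axis=1)
    # Indices of the top-k tokens, restored to their original order.
    keep = np.sort(np.argsort(leverage)[-k:])
    return tokens[keep], keep

# Example: prune 576 hypothetical vision tokens down to a 32-token budget.
pruned, idx = svd_prune(np.random.randn(576, 1024), k=32)
```

Because leverage scores are computed from the global SVD rather than per-token attention or norms, the criterion is insensitive to where a token sits in the sequence, which is the positional-bias problem the abstract points to.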
Source: arXiv: 2604.11530