arXiv submission date: 2026-05-21
📄 Abstract - Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness

Diffusion Transformers (DiTs) achieve superior image generation quality but suffer from quadratic computational complexity relative to token count. While various token reduction (TR) methods have been proposed to mitigate this cost, they overlook the primary objective of generative models: minimizing recovery error, which requires reflecting output token similarity. They rely solely on input token similarity inherited from reduction-only ViT paradigms, leading to a fundamental misalignment with this objective. To bridge this gap, we propose DiTo, a novel TR paradigm that shifts the focus toward output-centric token reduction. Based on the observation that output token similarity is consistently preserved across adjacent timesteps, DiTo utilizes prior-step similarities as an effective proxy to establish token correspondences at a Matching timestep, which are then reused across multiple subsequent Reduction timesteps. To optimize this interleaved scheduling, we propose Pair Match Ratio (PMR)-guided Interval Scheduling to determine the optimal matching frequency. Furthermore, to mitigate localized approximation errors and resulting blocking artifacts caused by repeated reuse, we propose Frequency-aware Token Matching by incorporating a selection-frequency penalty. Extensive experiments demonstrate that DiTo consistently outperforms existing TR methods with 1.6-3.9 dB higher PSNR at comparable speedups, achieving a superior Pareto frontier.
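The matching and reuse idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed source/destination split, the cosine similarity measure, the linear form of the selection-frequency penalty, and the copy-based recovery are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def match_tokens(prev_output, src_idx, dst_idx, select_count, penalty=0.1):
    """Pick one destination token per source token (hypothetical helper).

    Similarity is measured on the previous timestep's *output* tokens,
    used as a proxy for the current step. A destination token that was
    selected often in earlier matchings is penalized (assumed linear
    penalty), which spreads the reuse over more tokens.
    """
    sim = cosine_similarity(prev_output[src_idx], prev_output[dst_idx])
    score = sim - penalty * select_count[dst_idx][None, :]
    matched = dst_idx[score.argmax(axis=1)]
    # Accumulate selection counts; np.add.at handles repeated indices.
    np.add.at(select_count, matched, 1)
    return matched


def reduce_and_recover(tokens, src_idx, dst_idx, matched, block):
    """Run `block` on destination tokens only, then recover source outputs.

    Recovery here simply copies the output of the matched destination
    token (an assumption; other recovery rules are possible).
    """
    out = np.empty_like(tokens)
    out[dst_idx] = block(tokens[dst_idx])
    out[src_idx] = out[matched]
    return out
```

In this sketch, `match_tokens` would be called at a Matching timestep, and the returned `matched` array would then be passed unchanged to `reduce_and_recover` at each of the following Reduction timesteps. How often to re-match is what the paper's PMR-guided Interval Scheduling decides; that part is not modeled here.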

Top-level tags: computer vision, model training, machine learning
Detailed tags: diffusion transformers, token reduction, output similarity, pair match ratio, scheduling

Rethinking Token Reduction for Diffusion Models via Output-Similarity-Awareness


1️⃣ One-sentence summary

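This paper proposes DiTo, a new token reduction method that exploits the stability of output token similarity across adjacent timesteps in diffusion models: similarities from the previous timestep guide token merging in subsequent timesteps, substantially reducing computational cost while generating higher-quality images than conventional methods.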

Source: arXiv: 2605.22011