Transition Matching Distillation for Fast Video Generation
1️⃣ One-sentence summary
This work proposes a new method called Transition Matching Distillation (TMD), which distills the knowledge of a large video diffusion model into a lightweight conditional flow model. It dramatically speeds up generation while preserving video quality, making the model better suited for real-time interactive applications.
Large video diffusion and flow models have achieved remarkable success in high-quality video generation, but their use in real-time interactive applications remains limited due to their inefficient multi-step sampling process. In this work, we present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators. The central idea of TMD is to match the multi-step denoising trajectory of a diffusion model with a few-step probability transition process, where each transition is modeled as a lightweight conditional flow. To enable efficient distillation, we decompose the original diffusion backbone into two components: (1) a main backbone, comprising the majority of early layers, that extracts semantic representations at each outer transition step; and (2) a flow head, consisting of the last few layers, that leverages these representations to perform multiple inner flow updates. Given a pretrained video diffusion model, we first introduce a flow head to the model, and adapt it into a conditional flow map. We then apply distribution matching distillation to the student model with flow head rollout in each transition step. Extensive experiments on distilling Wan2.1 1.3B and 14B text-to-video models demonstrate that TMD provides a flexible and strong trade-off between generation speed and visual quality. In particular, TMD outperforms existing distilled models under comparable inference costs in terms of visual fidelity and prompt adherence. Project page: this https URL
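The abstract describes a two-level sampling scheme: a few outer transition steps, each of which runs the expensive main backbone once to extract a semantic representation, followed by several cheap inner flow updates from the flow head conditioned on that representation. The toy sketch below illustrates this control flow only; the arrays, dimensions, and function names (`backbone`, `flow_head`, `tmd_sample`) are invented stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two components TMD splits the network into.
W_backbone = rng.standard_normal((16, 16)) * 0.1  # "main backbone": early layers
W_head = rng.standard_normal((32, 16)) * 0.1      # "flow head": last few layers

def backbone(x_t, t):
    # Extract a semantic representation once per outer transition step.
    return np.tanh(x_t @ W_backbone + t)

def flow_head(x, h):
    # Predict a velocity for the inner flow, conditioned on the cached
    # representation h from the backbone.
    return np.concatenate([x, h], axis=-1) @ W_head

def tmd_sample(x, n_outer=4, n_inner=8):
    """Few-step sampling: n_outer transitions, each an n_inner-step inner flow."""
    for i in range(n_outer):
        t = i / n_outer
        h = backbone(x, t)            # one expensive backbone call per transition
        dt = 1.0 / (n_outer * n_inner)
        for _ in range(n_inner):      # cheap Euler updates with the flow head only
            x = x + dt * flow_head(x, h)
    return x

x0 = rng.standard_normal((1, 16))     # noise "latent" (toy dimensionality)
sample = tmd_sample(x0)
print(sample.shape)
```

The speed/quality trade-off mentioned in the abstract corresponds to tuning `n_outer` and `n_inner`: fewer outer steps mean fewer passes through the heavy backbone, while the inner flow-head updates remain cheap.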
Source: arXiv:2601.09881