arXiv submission date: 2026-01-08
📄 Abstract - PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference

Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages operating at varying resolutions. These models handle inputs with higher noise levels at lower resolutions, while less noisy inputs are processed at higher resolutions. This hierarchical approach significantly reduces the computational cost of inference in multi-step denoising models. However, existing open-source pyramidal video models have been trained from scratch and tend to underperform compared to state-of-the-art systems in terms of visual plausibility. In this work, we present a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning, achieving this transformation without degradation in quality of output videos. Furthermore, we investigate and compare various strategies for step distillation within pyramidal models, aiming to further enhance the inference efficiency. Our results are available at this https URL.
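To make the cost argument above concrete, here is a minimal sketch of a pyramidal noise-to-resolution schedule and its relative compute. It is not the paper's pipeline. It relies on several illustrative assumptions: the noise range [1, 0] is split into equal stages, each lower stage halves the spatial size, every stage runs the same number of steps, the temporal length is unchanged, and per-step cost scales as tokens**2 (attention-dominated). The names `Stage`, `make_pyramid` and `relative_cost` are hypothetical. The sketch also ignores what happens at stage transitions, such as upsampling the latent and any noise correction.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    t_start: float  # noise level where the stage begins (1.0 = pure noise)
    t_end: float    # noise level where the stage ends
    scale: float    # spatial size relative to full resolution (1.0 = full)
    steps: int      # denoising steps run inside this stage


def make_pyramid(num_stages: int, steps_per_stage: int) -> list:
    """Hypothetical schedule: equal noise intervals, noisier intervals at lower resolution."""
    stages = []
    for k in range(num_stages):
        t_start = 1.0 - k / num_stages
        t_end = 1.0 - (k + 1) / num_stages
        # Halve height and width once per level below the final (full-resolution) stage.
        scale = 0.5 ** (num_stages - 1 - k)
        stages.append(Stage(t_start, t_end, scale, steps_per_stage))
    return stages


def relative_cost(stages: list, attention_exponent: float = 2.0) -> float:
    """Pyramid cost divided by the cost of running every step at full resolution.

    Token count scales with scale**2 (two spatial dims, temporal dim fixed);
    per-step cost is assumed to scale as tokens**attention_exponent.
    """
    total_steps = sum(s.steps for s in stages)
    pyramid = sum(s.steps * (s.scale ** 2) ** attention_exponent for s in stages)
    return pyramid / total_steps
```

With three stages of 10 steps each, `relative_cost(make_pyramid(3, 10))` returns 0.35546875 under the quadratic assumption, or 0.4375 with `attention_exponent=1.0`. In both cases the pyramid needs well under half of the full-resolution compute. These figures come from the toy cost model only and are not results reported in the paper.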

Top-level tags: video generation, model training, AIGC
Detailed tags: diffusion models, efficient inference, pyramidal architecture, model compression, video synthesis

PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference


1️⃣ One-sentence summary

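This paper proposes a low-cost finetuning method that converts an existing pretrained video diffusion model into a pyramidal model, substantially reducing inference compute while preserving the quality of the generated videos.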

Source: arXiv 2601.04792