arXiv submission date: 2026-06-17
📄 Abstract - UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation

Autoregressive video diffusion models have emerged as a promising approach for long video generation, achieving strong performance in streaming settings. However, existing methods are restricted to forward temporal generation, whereas practical video creation often requires flexible generation order, e.g., conditioning on future context to extend backward, or on both past and future context for inbetween generation. We bridge this gap by training an autoregressive model that supports generation in arbitrary temporal directions. A key technical challenge arises from the Causal 3D VAE widely used in video diffusion models, which encodes latents strictly conditioned on past context. While suited for forward generation, this causal structure causes inter-block discontinuities when generation proceeds backward. To address this, we introduce blockwise anchor latents, a set of auxiliary latents that restore the missing past context at block boundaries during backward generation. Built on this design, we propose UniTemp, a bidirectional distillation framework that trains a single autoregressive student model for any-direction video generation. At inference time, UniTemp conditions on arbitrary past and/or future frames, improving controllability for both bidirectional and inbetween generation. Experiments show that UniTemp maintains competitive performance on short and long video generation compared to forward-only methods, while enabling diverse workflows such as bidirectional video extension, inbetween generation, looping video generation, scene transition, and visual story generation. Project website: this https URL

Top-level tags: video generation, model training, multi-modal
Detailed tags: autoregressive diffusion, bidirectional generation, temporal ordering, video inbetweening, distillation

UniTemp: Unlocking Video Generation in Any Temporal Order via Bidirectional Distillation


1️⃣ One-sentence summary

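This paper proposes UniTemp, a method that uses bidirectional distillation to train a single model so that video generation is no longer restricted to front-to-back order. The model can generate forward, backward, or inbetween existing frames, which makes it a more flexible fit for practical video creation.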

Source: arXiv: 2606.18702