arXiv submission date: 2025-12-15
📄 Abstract - DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders

Video diffusion models have revolutionized generative video synthesis, but they are imprecise, slow, and can be opaque during generation, keeping users in the dark for a prolonged period. In this work, we propose DiffusionBrowser, a model-agnostic, lightweight decoder framework that allows users to interactively generate previews at any point (timestep or transformer block) during the denoising process. Our model generates multi-modal preview representations, including RGB and scene intrinsics, at more than 4$\times$ real-time speed (less than 1 second for a 4-second video), and these previews are consistent in appearance and motion with the final video. With the trained decoder, we show that it is possible to interactively guide the generation at intermediate noise steps via stochasticity reinjection and modal steering, unlocking a new control capability. Moreover, we systematically probe the model using the learned decoders, revealing how scene, object, and other details are composed and assembled during the otherwise black-box denoising process.
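The abstract describes a lightweight, model-agnostic decoder that maps intermediate diffusion latents at any denoising timestep to multi-modal previews (RGB plus scene intrinsics). The paper's own architecture is not given here, so the following is only a minimal, hypothetical PyTorch sketch of what such a multi-branch preview decoder could look like: all module names, the shared-trunk/branch split, the latent shape, and the choice of depth/normals as example intrinsics are assumptions for illustration, not the authors' design.

```python
# Hypothetical sketch of a multi-branch preview decoder (not the authors' code).
# Assumes intermediate video latents of shape (B, C_lat, T, H, W) taken at any
# denoising timestep; the decoder maps them to coarse RGB and scene-intrinsic previews.
import torch
import torch.nn as nn


class PreviewBranch(nn.Module):
    """One lightweight output head (e.g. RGB, depth, or normals)."""

    def __init__(self, hidden: int, out_channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x)


class MultiBranchPreviewDecoder(nn.Module):
    """Shared trunk plus per-modality branches, conditioned on the denoising
    timestep so one decoder can be reused at any point of the trajectory."""

    def __init__(self, latent_channels: int = 16, hidden: int = 64):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden)
        )
        self.trunk = nn.Sequential(
            nn.Conv3d(latent_channels, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
        )
        self.branches = nn.ModuleDict({
            "rgb": PreviewBranch(hidden, 3),      # coarse RGB preview
            "depth": PreviewBranch(hidden, 1),    # example scene intrinsic
            "normals": PreviewBranch(hidden, 3),  # example scene intrinsic
        })

    def forward(self, latent: torch.Tensor, t: torch.Tensor) -> dict:
        # Inject the timestep so the trunk knows how noisy the latent still is.
        temb = self.time_embed(t.view(-1, 1).float())              # (B, hidden)
        feats = self.trunk(latent) + temb[:, :, None, None, None]  # broadcast over T, H, W
        return {name: branch(feats) for name, branch in self.branches.items()}


if __name__ == "__main__":
    decoder = MultiBranchPreviewDecoder()
    latent = torch.randn(1, 16, 8, 32, 32)  # (B, C_lat, T, H, W) intermediate latent
    t = torch.tensor([500])                 # current denoising timestep
    previews = decoder(latent, t)
    for name, out in previews.items():
        print(name, tuple(out.shape))
```

Because all branches share one trunk and only the heads differ, such a decoder stays cheap enough to run interactively at many timesteps, which is the property the abstract's sub-second preview claim relies on.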

Top-level tags: video generation, model training, AIGC
Detailed tags: diffusion models, interactive preview, multi-modal decoding, video synthesis, denoising process

DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders


1️⃣ One-Sentence Summary

This paper proposes DiffusionBrowser, a general-purpose decoding framework that lets users interactively preview results at any point during video generation. It is fast, can display multiple kinds of visual information, sheds light on the inner workings of AI video generation, and provides new ways to control the process.


Source: arXiv:2512.13690