BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation

📄 Abstract - BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation

Generating minute-long videos is a critical step toward developing world models, providing a foundation for realistic extended scenes and advanced AI simulators. The emerging semi-autoregressive (block diffusion) paradigm integrates the strengths of diffusion and autoregressive models, enabling arbitrary-length video generation and improving inference efficiency through KV caching and parallel sampling. However, it still faces two enduring challenges: (i) KV-cache-induced long-horizon error accumulation, and (ii) the lack of fine-grained long-video benchmarks and coherence-aware metrics. To overcome these limitations, we propose BlockVid, a novel block diffusion framework equipped with a semantic-aware sparse KV cache, an effective training strategy called Block Forcing, and dedicated chunk-wise noise scheduling and shuffling to reduce error propagation and enhance temporal consistency. We further introduce LV-Bench, a fine-grained benchmark for minute-long videos, complete with new metrics evaluating long-range coherence. Extensive experiments on VBench and LV-Bench demonstrate that BlockVid consistently outperforms existing methods in generating high-quality, coherent minute-long videos. In particular, it achieves a 22.2% improvement on VDE Subject and a 19.4% improvement on VDE Clarity in LV-Bench over state-of-the-art approaches. Project website: this https URL. Inferix (Code): this https URL.


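1️⃣ One-Sentence Summary

This paper proposes BlockVid, a new method that improves block diffusion, introduces a semantic-aware cache, and adopts a new training strategy to address the error accumulation and coherence problems common in long-video generation. On a newly established benchmark, it significantly outperforms existing methods and generates higher-quality, more coherent minute-long videos.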
