arXiv submission date: 2026-01-14
📄 Abstract - Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering

Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.
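Below is a minimal, hypothetical sketch of the keyframe-then-render pipeline the abstract describes. All names here (`DiffusionModel`, `Scene3D`, `predict_keyframe_count`, `generate_video`) are assumptions for illustration; the abstract does not specify SRENDER's actual interfaces, and the stubs stand in for the real diffusion sampler and 3D renderer.

```python
# Hypothetical sketch of a sparse-keyframe + 3D-rendering pipeline, as described
# in the abstract. Every class and function name is an illustrative assumption,
# not the paper's actual API.

from dataclasses import dataclass
from typing import List


@dataclass
class CameraPose:
    position: tuple       # (x, y, z) camera center
    orientation: tuple    # e.g. quaternion (w, x, y, z)


class DiffusionModel:
    """Stand-in for the expensive camera-conditioned diffusion generator."""
    def sample(self, prompt: str, poses: List[CameraPose]) -> List[str]:
        # Placeholder: return one "keyframe" label per requested pose.
        return [f"keyframe@{p.position}" for p in poses]


class Scene3D:
    """Stand-in for the 3D representation reconstructed from keyframes."""
    def __init__(self, keyframes: List[str]):
        self.keyframes = keyframes

    def render(self, pose: CameraPose) -> str:
        # Placeholder: a real system would render the novel view from the
        # reconstructed scene (e.g. via splatting or ray marching).
        return f"rendered@{pose.position}"


def predict_keyframe_count(trajectory: List[CameraPose]) -> int:
    """Placeholder for the learned predictor: simple trajectories get few
    keyframes, complex camera motion gets more."""
    return max(2, len(trajectory) // 60)


def generate_video(trajectory: List[CameraPose], prompt: str) -> List[str]:
    # 1. Adaptively decide how much diffusion compute to spend.
    n_key = predict_keyframe_count(trajectory)
    stride = max(1, len(trajectory) // n_key)
    keyframe_poses = trajectory[::stride]

    # 2. Generate only the sparse keyframes with the diffusion model.
    keyframes = DiffusionModel().sample(prompt, keyframe_poses)

    # 3. Lift keyframes into a 3D scene, then render every intermediate view,
    #    amortizing the generation cost across the whole clip.
    scene = Scene3D(keyframes)
    return [scene.render(pose) for pose in trajectory]


if __name__ == "__main__":
    poses = [CameraPose((i * 0.1, 0.0, 0.0), (1, 0, 0, 0)) for i in range(480)]
    frames = generate_video(poses, "a static living room")
    print(len(frames), "frames rendered from", predict_keyframe_count(poses), "keyframes")
```

The key design point this sketch tries to capture is the cost split: the diffusion model is invoked only for the sparse keyframe poses, while every other frame in the trajectory is produced by cheap rendering from the reconstructed 3D scene.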

Top-level tags: video generation, computer vision, model training
Detailed tags: 3d reconstruction, diffusion models, keyframe generation, real-time rendering, camera control

Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering


1️⃣ One-Sentence Summary

This paper proposes a new method called SRENDER, which first uses a diffusion model to generate a small number of keyframes and then synthesizes the full video via 3D reconstruction and rendering. This makes video generation more than 40 times faster while preserving high visual quality, addressing the low computational efficiency that keeps existing models from supporting real-time interaction.

Source: arXiv:2601.09697