WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
1️⃣ One-Sentence Summary
This paper introduces WorldWarp, a new framework that couples a dynamically updated 3D geometric model with a 2D generative model capable of intelligently "filling and revising". By doing so, it tackles the core difficulty of maintaining 3D geometric consistency in long video generation, producing coherent, photorealistic videos even under complex camera trajectories.
Generating long-range, geometrically consistent video presents a fundamental dilemma: while consistency demands strict adherence to 3D geometry in pixel space, state-of-the-art generative models operate most effectively in a camera-conditioned latent space. This disconnect causes current methods to struggle with occluded areas and complex camera trajectories. To bridge this gap, we propose WorldWarp, a framework that couples a 3D structural anchor with a 2D generative refiner. To establish geometric grounding, WorldWarp maintains an online 3D geometric cache built via Gaussian Splatting (3DGS). By explicitly warping historical content into novel views, this cache acts as a structural scaffold, ensuring each new frame respects prior geometry. However, static warping inevitably leaves holes and artifacts due to occlusions. We address this using a Spatio-Temporal Diffusion (ST-Diff) model designed for a "fill-and-revise" objective. Our key innovation is a spatio-temporal varying noise schedule: blank regions receive full noise to trigger generation, while warped regions receive partial noise to enable refinement. By dynamically updating the 3D cache at every step, WorldWarp maintains consistency across video chunks. Consequently, it achieves state-of-the-art fidelity by ensuring that 3D logic guides structure while diffusion logic perfects texture. Project page: this https URL.
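The key innovation described above, a spatially varying noise schedule, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the partial-noise level `t_partial`, and the variance-preserving mixing rule are all assumptions made here for illustration. The idea shown is simply that pixels where warping left holes get full noise (so the diffusion model generates freely), while warped pixels get only partial noise (so the model merely refines them).

```python
import numpy as np

def spatiotemporal_noise(warped_frames, hole_masks, t_full=1.0, t_partial=0.4, seed=None):
    """Hypothetical sketch of a spatially varying noise schedule.

    warped_frames: array of frames produced by warping the 3D cache into new views
    hole_masks:    boolean array (same shape) marking pixels the warp could not fill
    Holes receive noise level t_full; warped content receives t_partial.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(warped_frames.shape)
    # Per-pixel noise level: full noise where the warp left a hole, partial elsewhere.
    t = np.where(hole_masks, t_full, t_partial)
    # A simple variance-preserving mix: x_t = sqrt(1 - t) * x0 + sqrt(t) * eps.
    # With t_full = 1.0, hole pixels become pure noise and carry no stale content.
    x_t = np.sqrt(1.0 - t) * warped_frames + np.sqrt(t) * noise
    return x_t, t
```

A real ST-Diff model would apply this per-pixel schedule in its latent space and condition on camera pose; the sketch only captures the fill-vs-revise asymmetry between hole and warped regions.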
Source: arXiv:2512.19678