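VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
1️⃣ One-Sentence Summary
This paper proposes VerseCrafter, a new video generation model that uses a novel 4D geometric control method to steer the camera viewpoint and the motion trajectories of multiple objects in a precise and unified way, producing high-fidelity, dynamically consistent video.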
Video world models aim to simulate dynamic, real-world environments, yet existing methods struggle to provide unified and precise control over camera and multi-object motion, because video dynamics are inherently expressed in the projected 2D image plane. To bridge this gap, we introduce VerseCrafter, a 4D-aware video world model that enables explicit and coherent control over both camera and object dynamics within a unified 4D geometric world state. Our approach is centered on a novel 4D Geometric Control representation, which encodes the world state through a static background point cloud and per-object 3D Gaussian trajectories. This representation captures not only an object's path but also its probabilistic 3D occupancy over time, offering a flexible, category-agnostic alternative to rigid bounding boxes or parametric models. These 4D controls are rendered into conditioning signals for a pretrained video diffusion model, enabling the generation of high-fidelity, view-consistent videos that precisely adhere to the specified dynamics. Another major challenge is the scarcity of large-scale training data with explicit 4D annotations. We address this by developing an automatic data engine that extracts the required 4D controls from in-the-wild videos, allowing us to train our model on a massive and diverse dataset.
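To make the 4D Geometric Control representation more concrete, here is a minimal sketch of how a static background point cloud and per-object 3D Gaussian trajectories could be stored and projected into a camera view. It is an illustration under stated assumptions, not the paper's implementation: the names `GaussianTrajectory`, `WorldState4D`, `project_points`, and `project_gaussian` are hypothetical, the camera is a simple pinhole model, and the Gaussian is projected with a first-order (Jacobian-based) approximation that the abstract does not specify.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianTrajectory:
    """Hypothetical per-object track: one 3D Gaussian per frame."""
    means: np.ndarray  # (T, 3) object centers in world coordinates
    covs: np.ndarray   # (T, 3, 3) covariances modelling soft 3D occupancy


@dataclass
class WorldState4D:
    """Hypothetical container: static background plus moving objects."""
    background: np.ndarray  # (N, 3) static point cloud
    objects: list           # list of GaussianTrajectory


def project_points(points, K, R, t):
    """Pinhole projection of world points (N, 3) to pixels (N, 2) and depths (N,)."""
    cam = points @ R.T + t                      # world -> camera coordinates
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]    # perspective divide
    return uv, depth


def project_gaussian(mean, cov, K, R, t):
    """First-order projection of a 3D Gaussian to a 2D image-plane Gaussian."""
    x, y, z = R @ mean + t
    fx, fy = K[0, 0], K[1, 1]
    # Jacobian of the perspective projection, evaluated at the Gaussian center.
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov_cam = R @ cov @ R.T                     # rotate covariance into camera frame
    mean_2d = np.array([fx * x / z + K[0, 2], fy * y / z + K[1, 2]])
    return mean_2d, J @ cov_cam @ J.T


# Assumed toy setup: identity camera pose, one object sliding along +x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)

traj = GaussianTrajectory(
    means=np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.0]]),
    covs=np.tile(0.25 * np.eye(3), (2, 1, 1)),
)
state = WorldState4D(background=np.array([[0.0, 0.0, 10.0]]), objects=[traj])

bg_uv, bg_depth = project_points(state.background, K, R, t)
footprints = [
    project_gaussian(traj.means[f], traj.covs[f], K, R, t)
    for f in range(len(traj.means))
]
```

Each entry in `footprints` is a 2D mean and covariance for one frame. Rasterizing these footprints together with the projected background points, frame by frame along a chosen camera path, is one plausible way to obtain the per-frame conditioning maps that the abstract says are fed to the video diffusion model.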
Source: arXiv: 2601.05138