arXiv submission date: 2025-12-16
📄 Abstract - SS4D: Native 4D Generative Model via Structured Spacetime Latents

We present SS4D, a native 4D generative model that synthesizes dynamic 3D objects directly from monocular video. Unlike prior approaches that construct 4D representations by optimizing over 3D or video generative models, we train a generator directly on 4D data, achieving high fidelity, temporal coherence, and structural consistency. At the core of our method is a compressed set of structured spacetime latents. Specifically, (1) to address the scarcity of 4D training data, we build on a pre-trained single-image-to-3D model, preserving strong spatial consistency; (2) temporal consistency is enforced by introducing dedicated temporal layers that reason across frames; (3) to support efficient training and inference over long video sequences, we compress the latent sequence along the temporal axis using factorized 4D convolutions and temporal downsampling blocks. In addition, we employ a carefully designed training strategy to enhance robustness against occlusion.
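The paper releases no code here; as a rough illustration of point (3), the sketch below shows how a "factorized" spacetime operation can split into a per-frame spatial step followed by a strided temporal step that halves the frame count. All shapes, function names, and the stand-in operations are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def spatial_step(latents):
    # latents: (T, N, C) — T frames, N structured latent tokens, C channels.
    # Stand-in for a per-frame spatial layer (here: centering tokens per frame);
    # a real model would use learned spatial convolutions/attention instead.
    return latents - latents.mean(axis=1, keepdims=True)

def temporal_downsample(latents, stride=2):
    # Stand-in for a strided temporal block: merges groups of adjacent
    # frames, reducing T by the stride factor.
    T = latents.shape[0] - latents.shape[0] % stride
    groups = latents[:T].reshape(T // stride, stride, *latents.shape[1:])
    return groups.mean(axis=1)

x = np.random.randn(16, 512, 8)   # 16 frames, 512 latent tokens, 8 channels
y = temporal_downsample(spatial_step(x))
print(y.shape)                    # (8, 512, 8): temporal axis compressed 2x
```

Stacking several such temporal blocks would compress long video sequences geometrically, which is what makes training and inference over long clips tractable.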

Top-level tags: computer vision, video generation, AIGC
Detailed tags: 4D generation, dynamic 3D objects, spacetime latents, temporal coherence, video-to-4D

SS4D: Native 4D Generative Model via Structured Spacetime Latents


1️⃣ One-sentence summary

This paper proposes a new model, SS4D, that generates high-quality, temporally coherent, and structurally consistent dynamic 3D objects directly from monocular video; its core innovation is a compressed set of structured spacetime latents that make processing 4D data efficient.


Source: arXiv 2512.14284