Extend3D: Town-Scale 3D Generation
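1️⃣ One-Sentence Summary
This paper proposes Extend3D, a new method that, without any additional training, generates large-scale, highly complete 3D town scenes directly from a single ordinary image, addressing the poor results of existing techniques on expansive scenes.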
In this paper, we propose Extend3D, a training-free pipeline for 3D scene generation from a single image, built upon an object-centric 3D generative model. To overcome the limitations of fixed-size latent spaces in object-centric models for representing wide scenes, we extend the latent space in the $x$ and $y$ directions. Then, by dividing the extended latent space into overlapping patches, we apply the object-centric 3D generative model to each patch and couple them at each time step. Since patch-wise 3D generation with image conditioning requires strict spatial alignment between image and latent patches, we initialize the scene using a point cloud prior from a monocular depth estimator and iteratively refine occluded regions through SDEdit. We discovered that treating the incompleteness of 3D structure as noise during 3D refinement enables 3D completion, a concept we term under-noising. Furthermore, to address the sub-optimality of object-centric models for sub-scene generation, we optimize the extended latent during denoising, ensuring that the denoising trajectories remain consistent with the sub-scene dynamics. To this end, we introduce 3D-aware optimization objectives for improved geometric structure and texture fidelity. We demonstrate that our method yields better results than prior methods, as evidenced by human preference and quantitative experiments.
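The patch-wise step described in the abstract (extend the latent along $x$ and $y$, split it into overlapping patches, run the object-centric model on each patch, couple the results at every time step) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the latent is a dense array of shape `(X, Y, Z, C)`, `denoise_patch` is a hypothetical stand-in for the pretrained object-centric model, the patch size and stride are arbitrary, and coupling is done by plain averaging of overlapping predictions, which may differ from the coupling rule Extend3D actually uses.

```python
import numpy as np


def patch_starts(length, patch, stride):
    """Start indices of overlapping windows that together cover [0, length)."""
    if length < patch:
        raise ValueError("extended latent is smaller than one patch")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra window flush with the far border
    return starts


def coupled_step(latent, denoise_patch, patch=4, stride=2):
    """One coupled update over an x-y extended latent.

    latent:        array of shape (X, Y, Z, C), extended along X and Y only.
    denoise_patch: hypothetical callable standing in for the object-centric
                   model; maps a (patch, patch, Z, C) array to the same shape.
    Overlapping predictions are averaged (an assumed coupling rule).
    """
    acc = np.zeros_like(latent, dtype=np.float64)
    count = np.zeros(latent.shape[:2] + (1, 1))  # number of windows per x-y cell
    for x0 in patch_starts(latent.shape[0], patch, stride):
        for y0 in patch_starts(latent.shape[1], patch, stride):
            window = latent[x0:x0 + patch, y0:y0 + patch]
            acc[x0:x0 + patch, y0:y0 + patch] += denoise_patch(window)
            count[x0:x0 + patch, y0:y0 + patch] += 1
    return acc / count
```

In a full sampler this function would be called once per time step, so that neighbouring patches are reconciled before the next denoising step rather than stitched together only at the end. The depth-based initialization, SDEdit refinement, under-noising, and latent optimization mentioned in the abstract are not covered by this sketch.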
Source: arXiv: 2603.29387