PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery
1️⃣ One-sentence summary
This paper proposes a new AI model named PanoVGGT that reconstructs an accurate 3D scene, including camera poses and depth, directly from one or more panoramic photos in a single pass, while specifically tackling the geometric distortions unique to panoramic imagery.
Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or multiple panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.
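The paper does not include implementation details here, but the three-axis SO(3) rotation augmentation mentioned in the abstract can be illustrated with a minimal sketch: compose a rotation from yaw, pitch, and roll angles, then resample the equirectangular panorama by mapping each output pixel's viewing ray through the rotation. The function names `yaw_pitch_roll` and `rotate_equirect` are our own illustrative choices, not from the paper, and a real training pipeline would also transform the pose and depth labels consistently.

```python
import numpy as np

def yaw_pitch_roll(a, b, c):
    """Build a rotation matrix from three axis angles (Z-Y-X composition).
    This specific composition order is an assumption for illustration."""
    Rz = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0,          0,         1]])
    Ry = np.array([[ np.cos(b), 0, np.sin(b)],
                   [ 0,         1, 0],
                   [-np.sin(b), 0, np.cos(b)]])
    Rx = np.array([[1, 0,          0],
                   [0, np.cos(c), -np.sin(c)],
                   [0, np.sin(c),  np.cos(c)]])
    return Rz @ Ry @ Rx

def rotate_equirect(img, R):
    """Resample an equirectangular panorama under a global rotation R:
    each output pixel's unit viewing ray on the sphere is mapped through R
    to find its source pixel (inverse warp, nearest-neighbour lookup)."""
    H, W = img.shape[:2]
    i, j = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    lon = (j + 0.5) / W * 2 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2 - (i + 0.5) / H * np.pi      # latitude in (-pi/2, pi/2)
    rays = np.stack([np.cos(lat) * np.cos(lon),  # unit directions, shape (H, W, 3)
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)
    src = rays @ R.T                             # apply R to every per-pixel ray
    lon_s = np.arctan2(src[..., 1], src[..., 0])
    lat_s = np.arcsin(np.clip(src[..., 2], -1.0, 1.0))
    js = (((lon_s + np.pi) / (2 * np.pi)) * W).astype(int) % W
    is_ = (((np.pi / 2 - lat_s) / np.pi) * H).astype(int).clip(0, H - 1)
    return img[is_, js]
```

A pure yaw rotation reduces to a horizontal circular shift of the panorama, which gives a quick sanity check: rotating by 180° about the vertical axis should roll the image by half its width.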
Source: arXiv: 2603.17571