Multi-view Pyramid Transformer: Look Coarser to See Broader
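1️⃣ One-sentence summary
This paper proposes MVP, a new multi-view transformer architecture whose dual hierarchy (local-to-global across views, fine-to-coarse within each view) lets it efficiently reconstruct large, high-quality 3D scenes from tens to hundreds of images in a single forward pass.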
We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds of images in a single forward pass. Drawing on the idea of "looking broader to see the whole, looking finer to see the details," MVP is built on two core design principles: 1) a local-to-global inter-view hierarchy that gradually broadens the model's perspective from local views to groups and ultimately the full scene, and 2) a fine-to-coarse intra-view hierarchy that starts from detailed spatial representations and progressively aggregates them into compact, information-dense tokens. This dual hierarchy achieves both computational efficiency and representational richness, enabling fast reconstruction of large and complex scenes. We validate MVP on diverse datasets and show that, when coupled with 3D Gaussian Splatting as the underlying 3D representation, it achieves state-of-the-art generalizable reconstruction quality while maintaining high efficiency and scalability across a wide range of view configurations.
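To make the efficiency argument concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It counts query-key pairs per self-attention layer when the set of views attending jointly grows while the number of tokens per view shrinks. The names `Stage`, `attention_pairs`, and `pyramid_schedule`, the three-stage layout, and all group sizes and token counts are hypothetical values chosen for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    group_size: int       # views attending jointly (hypothetical value)
    tokens_per_view: int  # spatial tokens kept per view (hypothetical value)


def attention_pairs(num_views: int, stage: Stage) -> int:
    """Query-key pairs for one self-attention layer with views split into groups."""
    group = min(stage.group_size, num_views)
    full_groups, leftover_views = divmod(num_views, group)
    pairs = full_groups * (group * stage.tokens_per_view) ** 2
    # A final, smaller group holds any views that do not fill a whole group.
    pairs += (leftover_views * stage.tokens_per_view) ** 2
    return pairs


def pyramid_schedule(num_views: int) -> list[Stage]:
    """Assumed schedule: wider view groups are paired with coarser tokens."""
    return [
        Stage(group_size=1, tokens_per_view=1024),         # local: each view alone, fine tokens
        Stage(group_size=4, tokens_per_view=256),          # group: small view groups, coarser tokens
        Stage(group_size=num_views, tokens_per_view=64),   # global: all views, coarsest tokens
    ]


if __name__ == "__main__":
    views = 64
    hierarchical = sum(attention_pairs(views, s) for s in pyramid_schedule(views))
    # Baseline for comparison: one layer of global attention over all fine tokens.
    flat = attention_pairs(views, Stage(group_size=views, tokens_per_view=1024))
    print(f"hierarchical pairs: {hierarchical:,}")
    print(f"flat global pairs:  {flat:,}")
    print(f"ratio: {flat / hierarchical:.1f}x")
```

Under these assumed numbers, the three hierarchical layers together touch far fewer token pairs than a single global layer at full resolution, because the quadratic cost is paid on long sequences only after tokens have been aggregated. The actual stage count, group sizes, and downsampling factors in MVP may differ.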
Source: arXiv: 2512.07806