arXiv submission date: 2026-03-09
📄 Abstract - Speed3R: Sparse Feed-forward 3D Reconstruction Models

While recent feed-forward 3D reconstruction models accelerate 3D reconstruction by jointly inferring dense geometry and camera poses in a single pass, their reliance on dense attention imposes quadratic complexity, creating a prohibitive computational bottleneck that severely limits inference speed. To resolve this, we introduce Speed3R, an end-to-end trainable model inspired by the core principle of Structure-from-Motion: that a sparse set of keypoints is sufficient for robust pose estimation. Speed3R features a dual-branch attention mechanism in which a compression branch builds a coarse contextual prior to guide a selection branch, which performs fine-grained attention only on the most informative image tokens. This strategy mimics the efficiency of traditional keypoint matching, achieving a remarkable 12.4x inference speedup on 1000-view sequences while introducing only a minimal, controlled trade-off in geometric accuracy. Validated on standard benchmarks with both VGGT and $\pi^3$ backbones, our method delivers high-quality reconstructions at a fraction of the computational cost, paving the way for efficient large-scale scene modeling.
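The dual-branch mechanism described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual design: the function name, mean-pooling compression, block-importance scoring, and top-k selection rule are all assumptions standing in for the (unspecified) details of Speed3R's compression and selection branches.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dual_branch_attention(q, k, v, pool=8, top_k=64):
    """Illustrative sketch of a dual-branch sparse attention step.

    Compression branch (assumed): mean-pool key tokens into coarse blocks
    and use the coarse attention weights to score each block's importance.
    Selection branch (assumed): run exact attention only over the tokens
    of the highest-scoring blocks (the "most informative" tokens).
    """
    n, d = k.shape
    n_blocks = n // pool
    # Compression branch: coarse keys via mean-pooling blocks of `pool` tokens.
    k_coarse = k[: n_blocks * pool].reshape(n_blocks, pool, d).mean(axis=1)
    coarse_scores = softmax(q @ k_coarse.T / np.sqrt(d))  # (m, n_blocks)
    # Importance of each block = total coarse attention mass it receives.
    block_importance = coarse_scores.sum(axis=0)
    # Selection branch: keep only tokens from the top-scoring blocks.
    n_keep = max(1, min(n_blocks, top_k // pool))
    keep = np.argsort(block_importance)[-n_keep:]
    idx = np.concatenate([np.arange(b * pool, (b + 1) * pool) for b in keep])
    k_sel, v_sel = k[idx], v[idx]
    # Fine-grained attention over the selected tokens only:
    # cost is O(m * top_k) instead of O(m * n).
    scores = softmax(q @ k_sel.T / np.sqrt(d))
    return scores @ v_sel

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 32))    # 4 query tokens
k = rng.normal(size=(256, 32))  # 256 key tokens
v = rng.normal(size=(256, 32))
out = dual_branch_attention(q, k, v, pool=8, top_k=64)
print(out.shape)  # (4, 32)
```

The key point the sketch conveys is the source of the speedup: the fine-grained attention only ever touches `top_k` tokens, so cost no longer grows quadratically with the number of views.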

Top-level tags: computer vision, model training, systems
Detailed tags: 3d reconstruction, sparse attention, pose estimation, efficiency, structure-from-motion

Speed3R: Sparse Feed-forward 3D Reconstruction Models


1️⃣ One-sentence summary

This paper proposes a new model called Speed3R. Inspired by the way traditional 3D reconstruction estimates camera poses from only a small set of keypoints, it designs an efficient attention mechanism that speeds up processing of large image collections by more than 12x while keeping reconstruction quality essentially unchanged.

Source: arXiv:2603.08055