Geometric Context Transformer for Streaming 3D Reconstruction

📄 Abstract - Geometric Context Transformer for Streaming 3D Reconstruction

Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which requires geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map is its attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518 × 378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach outperforms both existing streaming approaches and iterative optimization-based approaches.
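To make the idea of a compact streaming state more concrete, the following is a minimal sketch of how the three contexts named in the abstract could be kept side by side. It is not the paper's implementation. The class name `StreamingContext`, the parameters `anchor_frames` and `window_frames`, and the choice to summarize each evicted frame as the mean of its tokens are all assumptions made for illustration; the abstract does not specify how the trajectory memory is built.

```python
from collections import deque


class StreamingContext:
    """Toy bookkeeping for an anchor context, a recent-frame window,
    and a per-frame trajectory memory (all sizes are hypothetical)."""

    def __init__(self, anchor_frames=1, window_frames=4):
        self.anchor_frames = anchor_frames
        self.window_frames = window_frames
        self.anchor = []       # full tokens of the first frames, kept forever
        self.window = deque()  # full tokens of the most recent frames
        self.memory = []       # one summary vector per frame that left the window

    @staticmethod
    def _summarize(tokens):
        # Assumption: a frame is compressed to the mean of its token vectors.
        dim = len(tokens[0])
        return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]

    def add_frame(self, tokens):
        # The first frames fill the anchor context.
        if len(self.anchor) < self.anchor_frames:
            self.anchor.append(tokens)
            return
        # Later frames enter the window; the oldest one is evicted
        # and replaced by a single summary vector in the memory.
        self.window.append(tokens)
        if len(self.window) > self.window_frames:
            evicted = self.window.popleft()
            self.memory.append(self._summarize(evicted))

    def context_size(self):
        # Number of vectors a new frame would attend to.
        full = sum(len(f) for f in self.anchor) + sum(len(f) for f in self.window)
        return full + len(self.memory)
```

In this sketch the context grows by one vector per frame once the window is full, rather than by a full frame's worth of tokens, which is one plausible reading of how the state stays compact over sequences of thousands of frames.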

1️⃣ One-Sentence Summary

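This paper proposes LingBot-Map, a new 3D reconstruction foundation model that uses a novel geometric context transformer architecture to reconstruct 3D scenes from video streams in real time, stably and accurately, running at about 20 FPS while maintaining high performance.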
