arXiv submission date: 2026-01-05
📄 Abstract - InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams

The grand vision of enabling persistent, large-scale 3D visual geometry understanding is shackled by the irreconcilable demands of scalability and long-term stability. While offline models like VGGT achieve inspiring geometry capability, their batch-based nature renders them irrelevant for live systems. Streaming architectures, though the intended solution for live operation, have proven inadequate. Existing methods either fail to support truly infinite-horizon inputs or suffer from catastrophic drift over long sequences. We shatter this long-standing dilemma with InfiniteVGGT, a causal visual geometry transformer that operationalizes the concept of a rolling memory through a bounded yet adaptive and perpetually expressive KV cache. Capitalizing on this, we devise a training-free, attention-agnostic pruning strategy that intelligently discards obsolete information, effectively "rolling" the memory forward with each new frame. Fully compatible with FlashAttention, InfiniteVGGT finally alleviates the compromise, enabling infinite-horizon streaming while outperforming existing streaming methods in long-term stability. The ultimate test for such a system is its performance over a truly infinite horizon, a capability that has been impossible to rigorously validate due to the lack of extremely long-term, continuous benchmarks. To address this critical gap, we introduce the Long3D benchmark, which, for the first time, enables a rigorous evaluation of continuous 3D geometry estimation on sequences of about 10,000 frames. This provides the definitive evaluation platform for future research in long-term 3D geometry understanding. Code is available at: this https URL
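To make the idea of a bounded, rolling KV cache concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `RollingKVCache`, the `budget` parameter, and the eviction rule (drop the cached keys most similar to the mean key, treating them as redundant) are all assumptions made for illustration. The one property it shares with the abstract's description is that pruning looks only at the cached keys and never at attention weights, which is what keeps such a scheme usable with fused kernels like FlashAttention that do not materialize the attention matrix.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class RollingKVCache:
    """Hypothetical bounded KV cache; the eviction rule is an illustrative assumption."""

    def __init__(self, budget):
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self.budget = budget  # maximum number of cached tokens
        self.keys = []        # one key vector per cached token
        self.values = []      # one value per cached token

    def append(self, frame_keys, frame_values):
        """Add the tokens of a new frame, then prune back down to the budget."""
        if len(frame_keys) != len(frame_values):
            raise ValueError("keys and values must have the same length")
        self.keys.extend(frame_keys)
        self.values.extend(frame_values)
        self._prune()

    def _prune(self):
        excess = len(self.keys) - self.budget
        if excess <= 0:
            return
        n, dim = len(self.keys), len(self.keys[0])
        mean_key = [sum(k[d] for k in self.keys) / n for d in range(dim)]
        # Assumed redundancy score: similarity to the mean key.
        # No attention weights are needed to compute it.
        scores = [cosine(k, mean_key) for k in self.keys]
        ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
        drop = set(ranked[:excess])  # evict the most redundant tokens
        self.keys = [k for i, k in enumerate(self.keys) if i not in drop]
        self.values = [v for i, v in enumerate(self.values) if i not in drop]


# Stream five frames of four tokens each through a cache limited to eight tokens.
cache = RollingKVCache(budget=8)
for frame in range(5):
    frame_keys = [[float(frame + 1), float(token)] for token in range(4)]
    frame_values = [(frame, token) for token in range(4)]
    cache.append(frame_keys, frame_values)

print(len(cache.keys))  # memory stays at the budget however long the stream runs
```

Because the cache size is capped at `budget`, the per-frame attention cost stays constant no matter how many frames have been seen. A real system would operate on per-layer, per-head tensors rather than Python lists, and the choice of which tokens to protect from eviction is a design decision this sketch does not address.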

Top-level tags: computer vision systems model evaluation
Detailed tags: 3d geometry streaming models long-term stability transformer benchmark

InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams


1️⃣ One-sentence summary

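This paper proposes a new model, InfiniteVGGT, whose "rolling memory" mechanism is presented as the first to solve the key problem of stable, large-scale 3D geometry understanding over unbounded real-time video streams, and it introduces an ultra-long-sequence benchmark for evaluating such systems.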

Source: arXiv: 2601.02281