arXiv submission date: 2026-04-16
📄 Abstract - StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression

Reconstructing dense 3D geometry from continuous video streams requires stable inference under a constant memory budget. Existing $O(1)$ frameworks primarily rely on a "pure eviction" paradigm, which suffers from significant information destruction due to binary token deletion and evaluation noise from localized, single-layer scoring. To address these bottlenecks, we propose StreamCacheVGGT, a training-free framework that reimagines cache management through two synergistic modules: Cross-Layer Consistency-Enhanced Scoring (CLCES) and Hybrid Cache Compression (HCC). CLCES mitigates activation noise by tracking token importance trajectories across the Transformer hierarchy, employing order-statistical analysis to identify sustained geometric salience. Leveraging these robust scores, HCC transcends simple eviction by introducing a three-tier triage strategy that merges moderately important tokens into retained anchors via nearest-neighbor assignment on the key-vector manifold. This approach preserves essential geometric context that would otherwise be lost. Extensive evaluations on five benchmarks (7-Scenes, NRGBD, ETH3D, Bonn, and KITTI) demonstrate that StreamCacheVGGT sets a new state-of-the-art, delivering superior reconstruction accuracy and long-term stability while strictly adhering to constant-cost constraints.
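
The two modules described in the abstract can be illustrated with a rough sketch. The abstract does not specify which order statistic CLCES uses, how HCC measures distance between key vectors, how merged tokens are combined, or how the three tiers are sized, so everything below is an assumption made for illustration: the function names `cross_layer_score` and `hybrid_compress`, the parameters `rank`, `n_keep`, and `n_merge`, the choice of a low-rank order statistic across layers, squared Euclidean distance, and running-mean merging are all hypothetical and are not taken from the paper.

```python
def cross_layer_score(layer_scores, rank=1):
    """Robust per-token score from per-layer scores.

    layer_scores[l][t] is the importance of token t at layer l.
    Assumption: take the rank-th smallest value across layers, so a
    token scores high only if it is important in most layers.
    """
    n_tokens = len(layer_scores[0])
    return [
        sorted(layer[t] for layer in layer_scores)[rank]
        for t in range(n_tokens)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hybrid_compress(keys, values, scores, n_keep, n_merge):
    """Three-tier triage: keep, merge, evict.

    Tier 1: the n_keep highest-scoring tokens are retained as anchors.
    Tier 2: the next n_merge tokens are merged into their nearest
            anchor (by key distance) with a running mean (assumed rule).
    Tier 3: all remaining tokens are evicted.
    The output always holds exactly n_keep entries.
    """
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    anchors = order[:n_keep]
    mergers = order[n_keep:n_keep + n_merge]

    new_keys = [list(keys[t]) for t in anchors]
    new_vals = [list(values[t]) for t in anchors]
    counts = [1] * len(anchors)

    for t in mergers:
        # Nearest anchor, measured against the anchors' original keys.
        j = min(range(len(anchors)),
                key=lambda i: sq_dist(keys[t], keys[anchors[i]]))
        counts[j] += 1
        w = 1.0 / counts[j]
        new_keys[j] = [k + w * (x - k) for k, x in zip(new_keys[j], keys[t])]
        new_vals[j] = [v + w * (x - v) for v, x in zip(new_vals[j], values[t])]

    return new_keys, new_vals
```

Under these assumptions, the cache size after compression depends only on `n_keep`, which is what keeps the memory cost constant as the stream grows, while moderately important tokens still contribute to the retained anchors instead of being deleted outright.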

Top-level tags: computer vision, systems, model evaluation
Detailed tags: 3d reconstruction, video streaming, transformer cache, memory efficiency, geometry processing

StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression


1️⃣ One-sentence summary

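This paper proposes StreamCacheVGGT, a new method that uses cross-layer consistency scoring and hybrid cache compression to significantly improve the accuracy and stability of reconstructing 3D geometry from continuous video streams under a fixed memory budget.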

Source: arXiv: 2604.15237