Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
1️⃣ One-Sentence Summary
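This paper proposes Track4World, an efficient feedforward model that quickly and accurately tracks the 3D trajectory of every pixel in a monocular video, offering a powerful new tool for understanding the dynamic 3D structure of videos.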
Estimating the 3D trajectory of every pixel in a monocular video is a crucial and promising step toward a comprehensive understanding of the video's 3D dynamics. Recent monocular 3D tracking methods achieve impressive performance but are limited either to tracking sparse points from the first frame or to slow, optimization-based frameworks for dense tracking. In this paper, we propose a feedforward model, Track4World, that enables efficient, holistic 3D tracking of every pixel in a world-centric coordinate system. Built on a global 3D scene representation encoded by a VGGT-style ViT, Track4World applies a novel 3D correlation scheme to simultaneously estimate pixel-wise 2D and 3D dense flow between arbitrary frame pairs. The estimated scene flow, together with the reconstructed 3D geometry, then enables efficient 3D tracking of every pixel in the video. Extensive experiments on multiple benchmarks show that our approach consistently outperforms existing methods in 2D/3D flow estimation and 3D tracking, highlighting its robustness and scalability for real-world 4D reconstruction tasks.
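The abstract does not specify Track4World's output format, so the following is only a minimal sketch of how scene flow plus reconstructed geometry can yield dense world-centric tracks. It assumes (hypothetically) that the model provides, for a reference frame, a per-pixel depth map, pinhole intrinsics `K`, a camera-to-world pose, and per-pixel 3D scene flow expressed in world coordinates from the reference frame to each other frame. The helper names `unproject_to_world` and `dense_world_tracks` are illustrative, not the paper's API.

```python
import numpy as np

def unproject_to_world(depth, K, cam_to_world):
    """Lift an (H, W) depth map to (H, W, 3) world-space points.

    depth: (H, W) z-depth in the camera frame (assumed model output).
    K: (3, 3) pinhole intrinsics (assumed known or estimated).
    cam_to_world: (4, 4) rigid camera-to-world transform (assumed estimated).
    """
    H, W = depth.shape
    # Pixel grid: u = column index, v = row index.
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (H, W, 3)
    rays = pix @ np.linalg.inv(K).T          # camera-space rays with z = 1
    pts_cam = rays * depth[..., None]        # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                 # apply R p + t per pixel

def dense_world_tracks(ref_points_world, scene_flows_world):
    """Stack the world position of every reference pixel over time.

    ref_points_world: (H, W, 3) world points of the reference frame.
    scene_flows_world: list of (H, W, 3) world-space displacements from the
        reference frame to each target frame (hypothetical model output).
    Returns (T, H, W, 3), with the reference frame at index 0.
    """
    tracks = [ref_points_world] + [ref_points_world + f for f in scene_flows_world]
    return np.stack(tracks, axis=0)
```

Because the flow in this sketch is expressed in world coordinates rather than camera coordinates, a static pixel has zero flow and its track stays constant even when the camera moves; only genuinely moving points change position.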
Source: arXiv:2603.02573