arXiv submission date: 2026-02-23
📄 Abstract - Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning

Current feed-forward 3D/4D reconstruction systems rely on dense geometry and pose supervision -- expensive to obtain at scale and particularly scarce for dynamic real-world scenes. We present Flow3r, a framework that augments visual geometry learning with dense 2D correspondences ("flow") as supervision, enabling scalable training from unlabeled monocular videos. Our key insight is that the flow prediction module should be factored: predicting flow between two images using geometry latents from one and pose latents from the other. This factorization directly guides the learning of both scene geometry and camera motion, and naturally extends to dynamic scenes. In controlled experiments, we show that factored flow prediction outperforms alternative designs and that performance scales consistently with unlabeled data. Integrating factored flow into existing visual geometry architectures and training with ~800K unlabeled videos, Flow3r achieves state-of-the-art results across eight benchmarks spanning static and dynamic scenes, with its largest gains on in-the-wild dynamic videos where labeled data is most scarce.
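The geometric intuition behind the factorization can be illustrated with classical reprojection: if one image contributes the scene geometry (here simplified to a depth map) and the other contributes the camera pose, the 2D flow between them is fully determined. The sketch below is an assumption-laden illustration of that relationship, not Flow3r's actual learned module (which operates on latents, not explicit depth and pose):

```python
import numpy as np

def flow_from_geometry_and_pose(depth, K, R, t):
    """Dense 2D flow implied by frame-1 geometry (a depth map) and the
    relative pose (R, t) of frame 2 -- a toy analogue of the factored
    design: geometry from one image, camera motion from the other."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (H, W, 3)
    # Back-project frame-1 pixels to 3D using its depth and intrinsics K.
    pts = depth[..., None] * (pix @ np.linalg.inv(K).T)
    # Rigidly transform the points into frame 2 and project them.
    pts2 = pts @ R.T + t
    proj = pts2 @ K.T
    uv2 = proj[..., :2] / proj[..., 2:3]
    # Flow = where each pixel lands in frame 2, minus where it started.
    return uv2 - pix[..., :2]  # (H, W, 2)

# Sanity check: an identity pose produces zero flow.
depth = np.full((4, 4), 2.0)
K = np.array([[100., 0., 2.],
              [0., 100., 2.],
              [0.,   0., 1.]])
flow = flow_from_geometry_and_pose(depth, K, np.eye(3), np.zeros(3))
```

Because flow is a deterministic function of geometry and motion, supervising flow on unlabeled video indirectly supervises both quantities -- which is why the factored head can train from raw monocular clips.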

Top-level tags: computer vision · model training · multi-modal
Detailed tags: 3d reconstruction · optical flow · self-supervised learning · video understanding · scene geometry

Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning


1️⃣ One-sentence summary

This paper proposes Flow3r, a method that uses factored optical-flow prediction to learn 3D scene geometry and camera motion from large amounts of unlabeled monocular video. It achieves state-of-the-art performance on both static and dynamic scene reconstruction, with the most pronounced gains on in-the-wild dynamic videos, where labeled data is scarcest.

Source: arXiv 2602.20157