Honey, I Shrunk the Arc de Triomphe! Tackling the "scale-collapse" problem in monocular depth estimation with a new dataset
1️⃣ One-sentence summary
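This paper finds that current AI models suffer from "scale collapse" when measuring distant objects (for example, estimating the far-away Arc de Triomphe as too small), mainly because their training data is not realistic or diverse enough; the researchers therefore collect real-world data from Internet photos and stereo imagery to build the MetricScenes dataset, repair its depth maps with a new algorithm, and improve the model's accuracy in measuring distance and size in real, open-domain scenes.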
Metric-scale monocular geometry estimation has seen significant progress through large-scale data aggregation, yet current foundation models suffer from a persistent "scale-collapse" phenomenon: distant landmarks and vast landscapes are metrically underestimated. We hypothesize that this performance gap stems from a training data bottleneck, where existing metric-scale datasets are hardware-constrained to homogeneous vehicle-captured LiDAR or short-range indoor scans, or consist of synthetic data that lacks the semantic complexity of the physical world. To bridge this gap, we curate a new metrically grounded, in-the-wild dataset that we call MetricScenes, gathered from a variety of sources including Internet photo collections and stereo imagery. We estimate camera poses and initial depth maps for each scene using off-the-shelf methods, and recover absolute scale from geo-tagged metadata as well as known stereo camera baselines. We also improve the quality of depth maps derived from MetricScenes via a new two-stage Poisson completion method. Fine-tuning MoGe-2 on our dataset significantly mitigates scale-collapse and achieves superior metric accuracy in unconstrained, open-domain scenes while maintaining state-of-the-art performance on standard benchmarks.
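The abstract does not spell out how absolute scale is recovered, so the following is only a minimal sketch of two standard ways this can be done, not the authors' pipeline. It assumes that geo-tags have already been converted to a local metric frame and that stereo pairs are rectified; the function names `scale_from_geotags` and `stereo_depth_m` and all inputs are hypothetical.

```python
import itertools
import math
import statistics


def scale_from_geotags(sfm_centers, gps_positions_m):
    """Estimate metres per reconstruction unit from geo-tagged cameras.

    sfm_centers: camera centres in the arbitrary-scale reconstruction frame.
    gps_positions_m: the same cameras' geo-tags in a local metric frame
    (assumed to be pre-converted, e.g. east/north/up in metres).
    The median of pairwise distance ratios is used to damp noisy geo-tags.
    """
    ratios = []
    for i, j in itertools.combinations(range(len(sfm_centers)), 2):
        d_sfm = math.dist(sfm_centers[i], sfm_centers[j])
        d_gps = math.dist(gps_positions_m[i], gps_positions_m[j])
        if d_sfm > 1e-9:  # skip coincident cameras
            ratios.append(d_gps / d_sfm)
    if not ratios:
        raise ValueError("need at least two distinct camera centres")
    return statistics.median(ratios)


def stereo_depth_m(disparity_px, focal_px, baseline_m):
    """Rectified pinhole stereo: depth Z = f * B / d, in metres."""
    if disparity_px <= 0:
        return math.inf  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


# Hypothetical toy data: the reconstruction is 3x smaller than reality.
sfm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gps = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
scale = scale_from_geotags(sfm, gps)
depth = stereo_depth_m(disparity_px=10.0, focal_px=1000.0, baseline_m=0.5)
```

Multiplying an up-to-scale depth map by `scale` would put it in metres. The stereo formula also hints at why distant content is hard to supervise: depth is inversely proportional to disparity, so for far-away structures a small disparity error becomes a large metric error.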
Source: arXiv:2606.02379