Honey, I Shrunk the Arc de Triomphe!

📄 Abstract - Honey, I Shrunk the Arc de Triomphe!

Metric-scale monocular geometry estimation has seen significant progress through large-scale data aggregation, yet current foundation models suffer from a persistent "scale-collapse" phenomenon: distant landmarks and vast landscapes are metrically underestimated. We hypothesize that this performance gap stems from a training data bottleneck, where existing metric-scale datasets are hardware-constrained to homogeneous vehicle-captured LiDAR or short-range indoor scans, or consist of synthetic data that lacks the semantic complexity of the physical world. To bridge this gap, we curate a new metrically-grounded, in-the-wild dataset that we call MetricScenes, gathered from a variety of sources including Internet photo collections and stereo imagery. We estimate camera poses and initial depth maps for each scene using off-the-shelf methods, and recover absolute scale from geo-tagged metadata as well as known stereo camera baselines. We also improve the quality of depth maps derived from MetricScenes via a new two-stage Poisson completion method. Fine-tuning MoGe-2 on our dataset significantly mitigates scale-collapse and achieves superior metric accuracy in unconstrained, open-domain scenes while maintaining state-of-the-art performance on standard benchmarks.
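
The abstract names two sources of absolute scale: geo-tagged metadata and known stereo baselines. The sketch below illustrates the general idea behind each and is not the authors' pipeline. The function names `estimate_metric_scale` and `stereo_depth_m` are hypothetical, the geo-tags are assumed to have already been converted to local metric coordinates, and the median-of-ratios estimator is an illustrative choice made here for robustness to noisy tags.

```python
import math
from itertools import combinations
from statistics import median


def estimate_metric_scale(sfm_centers, metric_centers):
    """Estimate s such that s * (reconstruction distance) ~ metric distance.

    sfm_centers: camera centers in the arbitrary-scale reconstruction frame.
    metric_centers: the same cameras' positions in meters (assumed to come
    from geo-tags already projected to a local metric frame).
    """
    ratios = []
    for i, j in combinations(range(len(sfm_centers)), 2):
        d_sfm = math.dist(sfm_centers[i], sfm_centers[j])
        d_metric = math.dist(metric_centers[i], metric_centers[j])
        if d_sfm > 1e-9:  # skip coincident cameras to avoid division by zero
            ratios.append(d_metric / d_sfm)
    if not ratios:
        raise ValueError("need at least two distinct camera centers")
    # The median tolerates a minority of badly geo-tagged cameras.
    return median(ratios)


def stereo_depth_m(disparity_px, focal_px, baseline_m):
    """Depth of a rectified stereo pixel: Z = f * B / d."""
    if disparity_px <= 0:
        return math.inf  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


# Toy data: the reconstruction is 5x smaller than the real scene.
sfm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
metric = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
scale = estimate_metric_scale(sfm, metric)
depth = stereo_depth_m(disparity_px=10.0, focal_px=1000.0, baseline_m=0.1)
```

Multiplying an arbitrary-scale depth map by `scale` would bring it into meters under these assumptions. The stereo relation also shows why distant content is hard to supervise: depth grows as disparity shrinks, so small disparity errors translate into large depth errors at long range.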

Oops, I Shrunk the Arc de Triomphe! Tackling "Scale Collapse" in Monocular Depth Estimation with a New Dataset / Honey, I Shrunk the Arc de Triomphe!

1️⃣ One-Sentence Summary

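This paper finds that current AI models exhibit "scale collapse" when measuring the size of distant objects (for example, estimating the faraway Arc de Triomphe as too small), mainly because their training data is not realistic or diverse enough; the researchers therefore collected real-world data from Internet photos and stereo imagery to build the MetricScenes dataset, repaired its depth maps with a new algorithm, and improved the model's accuracy in measuring distance and size in real, open-domain scenes.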
