AnyDepth: Depth Estimation Made Easy
1️⃣ One-Sentence Summary
This paper proposes AnyDepth, a lightweight framework that combines a high-quality visual encoder, a simpler decoder design, and improved training-data quality to achieve more efficient and more accurate single-image depth estimation, without any scene-specific fine-tuning.
Monocular depth estimation aims to recover the depth information of 3D scenes from 2D images. Recent work has made significant progress, but its reliance on large-scale datasets and complex decoders limits both efficiency and generalization. In this paper, we propose a lightweight and data-centric framework for zero-shot monocular depth estimation. First, we adopt DINOv3 as the visual encoder to obtain high-quality dense features. Second, to address the inherent drawbacks of the DPT's complex structure, we design the Simple Depth Transformer (SDT), a compact transformer-based decoder. Compared to the DPT, it uses a single-path feature-fusion and upsampling process that reduces the computational overhead of cross-scale feature fusion, achieving higher accuracy while reducing the number of parameters by approximately 85%-89%. Furthermore, we propose a quality-based filtering strategy that removes harmful samples, reducing dataset size while improving overall training quality. Extensive experiments on five benchmarks demonstrate that our framework surpasses the DPT in accuracy. This work highlights the importance of balancing model design and data quality for achieving efficient and generalizable zero-shot depth estimation. Code: this https URL. Website: this https URL.
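To make the "single-path fusion" idea concrete, here is a minimal PyTorch sketch of what such a decoder could look like. Everything in it is an assumption for illustration: the class name `SimpleDepthDecoder`, the shared projection width, the layer counts, and which DINOv3 levels are tapped are not taken from the paper, which only states that SDT fuses features along a single path before upsampling, in contrast to DPT's per-scale reassembly followed by progressive cross-scale fusion.

```python
# Hypothetical sketch of a single-path decoder in the spirit of SDT.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDepthDecoder(nn.Module):
    """Fuse same-resolution ViT token maps along one path, then upsample,
    instead of DPT's multi-scale reassembly and progressive fusion."""

    def __init__(self, in_dim=1024, dim=256, num_layers=2, num_heads=8):
        super().__init__()
        # Shared linear projection of every tapped encoder level to one width.
        self.proj = nn.Linear(in_dim, dim)
        block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.fuse = nn.TransformerEncoder(block, num_layers=num_layers)
        # Light convolutional head that predicts one depth value per pixel.
        self.head = nn.Sequential(
            nn.Conv2d(dim, dim // 2, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim // 2, 1, 1))

    def forward(self, feats, image_hw):
        # feats: list of (B, N, in_dim) token maps from several DINOv3 layers.
        # Single-path fusion: project and sum the levels into one token stream.
        x = sum(self.proj(f) for f in feats)           # (B, N, dim)
        x = self.fuse(x)                               # refine the fused tokens
        b, n, c = x.shape
        side = int(n ** 0.5)                           # assumes a square grid
        x = x.transpose(1, 2).reshape(b, c, side, side)
        x = F.interpolate(x, size=image_hw, mode="bilinear",
                          align_corners=False)
        return self.head(x)                            # (B, 1, H, W) depth map


# Example: fuse three tapped levels of 32x32 = 1024 tokens into a depth map.
feats = [torch.randn(2, 1024, 1024) for _ in range(3)]
depth = SimpleDepthDecoder()(feats, image_hw=(512, 512))
print(depth.shape)  # torch.Size([2, 1, 512, 512])
```

The point the abstract emphasizes is that fusion happens once over a single token stream rather than across four reassembled scales, which is where the reported ~85%-89% parameter reduction relative to the DPT would come from.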
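The quality-based filtering strategy is described only at a high level, so the sketch below assumes a generic per-sample score (for example, agreement between a sample's depth label and a reference prediction). Both `quality_score` and the `keep_fraction` threshold are hypothetical stand-ins for whatever criterion the actual strategy uses.

```python
# Hedged sketch of a quality-based filtering pass over a training set.
import numpy as np

def filter_by_quality(samples, quality_score, keep_fraction=0.8):
    """Keep the top keep_fraction of samples by quality score.

    quality_score: callable mapping a sample to a scalar (higher = better);
    a stand-in for the paper's unspecified quality criterion.
    """
    scores = np.array([quality_score(s) for s in samples])
    cutoff = np.quantile(scores, 1.0 - keep_fraction)  # drop the low tail
    return [s for s, sc in zip(samples, scores) if sc >= cutoff]
```

Filtering of this shape shrinks the dataset while raising its average label quality, matching the abstract's claim that removing harmful samples improves overall training quality.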
Source: arXiv: 2601.02760