Adaptive Depth-converted-Scale Convolution for Self-supervised Monocular Depth Estimation
1️⃣ One-sentence summary
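This paper proposes a new method called Depth-converted-Scale Convolution (DcSConv), which lets convolution filters automatically adjust their receptive field size according to object depth. This effectively addresses the size ambiguity that arises in monocular video as objects move nearer or farther and that degrades depth estimation accuracy, and the method can serve as a plug-and-play module to improve existing models.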
Self-supervised monocular depth estimation (MDE) has received increasing interest in recent years. Objects in the scene, including their sizes and the relationships among them, are the main cues for extracting scene structure. However, previous work lacks explicit handling of how an object's apparent size changes with its depth. In a monocular video especially, the apparent size of the same object changes continuously, resulting in size and depth ambiguity. To address this problem, we propose a Depth-converted-Scale Convolution (DcSConv) enhanced monocular depth estimation framework, which incorporates the prior relationship between object depth and object scale to extract features at appropriate scales of the convolution receptive field. The proposed DcSConv focuses on the adaptive scale of the convolution filter rather than the local deformation of its shape. It shows that the scale of the convolution filter matters no less than its local deformation, and even more in the evaluated task. Moreover, a Depth-converted-Scale aware Fusion (DcS-F) is developed to adaptively fuse the DcSConv features with the conventional convolution features. Our DcSConv enhanced monocular depth estimation framework can be applied on top of existing CNN-based methods as a plug-and-play module that enhances the conventional convolution block. Extensive experiments with different baselines have been conducted on the KITTI benchmark, and our method achieves the best results, with an improvement of up to 11.6% in terms of SqRel reduction. Ablation studies also validate the effectiveness of each proposed module.
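To make the depth-to-scale idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a pinhole-camera prior in which apparent size is proportional to 1/depth, and it stretches the sampling grid of a single-channel 3x3 filter by a per-pixel scale. The function names (`depth_to_scale`, `bilinear_sample`, `scale_adaptive_conv3x3`), the reference depth `ref_depth`, and the clipping bounds `s_min` and `s_max` are all hypothetical choices made for illustration; the abstract does not specify how DcSConv converts depth to scale, and the DcS-F fusion step is not modelled here.

```python
import numpy as np


def depth_to_scale(depth, ref_depth=10.0, s_min=0.5, s_max=4.0):
    """Assumed pinhole prior: apparent size ~ 1/depth, so nearer pixels get a larger scale."""
    return np.clip(ref_depth / np.maximum(depth, 1e-6), s_min, s_max)


def bilinear_sample(img, ys, xs):
    """Sample a 2-D array at fractional coordinates, clamping to the image border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def scale_adaptive_conv3x3(feat, depth, kernel, **scale_kwargs):
    """Apply a 3x3 filter (cross-correlation) whose tap spacing follows the per-pixel scale."""
    h, w = feat.shape
    scale = depth_to_scale(depth, **scale_kwargs)
    yy, xx = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    out = np.zeros((h, w), dtype=float)
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            # Each tap is displaced by scale * offset rather than a fixed one-pixel offset.
            out += kernel[i, j] * bilinear_sample(feat, yy + dy * scale, xx + dx * scale)
    return out


# Usage: a horizontal ramp filtered with a horizontal-difference kernel.
feat = np.tile(np.arange(9, dtype=float), (9, 1))
kernel = np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], dtype=float)
near = scale_adaptive_conv3x3(feat, np.full((9, 9), 5.0), kernel)   # scale 2
far = scale_adaptive_conv3x3(feat, np.full((9, 9), 10.0), kernel)   # scale 1
print(near[4, 4], far[4, 4])  # 4.0 2.0
```

When the scale is 1 everywhere, the sketch reduces to an ordinary 3x3 filter with replicate padding. The only thing depth changes is the spacing of the taps, which is one way to read the abstract's contrast between adapting the filter's scale and deforming its shape.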
Source: arXiv: 2604.07665