Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data
1️⃣ One-Sentence Summary
This paper proposes a lightweight RGB-D fusion method that introduces monocular depth information as a geometric prior, allowing a general-purpose segmentation model trained on only a tiny fraction of the data (0.1% of the original dataset) to segment more accurately than existing efficient models.
Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused at the mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.
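The abstract describes the architecture only at a high level (a pretrained monocular depth estimator, a dedicated depth encoder, and mid-level fusion with RGB features). The sketch below is a minimal PyTorch illustration of what such a fusion module could look like; the module names, channel widths, and the concatenate-then-project fusion operator are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthAwareFusion(nn.Module):
    """Illustrative mid-level RGB-D fusion (a sketch, not the paper's exact design).

    A small depth encoder maps a monocular depth map to a feature grid, which is
    resized to the mid-level RGB feature resolution and fused by concatenation
    followed by a 1x1 projection, so downstream SAM components still receive
    features of the expected width.
    """

    def __init__(self, rgb_channels: int = 256, depth_channels: int = 64):
        super().__init__()
        # Lightweight convolutional depth encoder: 1-channel depth map -> feature grid.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, depth_channels, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(depth_channels, depth_channels, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        # Project concatenated RGB + depth features back to the RGB channel width.
        self.fuse = nn.Conv2d(rgb_channels + depth_channels, rgb_channels, kernel_size=1)

    def forward(self, rgb_feats: torch.Tensor, depth_map: torch.Tensor) -> torch.Tensor:
        # rgb_feats: (B, C_rgb, H, W) mid-level features from the RGB image encoder.
        # depth_map: (B, 1, H_img, W_img) output of a pretrained monocular depth estimator.
        depth_feats = self.depth_encoder(depth_map)
        # Match the RGB feature resolution before fusing.
        depth_feats = F.interpolate(
            depth_feats, size=rgb_feats.shape[-2:], mode="bilinear", align_corners=False
        )
        return self.fuse(torch.cat([rgb_feats, depth_feats], dim=1))


if __name__ == "__main__":
    fusion = DepthAwareFusion()
    rgb_feats = torch.randn(1, 256, 64, 64)    # hypothetical mid-level RGB features
    depth_map = torch.randn(1, 1, 1024, 1024)  # hypothetical monocular depth map
    print(fusion(rgb_feats, depth_map).shape)  # torch.Size([1, 256, 64, 64])
```

Because the fused output keeps the RGB feature width, a module of this shape could in principle be inserted without retraining the rest of the segmentation pipeline, which is consistent with the paper's claim of needing very little training data.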
Source: arXiv: 2602.11804