Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data
1️⃣ One-Sentence Summary
This paper proposes a lightweight RGB-D fusion method that introduces monocular depth information as a geometric prior, allowing a general-purpose segmentation model trained on only a tiny fraction of the data (0.1% of the original dataset) to segment more accurately than existing efficient models.
Segment Anything Models (SAM) achieve impressive universal segmentation performance but require massive datasets (e.g., 11M images) and rely solely on RGB inputs. Recent efficient variants reduce computation but still depend on large-scale training. We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. Depth maps are generated with a pretrained estimator and fused at the mid-level with RGB features through a dedicated depth encoder. Trained on only 11.2k samples (less than 0.1% of SA-1B), our method achieves higher accuracy than EfficientViT-SAM, showing that depth cues provide strong geometric priors for segmentation.
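The abstract describes the architecture only at a high level (a pretrained monocular depth estimator, a dedicated depth encoder, and mid-level fusion with RGB features). The sketch below is a minimal PyTorch illustration of what such a fusion module could look like; the module names, channel widths, and the concatenate-then-project fusion operator are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthAwareFusion(nn.Module):
    """Illustrative mid-level RGB-D fusion (a sketch, not the paper's exact design).

    A small depth encoder maps a monocular depth map to a feature grid, which is
    resized to the mid-level RGB feature resolution and fused by concatenation
    followed by a 1x1 projection, so downstream SAM components still receive
    features of the expected width.
    """

    def __init__(self, rgb_channels: int = 256, depth_channels: int = 64):
        super().__init__()
        # Lightweight convolutional depth encoder: 1-channel depth map -> feature grid.
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, depth_channels, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(depth_channels, depth_channels, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        # Project concatenated RGB + depth features back to the RGB channel width.
        self.fuse = nn.Conv2d(rgb_channels + depth_channels, rgb_channels, kernel_size=1)

    def forward(self, rgb_feats: torch.Tensor, depth_map: torch.Tensor) -> torch.Tensor:
        # rgb_feats: (B, C_rgb, H, W) mid-level features from the RGB image encoder.
        # depth_map: (B, 1, H_img, W_img) output of a pretrained monocular depth estimator.
        depth_feats = self.depth_encoder(depth_map)
        # Match the RGB feature resolution before fusing.
        depth_feats = F.interpolate(
            depth_feats, size=rgb_feats.shape[-2:], mode="bilinear", align_corners=False
        )
        return self.fuse(torch.cat([rgb_feats, depth_feats], dim=1))


if __name__ == "__main__":
    fusion = DepthAwareFusion()
    rgb_feats = torch.randn(1, 256, 64, 64)    # hypothetical mid-level RGB features
    depth_map = torch.randn(1, 1, 1024, 1024)  # hypothetical monocular depth map
    print(fusion(rgb_feats, depth_map).shape)  # torch.Size([1, 256, 64, 64])
```

Because the fused output keeps the RGB feature width, a module of this shape could in principle be inserted without retraining the rest of the segmentation pipeline, which is consistent with the paper's claim of needing very little training data.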
Source: arXiv: 2602.11804