VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection

📄 Abstract

Open-world object detection aims to localize and recognize objects beyond a fixed closed-set label space. It is commonly divided into two categories: open-vocabulary detection, which assumes a predefined category list at test time, and open-ended detection, which requires generating candidate categories during inference. Existing methods rely primarily on coarse textual semantics and parametric knowledge, which often provide insufficient visual evidence for fine-grained appearance variation, rare categories, and cluttered scenes. In this paper, we propose VL-SAM-v3, a unified framework that augments open-world detection with retrieval-grounded external visual memory. Specifically, once candidate categories are available, VL-SAM-v3 retrieves relevant visual prototypes from a non-parametric memory bank and transforms them into two complementary visual priors: sparse priors for instance-level spatial anchoring and dense priors for class-aware local context. These priors are integrated with the original detection prompts via Memory-Guided Prompt Refinement, enabling a shared retrieval-and-refinement mechanism that supports both open-vocabulary and open-ended detection. Zero-shot experiments on LVIS show that VL-SAM-v3 consistently improves detection performance under both open-vocabulary and open-ended inference, with particularly strong gains on rare categories. Moreover, experiments with a stronger open-vocabulary detector (i.e., SAM3) validate the generality of the proposed retrieval-and-refinement mechanism.
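The retrieve-then-derive-priors idea in the abstract can be illustrated with a toy sketch. The abstract does not say how prototypes are scored or how the priors are computed, so everything here is an assumption for illustration only: cosine similarity as the retrieval score, a per-location similarity map standing in for the dense prior, its highest-scoring locations standing in for the sparse prior, and the hypothetical names `retrieve_prototypes`, `dense_prior`, and `sparse_prior`. Memory-Guided Prompt Refinement itself is not modeled.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_prototypes(memory, category, query, k=2):
    """Return the k prototypes stored under `category` most similar to `query`.

    Assumption: the memory bank is a plain dict from category name to a list
    of feature vectors; the paper's actual memory layout is not given here.
    """
    candidates = memory.get(category, [])
    return sorted(candidates, key=lambda p: cosine(p, query), reverse=True)[:k]

def dense_prior(feature_map, prototypes):
    """Per-location score: best cosine similarity to any retrieved prototype."""
    return [[max((cosine(f, p) for p in prototypes), default=0.0) for f in row]
            for row in feature_map]

def sparse_prior(dense, top_n=1):
    """Top-n (row, col) locations of the dense map, usable as point anchors."""
    cells = [(score, r, c)
             for r, row in enumerate(dense)
             for c, score in enumerate(row)]
    cells.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in cells[:top_n]]

# Toy data (made up): 2-D features, a 2x2 feature map, one category in memory.
memory = {"zebra": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]}
feature_map = [[[0.0, 1.0], [0.5, 0.5]],
               [[1.0, 0.0], [0.2, 0.8]]]
image_query = [1.0, 0.0]

protos = retrieve_prototypes(memory, "zebra", image_query, k=2)
dense = dense_prior(feature_map, protos)
points = sparse_prior(dense, top_n=1)
```

In this toy setup the dense map peaks where the image feature matches a retrieved prototype, and the sparse prior picks that peak as a point anchor. A real system would operate on learned feature embeddings rather than hand-written vectors.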

1️⃣ One-Sentence Summary

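This paper proposes the VL-SAM-v3 framework, which retrieves visual exemplars from an external memory to generate two kinds of visual priors, sparse and dense, and fuses them with the original detection prompts. This helps the model recognize rare objects, objects with ambiguous textures, and objects in cluttered backgrounds in open-world settings (both when a category list is given and when categories are unknown), yielding significant improvements on the LVIS dataset.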
