arXiv submission date: 2026-06-22
📄 Abstract - Rethinking Prototype-based Similarity Learning for Few-Shot Object Detection

Few-shot object detection aims to detect novel object categories from only a few labeled examples, avoiding costly large-scale annotation. Recent prototype-based similarity learning approaches enable training-free adaptation by matching query features with class prototypes. However, they suffer from two fundamental limitations: (i) class confusion arising from inter-class similarity margin collapse, and (ii) insufficient visual cues for precise localization, as similarity scores capture only class-level semantic affinity while providing limited spatial information. To address these issues, we introduce two complementary components. Text-Anchored Semantic Mask (TSMa) leverages class-level text features as semantic anchors to identify semantically aligned channels through channel-wise interaction between visual and text features. By suppressing style-induced spurious responses and emphasizing class-intrinsic signals, TSMa enlarges inter-class similarity margins and mitigates class confusion. We further propose Stage-Aligned Hierarchical Autoregressive Regression (SHARe), which reformulates localization as a hierarchical autoregressive process that progressively refines bounding boxes across multiple stages. SHARe leverages the layer-wise characteristics of ViT representations by aligning feature abstraction levels with regression stages: deeper layers guide early coarse localization, while shallower layers rich in edge and texture cues refine spatial details in later stages. Experiments on COCO demonstrate a new state of the art, outperforming the previous best by +10.1 nAP, with extensive analysis validating each component. The code is available at this https URL.
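The matching step the abstract describes can be illustrated with a minimal sketch: class prototypes are averaged support features, and a query is assigned to the prototype with the highest cosine similarity. The toy features, the class names, and in particular the hand-set channel mask are assumptions made for illustration. In the paper, TSMa derives its mask from class-level text features, and that procedure is not reproduced here. The sketch only shows why suppressing a shared, style-like channel can widen the inter-class similarity margin.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def class_prototype(support_feats):
    """Average the support features of one class, channel by channel."""
    n = len(support_feats)
    return [sum(col) / n for col in zip(*support_feats)]

def apply_mask(feat, mask):
    """Channel-wise reweighting; a mask value of 0 suppresses that channel."""
    return [f * m for f, m in zip(feat, mask)]

def classify(query, prototypes, mask=None):
    """Return the best-matching class and all similarity scores."""
    scores = {}
    for name, proto in prototypes.items():
        q, p = query, proto
        if mask is not None:
            q, p = apply_mask(query, mask), apply_mask(proto, mask)
        scores[name] = cosine(q, p)
    best = max(scores, key=scores.get)
    return best, scores

def margin(scores):
    """Gap between the highest and second-highest similarity."""
    top = sorted(scores.values(), reverse=True)
    return top[0] - top[1]

# Toy 3-channel features (hypothetical). Channel 2 is a large "style"
# response shared by both classes; channels 0 and 1 carry class identity.
prototypes = {
    "cat": class_prototype([[1.0, 0.0, 5.0], [0.8, 0.2, 5.0]]),
    "dog": class_prototype([[0.0, 1.0, 5.0], [0.2, 0.8, 5.0]]),
}
query = [1.0, 0.1, 5.0]

plain_label, plain_scores = classify(query, prototypes)

# Hand-set mask (assumption): keep channels 0 and 1, drop channel 2.
hand_mask = [1.0, 1.0, 0.0]
masked_label, masked_scores = classify(query, prototypes, mask=hand_mask)

print(plain_label, round(margin(plain_scores), 3))
print(masked_label, round(margin(masked_scores), 3))
```

Both runs pick the same class, but the margin between the two classes is far larger once the shared channel is masked out, which is the effect the abstract attributes to suppressing style-induced responses.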

Top-level tags: computer vision, machine learning
Detailed tags: few-shot object detection, prototype learning, similarity learning, vision transformer, semantic mask

Rethinking Prototype-based Similarity Learning for Few-Shot Object Detection


1️⃣ One-Sentence Summary

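This paper targets prototype-based similarity learning for few-shot object detection and proposes two new components. The Text-Anchored Semantic Mask (TSMa) uses text features to guide visual features, resolving the class confusion caused by excessively high inter-class similarity. Stage-Aligned Hierarchical Autoregressive Regression (SHARe) refines bounding boxes progressively across hierarchical stages, improving localization accuracy. Together they achieve leading performance on the COCO dataset.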
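The stage-by-stage refinement idea can be sketched as a loop in which each stage predicts a box delta conditioned on the box produced by the previous stage, with early stages reading deeper ViT layers and later stages reading shallower ones. Everything concrete below is an assumption for illustration: the layer indices, the `refine` and `make_toy_head` names, and the toy stage heads, which ignore their feature input and simply step halfway toward a known target. The delta parametrization is the standard center/size box encoding, not necessarily the one used in the paper.

```python
import math

def apply_delta(box, delta):
    """Decode (dx, dy, dw, dh) relative to a (cx, cy, w, h) box."""
    cx, cy, w, h = box
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

def encode_delta(box, target):
    """Inverse of apply_delta: the delta that maps box onto target."""
    cx, cy, w, h = box
    tx, ty, tw, th = target
    return ((tx - cx) / w, (ty - cy) / h, math.log(tw / w), math.log(th / h))

def iou(a, b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def refine(init_box, layer_feats, stage_heads, stage_layers):
    """Autoregressive refinement: each stage sees the previous stage's box."""
    box = init_box
    history = [box]
    for head, layer in zip(stage_heads, stage_layers):
        delta = head(layer_feats[layer], box)  # conditioned on current box
        box = apply_delta(box, delta)
        history.append(box)
    return history

def make_toy_head(target, step):
    """Stand-in for a learned regressor (hypothetical): ignores the
    features and moves a fraction `step` of the way toward `target`."""
    def head(_feat, box):
        return tuple(step * d for d in encode_delta(box, target))
    return head

target = (60.0, 55.0, 40.0, 60.0)
init_box = (50.0, 50.0, 100.0, 100.0)

# Deep layer first (coarse), shallow layer last (fine); indices are assumed.
stage_layers = [11, 7, 3]
layer_feats = {11: None, 7: None, 3: None}  # placeholders for ViT features
stage_heads = [make_toy_head(target, 0.5) for _ in stage_layers]

history = refine(init_box, layer_feats, stage_heads, stage_layers)
ious = [iou(b, target) for b in history]
print([round(v, 2) for v in ious])
```

In a real detector each head would be a learned module consuming features from its assigned layer. The toy heads only make the control flow visible: the box estimate tightens at every stage because each prediction starts from the previous stage's output.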

Source: arXiv: 2606.23069