Rethinking Prototype-based Similarity Learning for Few-Shot Object Detection

📄 Abstract - Rethinking Prototype-based Similarity Learning for Few-Shot Object Detection

Few-shot object detection aims to detect novel object categories from only a few labeled examples, avoiding costly large-scale annotation. Recent prototype-based similarity learning approaches enable training-free adaptation by matching query features with class prototypes. However, they suffer from two fundamental limitations: (i) class confusion arising from inter-class similarity margin collapse, and (ii) insufficient visual cues for precise localization, as similarity scores capture only class-level semantic affinity while providing limited spatial information. To address these issues, we introduce two complementary components. Text-Anchored Semantic Mask (TSMa) leverages class-level text features as semantic anchors to identify semantically aligned channels through channel-wise interaction between visual and text features. By suppressing style-induced spurious responses and emphasizing class-intrinsic signals, TSMa enlarges inter-class similarity margins and mitigates class confusion. We further propose Stage-Aligned Hierarchical Autoregressive Regression (SHARe), which reformulates localization as a hierarchical autoregressive process that progressively refines bounding boxes across multiple stages. SHARe leverages the layer-wise characteristics of ViT representations by aligning feature abstraction levels with regression stages: deeper layers guide early coarse localization, while shallower layers rich in edge and texture cues refine spatial details in later stages. Experiments on COCO demonstrate a new state of the art, outperforming the previous best by +10.1 nAP, with extensive analysis validating each component. The code is available at this https URL.
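The prototype-matching step and the idea behind TSMa can be illustrated with a toy example. The following is a minimal sketch in plain Python, not the authors' implementation, and it rests on several assumptions. Visual and text features are assumed to already share one space of equal dimension. Each class prototype is assumed to be the mean of its support features. TSMa is stood in for by a hypothetical rule: score every channel by the elementwise product of the prototype and the class text embedding, then keep the top `keep` channels. The names `mean_prototype`, `text_anchored_mask`, and `class_scores`, and all feature values, are invented for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_prototype(support: List[Vector]) -> Vector:
    """Average the few support embeddings of one class into a prototype."""
    dim = len(support[0])
    return [sum(v[c] for v in support) / len(support) for c in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def text_anchored_mask(proto: Vector, text: Vector, keep: int) -> Vector:
    """Hypothetical TSMa-style mask: score each channel by the product of the
    visual prototype and the text anchor, then keep the top-`keep` channels."""
    scores = [p * t for p, t in zip(proto, text)]
    top = set(sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:keep])
    return [1.0 if c in top else 0.0 for c in range(len(scores))]


def class_scores(query: Vector, supports: Dict[str, List[Vector]],
                 texts: Dict[str, Vector], keep: int) -> Dict[str, float]:
    """Score a query feature against every class prototype. When keep equals
    the feature dimension, this reduces to plain prototype matching."""
    scores = {}
    for name, feats in supports.items():
        proto = mean_prototype(feats)
        mask = text_anchored_mask(proto, texts[name], keep)
        scores[name] = cosine([q * m for q, m in zip(query, mask)],
                              [p * m for p, m in zip(proto, mask)])
    return scores


# Toy 6-channel features: channels 0-1 are "cat"-intrinsic, 2-3 "dog"-intrinsic,
# and 4-5 carry a shared style signal (e.g. lighting) that inflates similarity.
supports = {
    "cat": [[1.2, 0.9, 0.0, 0.0, 2.0, 2.0], [0.8, 1.1, 0.0, 0.0, 2.0, 2.0]],
    "dog": [[0.0, 0.0, 0.9, 1.2, 2.0, 2.0], [0.0, 0.0, 1.1, 0.8, 2.0, 2.0]],
}
texts = {"cat": [1.0, 1.0, 0.0, 0.0, 0.0, 0.0], "dog": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]}
query = [1.0, 0.8, 0.2, 0.0, 2.0, 2.0]  # a cat seen under the same style

plain = class_scores(query, supports, texts, keep=6)   # no masking
masked = class_scores(query, supports, texts, keep=2)  # text-anchored masking
print("plain margin:", round(plain["cat"] - plain["dog"], 3))    # -> plain margin: 0.163
print("masked margin:", round(masked["cat"] - masked["dog"], 3))  # -> masked margin: 0.287
```

On these toy features, masking out the shared style channels widens the cat-versus-dog margin from about 0.16 to about 0.29. A hard top-k mask is only the simplest choice here. A soft, learned channel weighting would be an equally plausible reading of "channel-wise interaction between visual and text features".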


1️⃣ One-Sentence Summary

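This paper targets prototype-based similarity learning for few-shot object detection and proposes two components. The Text-Anchored Semantic Mask (TSMa) uses text features to guide visual features, resolving the class confusion caused by excessive inter-class similarity. Stage-Aligned Hierarchical Autoregressive Regression (SHARe) improves localization accuracy by progressively refining bounding boxes across hierarchical stages. Together, the two components achieve state-of-the-art performance on the COCO dataset.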
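The coarse-to-fine, autoregressive structure of SHARe can be sketched as a loop in which each stage reads the box produced by the previous stage and adds a correction. This is a toy illustration under stated assumptions, not the authors' regressor. Boxes are assumed to be `(cx, cy, w, h)`. Each stage's learned head is replaced by a hypothetical `quantized_oracle` that sees the ground-truth box only at a stage-specific resolution. This mimics the premise that deep ViT layers support coarse localization, while shallow layers, rich in edge and texture cues, support fine adjustments. The stage names, resolutions, and coordinates are invented.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)
Regressor = Callable[[Box], Box]          # maps the current box to a predicted delta


def refine(init_box: Box, stages: Sequence[Tuple[str, Regressor]]) -> List[Box]:
    """Autoregressive refinement: each stage conditions on the box produced by
    the previous stage and adds its own delta. Returns the box after every stage."""
    boxes = [init_box]
    for _layer, regress in stages:
        prev = boxes[-1]
        delta = regress(prev)
        boxes.append(tuple(b + d for b, d in zip(prev, delta)))
    return boxes


def quantized_oracle(target: Box, resolution: float) -> Regressor:
    """Toy stand-in for a learned stage head (hypothetical): it 'sees' the target
    only on a grid of the given resolution (coarse for deep, fine for shallow)."""
    def regress(box: Box) -> Box:
        return tuple(resolution * round((t - b) / resolution) for t, b in zip(target, box))
    return regress


target = (63.3, 46.1, 31.0, 23.4)
stages = [  # ordered deep -> shallow, i.e. coarse -> fine
    ("deep ViT block", quantized_oracle(target, 8.0)),
    ("middle ViT block", quantized_oracle(target, 2.0)),
    ("shallow ViT block", quantized_oracle(target, 0.5)),
]
boxes = refine((50.0, 50.0, 20.0, 20.0), stages)
for box in boxes:
    # max error after each stage: 13.3, 3.9, 1.0, 0.2
    err = max(abs(b - t) for b, t in zip(box, target))
    print([round(v, 2) for v in box], "max error:", round(err, 2))
```

The design point the sketch isolates is the pairing of stage order with feature depth. A coarse head makes large but imprecise moves first, and each later, finer head only has to correct the residual left by its predecessor.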
