
arXiv submission date: 2026-05-14
📄 Abstract - Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation

Single-frame Infrared Small Target Detection (ISTD) aims to localize weak targets under heavy background clutter, yet dense pixel-wise annotations are expensive. Point supervision with online label evolution reduces annotation cost; however, lightweight CNN detectors often lack sufficient semantics, leading to noisy pseudo-masks and unstable optimization. To address this, we propose a hierarchical VFM-driven knowledge distillation framework that uses a frozen Vision Foundation Model (VFM) during training. We formulate point-supervised learning as a bilevel optimization process: the inner loop adapts a VFM-embedded teacher on reweighted training samples, while the outer loop transfers validation-guided knowledge to a lightweight student to mitigate pseudo-label noise and training-set bias. We further introduce Semantic-Conditioned Affine Modulation (SCAM) to inject VFM semantics into CNN features at multiple layers. In addition, a dynamic collaborative learning strategy with cluster-level sample reweighting enhances robustness to imperfect pseudo-masks. Experiments on diverse challenging cases across multiple ISTD backbones demonstrate consistent improvements in detection accuracy and training stability. Our code is available at this https URL.
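The Semantic-Conditioned Affine Modulation (SCAM) described above injects VFM semantics into CNN features. A minimal sketch of one plausible reading, assuming a FiLM-style channel-wise scale and shift predicted from a pooled VFM embedding (all function and parameter names here are illustrative, not from the paper's released code):

```python
import numpy as np

def scam_modulate(cnn_feat, vfm_feat, w_gamma, w_beta):
    """Hypothetical SCAM sketch: predict per-channel affine parameters
    (scale gamma, shift beta) from a pooled VFM embedding and apply
    them FiLM-style to a CNN feature map.

    cnn_feat: (C, H, W) CNN feature map at one layer
    vfm_feat: (D,)      pooled VFM semantic embedding
    w_gamma, w_beta: (C, D) learned projection matrices
    """
    gamma = w_gamma @ vfm_feat  # (C,) channel-wise scale offsets
    beta = w_beta @ vfm_feat    # (C,) channel-wise shifts
    # residual-style modulation: zero weights reduce to identity
    return (1.0 + gamma)[:, None, None] * cnn_feat + beta[:, None, None]

# toy shapes: D=8 semantic dim, C=4 channels, 5x5 spatial map
rng = np.random.default_rng(0)
out = scam_modulate(rng.standard_normal((4, 5, 5)),
                    rng.standard_normal(8),
                    0.1 * rng.standard_normal((4, 8)),
                    0.1 * rng.standard_normal((4, 8)))
print(out.shape)  # (4, 5, 5)
```

In the paper this modulation is applied at multiple CNN layers; the `1 + gamma` residual form is a common choice so that untrained projections leave the backbone features unchanged.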

Top-level tags: computer vision, model training
Detailed tags: infrared small target detection, point supervision, knowledge distillation, vision foundation model, semantic prior

Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation


1️⃣ One-Sentence Summary

This paper proposes a knowledge distillation method that uses a pretrained Vision Foundation Model (VFM) as a semantic prior. Through bilevel optimization and semantic-conditioned modulation, it helps a lightweight CNN learn infrared small target detection stably from point annotations alone, markedly improving pseudo-label quality and training stability.
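The bilevel optimization in this summary (inner loop: adapt the VFM-embedded teacher on reweighted training samples; outer loop: distill validation-guided knowledge into the lightweight student) can be sketched on a toy linear model. Everything below is an illustrative assumption, not the paper's implementation: the linear scorers, losses, and uniform sample weights merely show the alternating inner/outer structure.

```python
import numpy as np

def inner_step(teacher_w, train_x, weights, lr=0.1):
    """Inner loop (toy): adapt the teacher on reweighted training
    samples, here a weighted squared error against pseudo-label 1."""
    preds = train_x @ teacher_w
    grad = (weights * (preds - 1.0)) @ train_x / len(train_x)
    return teacher_w - lr * grad

def outer_step(student_w, teacher_w, val_x, lr=0.1):
    """Outer loop (toy): distill validation-guided knowledge by
    pulling student predictions toward the teacher's on held-out data."""
    gap = val_x @ (student_w - teacher_w)
    grad = gap @ val_x / len(val_x)
    return student_w - lr * grad

rng = np.random.default_rng(1)
teacher = rng.standard_normal(6)
student = rng.standard_normal(6)
train_x, val_x = rng.standard_normal((32, 6)), rng.standard_normal((16, 6))
weights = np.ones(32)  # the paper reweights at cluster level; uniform here

d0 = np.linalg.norm(student - teacher)
for _ in range(200):
    teacher = inner_step(teacher, train_x, weights)
    student = outer_step(student, teacher, val_x)
d1 = np.linalg.norm(student - teacher)
print(d1 < d0)  # student has moved toward the adapted teacher
```

The point of the alternation is that the student never fits the noisy training pseudo-labels directly; it only chases the teacher through the validation data, which is what mitigates pseudo-label noise and training-set bias.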

Source: arXiv 2605.14346