RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
1️⃣ One-sentence summary
This paper proposes RACANet, a two-stage fusion framework that explicitly learns cross-modal semantic alignment and applies a region-reliability-based local anchor fusion mechanism, effectively improving the accuracy and interpretability of RGB-T crowd counting in complex scenes.
RGB-Thermal (RGB-T) crowd counting aims to integrate visible-spectrum and thermal infrared information to improve the robustness of crowd density estimation in complex scenes. Although existing studies generally improve counting accuracy through cross-modal feature fusion, most rely on implicit fusion strategies and lack both explicit modeling of local spatial discrepancies and fine-grained, position-level characterization of modality reliability, which limits the accuracy and interpretability of the fusion process. To address these issues, this paper proposes a two-stage fusion framework, RACANet, a Reliability-Aware Crowd Anchor Network for RGB-T crowd counting. First, we introduce a lightweight cross-modal alignment pretraining stage that explicitly learns cross-modal semantic correspondences through crowd-prior supervision and local bidirectional soft matching. Then, building on the priors learned during pretraining, a Local Anchor Fusion Module (LAFM) is introduced in the formal training stage. This module generates local semantic anchors by aggregating features from highly reliable regions and further enables adaptive pixel-level feature redistribution via a local attention mechanism. In addition, we propose a discrepancy-aware consistency constraint that dynamically coordinates reliability in regions where the modal representations agree. Experiments on two widely used benchmark datasets, RGBT-CC and Drone-RGBT, demonstrate that RACANet outperforms existing methods. The anonymous code is available at this https URL.
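To make the anchor-fusion idea concrete, the following is a minimal NumPy sketch of reliability-aware local anchor fusion in the spirit of the LAFM described above. All names, the reliability threshold `tau`, and the sigmoid attention gate are illustrative assumptions, not the paper's actual implementation: per-pixel reliability weights fuse the two modalities, a semantic anchor is aggregated from high-reliability positions, and the anchor is redistributed to every pixel through an attention score.

```python
import numpy as np

def local_anchor_fusion(feat_rgb, feat_t, rel_rgb, rel_t, tau=0.5):
    """Hypothetical sketch of reliability-aware anchor fusion (not the paper's code).

    feat_rgb, feat_t: (H, W, C) feature maps from the RGB and thermal branches.
    rel_rgb, rel_t:   (H, W) per-pixel reliability scores in [0, 1].
    tau:              assumed threshold selecting highly reliable positions.
    """
    # Per-pixel reliability-weighted fusion of the two modalities.
    w = rel_rgb / (rel_rgb + rel_t + 1e-8)                     # (H, W)
    fused = w[..., None] * feat_rgb + (1.0 - w[..., None]) * feat_t

    # Aggregate a local semantic anchor from highly reliable regions.
    mask = np.maximum(rel_rgb, rel_t) > tau                    # (H, W) boolean
    if not mask.any():
        return fused                                           # no reliable region
    anchor = fused[mask].mean(axis=0)                          # (C,)

    # Redistribute the anchor to every pixel with a scaled attention gate.
    flat = fused.reshape(-1, fused.shape[-1])                  # (H*W, C)
    scores = flat @ anchor / np.sqrt(flat.shape[-1])           # (H*W,)
    gate = 1.0 / (1.0 + np.exp(-scores))                       # sigmoid in (0, 1)
    out = flat + gate[:, None] * anchor[None, :]
    return out.reshape(fused.shape)
```

In this sketch the gate plays the role of the local attention mechanism: pixels whose fused features align with the anchor receive a larger share of the anchor's semantics, which is one simple way to realize "adaptive pixel-level feature redistribution".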
Source: arXiv: 2604.24543