Frequency-Guided Fusion For RGB-Thermal Semantic Segmentation
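1️⃣ One-sentence summary
This paper proposes a multimodal semantic segmentation method for RGB and thermal infrared images. By handling low- and high-frequency information separately in early-stage features, and semantic correspondences in late-stage features, it achieves more accurate scene understanding in challenging conditions such as low light, while requiring fewer parameters and less computation than existing methods.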
Semantic segmentation in complex environments such as urban driving scenes remains challenging under adverse lighting conditions, where RGB images alone provide insufficient information. RGB-Thermal fusion leverages the complementary strengths of visible and infrared imagery to improve scene understanding; however, effectively integrating these heterogeneous modalities at varying levels of feature abstraction remains an open problem. In this paper, we propose a multi-modal fusion architecture built upon dual ConvNeXt V2 backbones that employs stage-wise, modality-adaptive fusion strategies. For early-stage features, we introduce a Frequency-Based Fusion Module that decomposes infrared features into low- and high-frequency components via Gaussian filtering, applies dual-branch spatial attention to selectively emphasize thermal patterns and fine-grained boundaries, and integrates them with RGB features through a confidence-gated residual mechanism. For late-stage features, we design a semantic fusion module with cross-modal attention and multi-scale depthwise convolutions to capture semantic correspondences across modalities. The fused features are decoded via a PANet-style bidirectional decoder with deep supervision. Experiments on MFNet and PST900 demonstrate that our lightest variant achieves 61.73% and 86.24% mIoU, respectively, with only 35.43M parameters, outperforming recent methods while using substantially fewer parameters and lower computational cost. Code is available at this https URL
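The frequency decomposition and gated residual described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single-channel feature map stored as a list of rows, a fixed Gaussian kernel, and a scalar gate standing in for the learned confidence gate. The dual-branch spatial attention is omitted, and the names `split_frequencies` and `gated_residual_fusion` are hypothetical.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Return a normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    blurred = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)  # clamp at borders
                acc += w * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*image)]


def split_frequencies(image, sigma=1.0, radius=2):
    """Split a 2-D map into low-frequency (blurred) and high-frequency (residual) parts."""
    kernel = gaussian_kernel_1d(sigma, radius)
    # Separable Gaussian blur: filter along rows, then along columns.
    low = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(image, low)]
    return low, high


def gated_residual_fusion(rgb, thermal, gate):
    """Add the thermal map to the RGB map, scaled by a gate in [0, 1].

    Assumption: the gate is a fixed scalar here; in a trained network it
    would be predicted from the features themselves.
    """
    return [[r + gate * t for r, t in zip(rgb_row, th_row)]
            for rgb_row, th_row in zip(rgb, thermal)]


if __name__ == "__main__":
    thermal = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]  # vertical edge
    low, high = split_frequencies(thermal)
    print("low :", [round(v, 3) for v in low[0]])
    print("high:", [round(v, 3) for v in high[0]])
```

By construction, the low- and high-frequency parts sum back to the input, so the split loses no information; the high-frequency residual is concentrated around the edge, which is the kind of boundary detail the abstract says one attention branch emphasizes.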
Source: arXiv:2605.26273