arXiv submission date: 2026-05-04
📄 Abstract - SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models via Lightweight Adapters

Vision Foundation Models (VFMs) pretrained on large-scale RGB data have demonstrated remarkable representation quality, yet their applicability to multispectral imaging spanning Near-Infrared (NIR), Short-Wave Infrared (SWIR), and Long-Wave Infrared (LWIR) remains largely unexplored. These spectral modalities offer complementary sensing capabilities critical for robust perception in adverse conditions, but present a fundamental domain gap relative to RGB-centric pretrained models. We present SpectraDINO, a multispectral VFM that bridges this spectral gap by extending DINOv2 ViT backbones to beyond-visible modalities through lightweight, per-modality bottleneck adapters, while preserving the rich representations of the frozen RGB backbone. We introduce a multi-stage teacher-student training protocol in which a frozen DINOv2 teacher guides a spectral student via cosine distillation, symmetric contrastive loss, patch-level alignment, and a novel neighborhood-structure-preservation loss. This staged curriculum enables strong cross-modal alignment without catastrophic forgetting of RGB priors. We evaluate SpectraDINO on multispectral object detection and semantic segmentation across challenging NIR, SWIR, and LWIR benchmarks using widely adopted fusion strategies. SpectraDINO achieves state-of-the-art performance across most benchmarks, validating its effectiveness as a general-purpose backbone for spectral generalization. The code and weights for model variants are available at this https URL.
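The abstract names four training objectives but does not give their formulas. The following is a minimal NumPy sketch, assuming paired RGB and spectral views of the same scene, with the frozen DINOv2 teacher's global (CLS) and patch embeddings replaced by plain arrays. Cosine distillation, the symmetric (CLIP-style InfoNCE) contrastive loss, and patch-level alignment use common textbook forms. `neighborhood_preservation_loss` is a hypothetical stand-in that matches in-batch cosine-similarity matrices, because the paper's novel loss is not specified here. The temperature `tau`, the equal weights in `total_loss`, and all function names are assumptions, and the multi-stage schedule is not modeled.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cosine_distillation_loss(t_cls, s_cls):
    """Mean (1 - cosine similarity) between teacher and student global embeddings, shape (B, D)."""
    return float(np.mean(1.0 - np.sum(l2_normalize(t_cls) * l2_normalize(s_cls), axis=-1)))


def log_softmax(logits, axis):
    """Numerically stable log-softmax along `axis`."""
    m = logits.max(axis=axis, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(t_cls, s_cls, tau=0.07):
    """InfoNCE in both directions; matching teacher/student pairs sit on the diagonal."""
    logits = l2_normalize(s_cls) @ l2_normalize(t_cls).T / tau  # (B, B)
    idx = np.arange(logits.shape[0])
    s_to_t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each student picks its teacher
    t_to_s = -log_softmax(logits, axis=0)[idx, idx].mean()  # each teacher picks its student
    return float(0.5 * (s_to_t + t_to_s))


def patch_alignment_loss(t_patch, s_patch):
    """Mean cosine distance between corresponding patch tokens, shape (B, N, D)."""
    return float(np.mean(1.0 - np.sum(l2_normalize(t_patch) * l2_normalize(s_patch), axis=-1)))


def neighborhood_preservation_loss(t_cls, s_cls):
    """HYPOTHETICAL stand-in: mean squared gap between in-batch similarity matrices."""
    t = l2_normalize(t_cls)
    s = l2_normalize(s_cls)
    return float(np.mean((t @ t.T - s @ s.T) ** 2))


def total_loss(t_cls, s_cls, t_patch, s_patch, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms; equal weights are placeholders, not paper values."""
    terms = (
        cosine_distillation_loss(t_cls, s_cls),
        symmetric_contrastive_loss(t_cls, s_cls),
        patch_alignment_loss(t_patch, s_patch),
        neighborhood_preservation_loss(t_cls, s_cls),
    )
    return sum(wi * li for wi, li in zip(w, terms))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t_cls, s_cls = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    t_patch, s_patch = rng.normal(size=(4, 16, 8)), rng.normal(size=(4, 16, 8))
    print(total_loss(t_cls, s_cls, t_patch, s_patch))
```

Matching whole similarity matrices, rather than individual vectors, is one way to penalize changes in which samples lie close to each other; the paper's actual neighborhood-structure term may be defined differently.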

Top-level tags: computer vision, model training
Detailed tags: multispectral imaging, vision foundation models, domain adaptation, knowledge distillation, object detection

SpectraDINO: Bridging the Spectral Gap in Vision Foundation Models via Lightweight Adapters


1️⃣ One-sentence summary

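This paper proposes SpectraDINO, which extends DINOv2, a vision foundation model pretrained on RGB images, to multispectral imaging by attaching a lightweight adapter module for each spectral band (near-infrared, short-wave infrared, and long-wave infrared) and training it with a multi-stage teacher-student strategy; the resulting model achieves state-of-the-art results on most of the evaluated object detection and semantic segmentation benchmarks, which cover a range of adverse conditions.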
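The per-modality adapter idea can be sketched as a residual bottleneck (down-projection, nonlinearity, up-projection) attached after each frozen block, with a separate adapter set for each spectral band. This is a minimal NumPy illustration under stated assumptions: the adapter placement, ReLU activation, zero-initialized up-projection, bottleneck width, and the names `BottleneckAdapter`, `SpectralBackbone`, and `make_block` are hypothetical, and `make_block` is a toy stand-in for a frozen DINOv2 ViT block, not the real model.

```python
import numpy as np


def make_block(w):
    """HYPOTHETICAL stand-in for a frozen transformer block: a fixed residual map."""
    def block(x):
        return x + np.tanh(x @ w)
    return block


class BottleneckAdapter:
    """Residual bottleneck x + up(relu(down(x))); an assumed layout, not the paper's exact design."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(scale=dim ** -0.5, size=(dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Zero-initialized up-projection, so the adapter starts as an identity map (assumption).
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x):
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)  # ReLU for simplicity
        return x + h @ self.w_up + self.b_up

    def num_params(self):
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


class SpectralBackbone:
    """Shared frozen blocks plus one independent adapter per block for each modality."""

    def __init__(self, frozen_blocks, modalities, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.frozen_blocks = frozen_blocks  # never updated; only adapters would be trained
        self.adapters = {
            m: [BottleneckAdapter(dim, bottleneck, rng) for _ in frozen_blocks]
            for m in modalities
        }

    def __call__(self, tokens, modality):
        for i, block in enumerate(self.frozen_blocks):
            tokens = block(tokens)
            if modality != "rgb":  # the RGB path bypasses adapters entirely
                tokens = self.adapters[modality][i](tokens)
        return tokens


if __name__ == "__main__":
    dim, depth = 64, 4
    rng = np.random.default_rng(1)
    blocks = [make_block(rng.normal(scale=dim ** -0.5, size=(dim, dim))) for _ in range(depth)]
    model = SpectralBackbone(blocks, ["nir", "swir", "lwir"], dim=dim, bottleneck=8)
    x = rng.normal(size=(2, 16, dim))  # (batch, tokens, dim)
    print(model(x, "lwir").shape)  # (2, 16, 64)
```

Because the up-projection starts at zero, every spectral path initially reproduces the frozen RGB path exactly, and only the small adapter matrices (here 1,096 parameters per adapter versus 4,096 in each toy block's weight matrix) would be trained. This is one plausible way to add modalities without disturbing the RGB backbone, not necessarily the paper's configuration.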

Source: arXiv: 2605.02258