arXiv submission date: 2026-06-08
📄 Abstract - Vision-Language Guided Hyperspectral Object Tracking via Semantics Fusion and Contextual Template Updating

Hyperspectral object tracking (HOT) leverages the rich spectral information provided by hyperspectral videos (HSVs), offering substantial potential for object tracking. However, efficiently extracting and exploiting spectral information from redundant spectral bands remains a fundamental challenge, which severely limits model generalization and tracking performance. Moreover, in dynamic scenes, targets often undergo drastic appearance variations due to factors such as occlusion and illumination changes. These variations lead to large deformations between the current frame and the template, and such discrepancies pose major challenges for existing temporal modeling approaches. In this work, we propose VLHTrack, a novel hyperspectral vision-language (VL) joint tracking framework. Specifically, we incorporate language priors to address the fundamental challenge of spectral redundancy by designing a Language-Guided Band Selection Module (LBSM). By leveraging Large Language Model (LLM) descriptions, LBSM establishes a semantic-to-spectral mapping that mitigates redundancy and accentuates discriminative spectral features. A Multi-Modal Vision-Language Fusion Module is then employed to integrate visual and linguistic embeddings, harnessing their complementary advantages to learn coherent cross-modal representations. To address target deformation in long-term sequences, we propose a dynamic template feature update strategy, implemented via the Dynamic Template Update with Mamba (DTUM) module. By leveraging selective state space modeling, DTUM learns inter-frame dependencies to update the template features, so that the template evolves efficiently under the guidance of temporal context. Experiments on HOT2023 and HOT2024 demonstrate that VLHTrack outperforms state-of-the-art (SOTA) methods.
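The abstract does not spell out how LBSM maps language semantics to spectral bands, so the following is only a minimal sketch of one plausible reading: score each band by the cosine similarity between a text embedding and a per-band feature embedding, turn the scores into weights with a softmax, and keep the top-k bands. The function names (`cosine`, `softmax`, `select_bands`), the `temperature` parameter, and the toy embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_bands(text_embedding, band_embeddings, k=3, temperature=0.1):
    """Hypothetical language-guided band selection.

    Scores every spectral band against the text embedding, converts the
    scores to weights, and returns the indices of the k highest-weighted
    bands (in ascending band order) together with all weights.
    """
    scores = [cosine(text_embedding, band) for band in band_embeddings]
    weights = softmax(scores, temperature)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k]), weights


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real text and band features (assumed).
    text = [1.0, 0.0]
    bands = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    kept, weights = select_bands(text, bands, k=2)
    print("kept bands:", kept)
    print("weights:", [round(w, 4) for w in weights])
```

In a real tracker the embeddings would come from a language encoder and a spectral feature extractor, and the weights would more likely rescale band features in a differentiable way than apply a hard top-k cut; the sketch only illustrates the idea of ranking bands by semantic relevance.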

Top-level tags: computer vision, multi-modal
Detailed tags: hyperspectral tracking, vision-language model, band selection, template update, object tracking

Vision-Language Guided Hyperspectral Object Tracking via Semantics Fusion and Contextual Template Updating


1️⃣ One-Sentence Summary

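This paper proposes VLHTrack, a new hyperspectral object tracking framework that uses language priors to select the most informative spectral bands and a dynamic template update strategy to cope with changes in target appearance, thereby significantly improving tracking accuracy and robustness in complex scenes.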

Source: arXiv: 2606.09167