arXiv submission date: 2026-03-09
📄 Abstract - Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling

The primary objective of cross-view UAV geolocalization is to identify the exact spatial coordinates of drone-captured imagery by aligning it with extensive, geo-referenced satellite databases. Current approaches typically extract features independently from each perspective and rely on basic heuristics to compute similarity, thereby failing to explicitly capture the essential interactions between different views. To address this limitation, we introduce a novel, plug-and-play ranking architecture designed to explicitly perform joint relational modeling for improved UAV-to-satellite image matching. By harnessing the capabilities of a Large Vision-Language Model (LVLM), our framework effectively learns the deep visual-semantic correlations linking UAV and satellite imagery. Furthermore, we present a novel relational-aware loss function to optimize the training phase. By employing soft labels, this loss provides fine-grained supervision that avoids overly penalizing near-positive matches, ultimately boosting both the model's discriminative power and training stability. Comprehensive evaluations across various baseline architectures and standard benchmarks reveal that the proposed method substantially boosts the retrieval accuracy of existing models, yielding superior performance even under highly demanding conditions.
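The relation-aware loss with soft labels described above could, for illustration, be realized as a cross-entropy between a soft target distribution derived from graded relevance and the model's predicted distribution over retrieval candidates, so that near-positive matches receive partial credit instead of a hard zero. The following is a minimal sketch under that assumption; the function and parameter names are hypothetical and not taken from the paper.

```python
import math

def softmax(xs, temperature=1.0):
    # Numerically stable softmax over a list of raw scores.
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def relation_aware_loss(sim_scores, relevance, temperature=0.1):
    """Hypothetical soft-label ranking loss.

    sim_scores: predicted UAV-to-satellite similarity per candidate.
    relevance:  graded ground-truth relevance (soft labels), so a
                near-positive candidate is not penalized as hard as
                a clear negative.
    Returns the cross-entropy between the soft target distribution
    and the predicted distribution over candidates.
    """
    target = softmax(relevance, temperature=1.0)
    pred = softmax(sim_scores, temperature=temperature)
    return -sum(t * math.log(p) for t, p in zip(target, pred))
```

In this sketch, a ranking whose similarity scores agree with the graded relevance yields a lower loss than a misaligned one, which is the fine-grained supervision effect the abstract attributes to the soft labels.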

Top-level tags: computer vision, multi-modal, model training
Detailed tags: cross-view geolocalization, uav-satellite matching, vision-language model, relational modeling, retrieval accuracy

Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling


1️⃣ One-sentence summary

This paper proposes a new method that uses a large vision-language model to learn deep correlations between UAV and satellite imagery; with a plug-and-play relational modeling module and a novel loss function, it significantly improves the accuracy and stability of cross-view image matching.

Source: arXiv 2603.08063