arXiv submission date: 2026-03-09
📄 Abstract - Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling

The primary objective of cross-view UAV geolocalization is to identify the exact spatial coordinates of drone-captured imagery by aligning it with extensive, geo-referenced satellite databases. Current approaches typically extract features independently from each perspective and rely on basic heuristics to compute similarity, thereby failing to explicitly capture the essential interactions between different views. To address this limitation, we introduce a novel, plug-and-play ranking architecture designed to explicitly perform joint relational modeling for improved UAV-to-satellite image matching. By harnessing the capabilities of a Large Vision-Language Model (LVLM), our framework effectively learns the deep visual-semantic correlations linking UAV and satellite imagery. Furthermore, we present a novel relational-aware loss function to optimize the training phase. By employing soft labels, this loss provides fine-grained supervision that avoids overly penalizing near-positive matches, ultimately boosting both the model's discriminative power and training stability. Comprehensive evaluations across various baseline architectures and standard benchmarks reveal that the proposed method substantially boosts the retrieval accuracy of existing models, yielding superior performance even under highly demanding conditions.
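The relation-aware loss with soft labels described above could, for illustration, be realized as a cross-entropy between a soft target distribution derived from graded relevance and the model's predicted distribution over retrieval candidates, so that near-positive matches receive partial credit instead of a hard zero. The following is a minimal sketch under that assumption; the function and parameter names are hypothetical and not taken from the paper.

```python
import math

def softmax(xs, temperature=1.0):
    # Numerically stable softmax over a list of raw scores.
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def relation_aware_loss(sim_scores, relevance, temperature=0.1):
    """Hypothetical soft-label ranking loss.

    sim_scores: predicted UAV-to-satellite similarity per candidate.
    relevance:  graded ground-truth relevance (soft labels), so a
                near-positive candidate is not penalized as hard as
                a clear negative.
    Returns the cross-entropy between the soft target distribution
    and the predicted distribution over candidates.
    """
    target = softmax(relevance, temperature=1.0)
    pred = softmax(sim_scores, temperature=temperature)
    return -sum(t * math.log(p) for t, p in zip(target, pred))
```

In this sketch, a ranking whose similarity scores agree with the graded relevance yields a lower loss than a misaligned one, which is the fine-grained supervision effect the abstract attributes to the soft labels.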

Top-level tags: computer vision, multi-modal, model training
Detailed tags: cross-view geolocalization, uav-satellite matching, vision-language model, relational modeling, retrieval accuracy

Enhancing Cross-View UAV Geolocalization via LVLM-Driven Relational Modeling


1️⃣ One-sentence summary

This paper proposes a new method that uses a large vision-language model to learn deep correlations between UAV and satellite imagery; with a plug-and-play relational modeling module and a novel loss function, it significantly improves the accuracy and stability of cross-view image matching.

Source: arXiv 2603.08063