arXiv submission date: 2026-04-28
📄 Abstract - DualGeo: A Dual-View Framework for Worldwide Image Geo-localization

Worldwide image geo-localization aims to infer the geographic location of an image captured anywhere on Earth, spanning street, city, regional, national, and continental scales. Existing methods rely on visual features that are sensitive to environmental variations (e.g., lighting, season, and weather) and lack effective post-processing to filter outlier candidates, which limits localization accuracy. To address these limitations, we propose DualGeo, a two-stage framework for worldwide image geo-localization. First, it establishes a geo-representational foundation by fusing image and semantic segmentation features via bidirectional cross-attention. The fused features are then aligned with GPS coordinates through dual-view contrastive learning to build a global retrieval database. Second, it performs geo-cognitive refinement by re-ranking retrieved candidates using geographic clustering and then feeding them into large multimodal models (LMMs) for final coordinate prediction. Experiments on IM2GPS, IM2GPS3k, and YFCC4k show that DualGeo outperforms state-of-the-art methods, improving street-level (<1 km) and city-level (<25 km) localization accuracy by 3.6%-16.58% and 1.29%-8.77%, respectively. Our code and datasets are available at this https URL.
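The abstract does not say which clustering algorithm DualGeo uses, so the following is only a minimal sketch of the general idea behind re-ranking retrieved candidates by geographic clustering, together with the threshold-based accuracy metric (<1 km, <25 km) that the results are reported in. The greedy radius-based grouping, the 25 km default radius, and the names `haversine_km`, `rerank_by_cluster`, and `accuracy_at` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rerank_by_cluster(candidates: List[Coord], radius_km: float = 25.0) -> List[Coord]:
    """Hypothetical re-ranking: group candidates greedily, then put larger groups first.

    A candidate joins the first cluster whose seed (first member) lies within
    radius_km; otherwise it starts a new cluster. Isolated candidates, which
    are likely outliers, therefore sink towards the end of the list.
    """
    clusters: List[List[Coord]] = []
    for cand in candidates:
        for cluster in clusters:
            if haversine_km(cluster[0], cand) <= radius_km:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    # Stable sort: clusters of equal size keep their original retrieval order.
    clusters.sort(key=len, reverse=True)
    return [cand for cluster in clusters for cand in cluster]


def accuracy_at(preds: List[Coord], truths: List[Coord], threshold_km: float) -> float:
    """Fraction of predictions that fall within threshold_km of the ground truth."""
    hits = sum(haversine_km(p, t) <= threshold_km for p, t in zip(preds, truths))
    return hits / len(truths)


if __name__ == "__main__":
    # One isolated candidate (Paris) retrieved ahead of three mutually close ones (New York).
    retrieved = [(48.8566, 2.3522), (40.7128, -74.0060),
                 (40.7306, -73.9352), (40.7580, -73.9855)]
    print(rerank_by_cluster(retrieved))
```

In this sketch the three nearby candidates outvote the single top-ranked outlier, which is the kind of outlier filtering the abstract motivates. In the paper the re-ranked candidates are then passed to an LMM for the final coordinate prediction, a step this sketch leaves out.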

Top-level tags: computer vision, multi-modal
Detailed tags: geo-localization, cross-attention, contrastive learning, GPS alignment, large multimodal models

DualGeo: A Dual-View Framework for Worldwide Image Geo-localization


1️⃣ One-sentence summary

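This paper proposes DualGeo, a framework that builds a global retrieval database through contrastive learning over fused image and semantic segmentation features, re-ranks candidate locations with geographic clustering, and passes them to large multimodal models for final coordinate prediction, improving worldwide image geo-localization accuracy with reported gains at the street and city levels.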

Source: arXiv: 2604.25533