arXiv submission date: 2026-02-09
📄 Abstract - Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features

We revisit the problem of training attention-based sparse image matching models for various local features. We first identify a previously overlooked critical design choice that significantly impacts the performance of the LightGlue model. We then investigate the role of detectors and descriptors within the transformer-based matching framework, finding that detectors, rather than descriptors, are often the primary cause of performance differences. Finally, we propose a novel approach to fine-tune existing image matching models using keypoints from a diverse set of detectors, resulting in a universal, detector-agnostic model. When deployed as a zero-shot matcher for novel detectors, the resulting model achieves or exceeds the accuracy of models specifically trained for those features. Our findings offer valuable insights for the deployment of transformer-based matching models and the future design of local features.
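To make the fine-tuning idea concrete, the sketch below illustrates one plausible reading of it: at each training step the keypoints and descriptors fed to an attention-based matcher come from a detector sampled at random from a diverse pool, so the matcher is exposed to many feature distributions and becomes detector-agnostic. All names here (`ToyMatcher`, `fake_detector_factory`, the dummy loss target) are illustrative placeholders, not the paper's actual code or the LightGlue API; a real setup would use real detectors (e.g. SuperPoint, DISK, ALIKED) and ground-truth correspondences from pose and depth.

```python
import random
import torch
import torch.nn as nn

class ToyMatcher(nn.Module):
    """Minimal stand-in for an attention-based matcher (LightGlue-style):
    projects descriptors to a shared space, applies cross-attention between
    the two keypoint sets, and scores candidate matches with a dot product.
    (A real matcher would also encode keypoint positions; omitted here.)"""
    def __init__(self, desc_dim=128, model_dim=64):
        super().__init__()
        self.proj = nn.Linear(desc_dim, model_dim)
        self.attn = nn.MultiheadAttention(model_dim, num_heads=4, batch_first=True)

    def forward(self, desc0, desc1):
        x0, x1 = self.proj(desc0), self.proj(desc1)
        # Cross-attention: keypoints in image 0 attend to image 1 and vice versa.
        x0, _ = self.attn(x0, x1, x1)
        x1, _ = self.attn(x1, x0, x0)
        return torch.einsum('bnd,bmd->bnm', x0, x1)  # (B, N0, N1) match scores

def fake_detector_factory(desc_dim):
    """Stand-in for a real detector/descriptor; returns random keypoints."""
    def detect(image_batch):
        b = image_batch.shape[0]
        kpts = torch.rand(b, 256, 2)           # normalized keypoint coordinates
        desc = torch.randn(b, 256, desc_dim)   # per-keypoint descriptors
        return kpts, desc
    return detect

detectors = [fake_detector_factory(128) for _ in range(3)]  # diverse detector pool
matcher = ToyMatcher(desc_dim=128)
optimizer = torch.optim.Adam(matcher.parameters(), lr=1e-4)

images0 = torch.rand(4, 3, 256, 256)
images1 = torch.rand(4, 3, 256, 256)

# One fine-tuning step: both views use the same randomly sampled detector,
# so each iteration exposes the matcher to a different feature distribution.
det = random.choice(detectors)
kpts0, desc0 = det(images0)
kpts1, desc1 = det(images1)
scores = matcher(desc0, desc1)

# Real training would supervise with ground-truth matches from pose/depth;
# a random target is used here only to keep the sketch executable.
target = torch.randint(0, scores.shape[-1], (scores.shape[0], scores.shape[1]))
loss = nn.functional.cross_entropy(scores.transpose(1, 2), target)
loss.backward()
optimizer.step()
```

At inference time, such a model could be paired zero-shot with a detector it never saw during pre-training, which is the deployment scenario the abstract describes.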

Top-level tags: computer vision, model training, multi-modal
Detailed tags: image matching, attention mechanism, local features, transformer, keypoint detection

Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features


1️⃣ One-Sentence Summary

Through its analysis, this paper finds that in attention-based image matching models it is the feature detector, not the descriptor, that drives performance differences. It then proposes a new method that fine-tunes the matcher with keypoints from a diverse set of detectors, yielding a universal, detector-agnostic matching model whose zero-shot matching accuracy on novel detectors matches or exceeds that of models trained specifically for those features.

Source: arXiv:2602.08430