MARCO: Navigating the Unseen Space of Semantic Correspondence
1️⃣ One-sentence summary
This paper presents MARCO, a lightweight model that combines a coarse-to-fine spatial localization objective with a self-distillation training framework to generate dense, semantically coherent image correspondences from only a handful of annotated keypoints. It significantly outperforms existing methods on multiple benchmarks while being 3x smaller and 10x faster than diffusion-based approaches, and is especially strong on keypoints and object categories unseen during training.
Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond training keypoints, revealing a gap between benchmark performance and real-world usability, where queried points rarely match those seen during training. Building upon DINOv2, we introduce MARCO, a unified model for generalizable correspondence driven by a novel training framework that enhances both fine-grained localization and semantic generalization. By coupling a coarse-to-fine objective that refines spatial precision with a self-distillation framework, which expands sparse supervision beyond annotated regions, our approach transforms a handful of keypoints into dense, semantically coherent correspondences. MARCO sets a new state of the art on SPair-71k, AP-10K, and PF-PASCAL, with gains that amplify at fine-grained localization thresholds (+8.9 PCK@0.01), strongest generalization to unseen keypoints (+5.1, SPair-U) and categories (+4.7, MP-100), while remaining 3x smaller and 10x faster than diffusion-based approaches. Code is available at this https URL .
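The reported gains use PCK@α (Percentage of Correct Keypoints): a predicted keypoint counts as correct if it falls within α times a reference size (typically the larger side of the object bounding box) of the ground truth. A minimal sketch of this metric, with the function name and per-keypoint `bbox_size` convention chosen for illustration:

```python
import numpy as np

def pck(pred, gt, bbox_size, alpha=0.01):
    """Fraction of predicted keypoints within alpha * bbox_size of ground truth.

    pred, gt: (N, 2) arrays of (x, y) keypoint coordinates.
    bbox_size: scalar or (N,) array, e.g. max(bbox height, bbox width).
    """
    dist = np.linalg.norm(pred - gt, axis=1)  # per-keypoint Euclidean error
    return float(np.mean(dist <= alpha * bbox_size))
```

At α = 0.01 the tolerance is 1% of the bounding-box size, so improvements at this threshold reflect fine-grained localization accuracy.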
Source: arXiv: 2604.18267