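3AM: Segment Anything with Geometric Consistency in Videos
1️⃣ One-sentence summary
This paper proposes 3AM, a method that integrates 3D-geometry-aware features into a state-of-the-art video segmentation model (SAM2). As a result, the model keeps its segmentation of the same object consistent even when the camera viewpoint changes drastically, and at inference it needs only ordinary RGB video frames, with no additional depth maps or camera information.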
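To make the idea of "fusing geometry-aware features into a segmentation model" concrete, here is a minimal NumPy sketch of one possible multi-level fusion step. It is not 3AM's published Feature Merger: the token and channel shapes, the per-level linear projections, the layer normalization, the averaging over levels, and the residual addition onto the SAM2 features are all assumptions chosen for illustration, and the name `merge_features` is hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def merge_features(sam2_feat, must3r_levels, projections):
    """Hypothetical fusion of multi-level geometry features into appearance features.

    sam2_feat:     (N, C) appearance features, N tokens with C channels (assumed shape)
    must3r_levels: list of (N, D_l) geometry features, one array per level (assumed)
    projections:   list of (D_l, C) matrices mapping each level to C channels (assumed)
    """
    if len(must3r_levels) != len(projections):
        raise ValueError("one projection matrix per geometry level is required")
    # Project every level to the appearance channel width and normalize it.
    projected = [layer_norm(f @ w) for f, w in zip(must3r_levels, projections)]
    # Average over levels so the geometric term's scale does not grow with level count.
    geometry = np.mean(projected, axis=0)
    # Residual add: appearance features are preserved and geometry is layered on top.
    return sam2_feat + geometry


rng = np.random.default_rng(0)
num_tokens, channels = 16, 32
level_dims = (64, 128, 256)
sam2 = rng.standard_normal((num_tokens, channels))
levels = [rng.standard_normal((num_tokens, d)) for d in level_dims]
projs = [rng.standard_normal((d, channels)) * 0.02 for d in level_dims]
fused = merge_features(sam2, levels, projs)
print(fused.shape)  # (16, 32)
```

A residual design like this is one common way to keep a "lightweight" adapter from disturbing a pretrained backbone: if the projections are zero, the output equals the original appearance features.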
Video object segmentation methods like SAM2 achieve strong performance through memory-based architectures but struggle under large viewpoint changes due to reliance on appearance features. Traditional 3D instance segmentation methods address viewpoint consistency but require camera poses, depth maps, and expensive preprocessing. We introduce 3AM, a training-time enhancement that integrates 3D-aware features from MUSt3R into SAM2. Our lightweight Feature Merger fuses multi-level MUSt3R features that encode implicit geometric correspondence. Combined with SAM2's appearance features, the model achieves geometry-consistent recognition grounded in both spatial position and visual similarity. We propose a field-of-view aware sampling strategy that ensures sampled frames observe spatially consistent object regions for reliable 3D correspondence learning. Critically, our method requires only RGB input at inference, with no camera poses or preprocessing. On challenging datasets with wide-baseline motion (ScanNet++, Replica), 3AM substantially outperforms SAM2 and its extensions, achieving 90.6% IoU and 71.7% Positive IoU on ScanNet++'s Selected Subset, improving over state-of-the-art VOS methods by +15.9 and +30.4 points. Project page: this https URL
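The field-of-view aware sampling strategy can be illustrated with a simple overlap heuristic. The sketch below is an assumption, not the paper's algorithm: it supposes that, at training time, each frame comes with the set of the target object's 3D point IDs visible in that frame (for example derived from a dataset's meshes and poses), and it keeps only frames that see a large enough share of the object region visible in a reference frame. The function names, the `min_overlap` threshold, and the `k` cap are hypothetical.

```python
def visible_overlap(ref, other):
    """Fraction of the reference frame's visible object points also visible in `other`."""
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


def sample_fov_consistent(frames, ref_idx, min_overlap=0.5, k=3):
    """Hypothetical overlap-based frame sampler.

    frames: list of sets; frames[i] holds the IDs of the target object's 3D points
            visible in frame i (assumed to be precomputed at training time).
    Returns up to k frame indices, most overlapping first, whose view of the object
    shares at least `min_overlap` of the reference frame's visible points.
    """
    ref = frames[ref_idx]
    candidates = []
    for i, pts in enumerate(frames):
        if i == ref_idx:
            continue
        score = visible_overlap(ref, pts)
        if score >= min_overlap:
            candidates.append((score, i))
    # Highest overlap first; ties broken by the lower frame index.
    candidates.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in candidates[:k]]


frames = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 3}, {7, 8}, {1, 2, 3, 4, 5}]
print(sample_fov_consistent(frames, ref_idx=0))  # [4, 2, 1]
```

Frame 3 is never chosen because it sees none of the object region visible in frame 0, which is the kind of pair that would give no reliable 3D correspondence signal.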
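The abstract reports both IoU and Positive IoU. Their exact definitions are not given here; the sketch below assumes one common convention, in which IoU averages over every frame (a frame where the object is absent and nothing is predicted counts as 1.0), while Positive IoU averages only over frames where the ground-truth mask is non-empty. Masks are represented as sets of pixel coordinates for simplicity, and `mask_iou` and `video_scores` are hypothetical names.

```python
def mask_iou(pred, gt):
    """IoU of two binary masks given as sets of (row, col) pixel coordinates."""
    union = pred | gt
    if not union:
        return 1.0  # assumption: both masks empty counts as a perfect match
    return len(pred & gt) / len(union)


def video_scores(preds, gts):
    """Return (mean IoU over all frames, mean IoU over frames where the object is present)."""
    if not gts or len(preds) != len(gts):
        raise ValueError("need one prediction per ground-truth frame")
    ious = [mask_iou(p, g) for p, g in zip(preds, gts)]
    positive = [iou for iou, g in zip(ious, gts) if g]
    mean_iou = sum(ious) / len(ious)
    pos_iou = sum(positive) / len(positive) if positive else float("nan")
    return mean_iou, pos_iou


gts = [{(0, 0), (0, 1)}, set(), {(1, 1)}]
preds = [{(0, 0)}, set(), {(2, 2)}]
print(video_scores(preds, gts))  # (0.5, 0.25)
```

Under this convention, correctly predicting "no object" on empty frames raises IoU but not Positive IoU, which is one plausible reason the two reported numbers differ.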
Source: arXiv 2601.08831