Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
1️⃣ One-Sentence Summary
This paper proposes an unsupervised learning framework called AutoQ-VIS. Through a closed-loop system that automatically assesses pseudo-label quality and uses it to guide the model's self-training, it successfully narrows the gap between synthetic data and real videos, significantly improving the accuracy of object recognition and segmentation in video without any human annotations.
Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods like VideoCutLER eliminate optical flow dependencies through synthetic data, they remain constrained by the synthetic-to-real domain gap. We present AutoQ-VIS, a novel unsupervised framework that bridges this gap through quality-guided self-training. Our approach establishes a closed-loop system between pseudo-label generation and automatic quality assessment, enabling progressive adaptation from synthetic to real videos. Experiments demonstrate state-of-the-art performance with 52.6 $\text{AP}_{50}$ on YouTubeVIS-2019 $\texttt{val}$ set, surpassing the previous state-of-the-art VideoCutLER by 4.4%, while requiring no human annotations. This demonstrates the viability of quality-aware self-training for unsupervised VIS. We will release the code at this https URL.
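The abstract describes the method at a high level: a closed loop in which the model generates pseudo-labels on unlabeled real videos, an automatic quality-assessment module scores them, and only high-scoring labels feed the next training round, progressively adapting the model from synthetic to real data. The Python sketch below illustrates that loop under stated assumptions; the interfaces (`VISModel`, `QualityHead`), the threshold `tau`, and the number of rounds are hypothetical placeholders for illustration, not the paper's actual implementation.

```python
from typing import List, Protocol, Sequence, Tuple


class VISModel(Protocol):
    """Hypothetical interface for a video instance segmentation model."""
    def fit(self, data: Sequence) -> None: ...
    def predict(self, video: object) -> object: ...


class QualityHead(Protocol):
    """Hypothetical interface for automatic pseudo-label quality scoring."""
    def score(self, video: object, pseudo_masks: object) -> float: ...


def quality_guided_self_training(
    model: VISModel,
    quality_head: QualityHead,
    synthetic_data: Sequence,
    real_videos: Sequence,
    rounds: int = 3,   # number of self-training rounds (assumed)
    tau: float = 0.7,  # quality-acceptance threshold (assumed)
) -> VISModel:
    """Closed loop: pseudo-label generation -> automatic quality
    assessment -> retraining on the accepted pseudo-labels."""
    # Bootstrap on synthetic videos (as in VideoCutLER-style setups).
    model.fit(synthetic_data)

    for _ in range(rounds):
        accepted: List[Tuple[object, object]] = []
        for video in real_videos:
            # 1. Generate pseudo instance masks on an unlabeled real video.
            pseudo_masks = model.predict(video)
            # 2. Automatically score pseudo-label quality (no human labels).
            if quality_head.score(video, pseudo_masks) >= tau:
                # 3. Keep only high-quality pseudo-labels for retraining.
                accepted.append((video, pseudo_masks))
        # 4. Retrain on accepted pseudo-labels, progressively adapting
        #    from the synthetic to the real video distribution.
        model.fit(accepted)
    return model
```

The quality gate is what closes the loop: without it, self-training would amplify the model's own errors on real videos, whereas filtering by an automatic quality score lets only reliable pseudo-labels drive adaptation.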
Source: arXiv:2512.06864