arXiv submission date: 2026-01-20
📄 Abstract - VideoMaMa: Mask-Guided Video Matting via Generative Prior

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present the Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which outperforms the same model trained on existing matting datasets in terms of robustness on in-the-wild videos. These findings emphasize the importance of large-scale pseudo-labeled video matting and showcase how generative priors and accessible segmentation cues can drive scalable progress in video matting research.
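To make the pseudo-labeling idea from the abstract concrete, here is a minimal, self-contained Python sketch. The `mask_to_matte` wrapper is hypothetical and stands in for a VideoMaMa-style model (the real system conditions a pretrained video diffusion model on frames and coarse masks); the toy implementation here only softens mask edges so the script runs end to end.

```python
# Minimal sketch of the pseudo-labeling pipeline described in the abstract.
# `mask_to_matte` is a hypothetical placeholder, NOT the authors' API: the real
# VideoMaMa model would refine coarse masks into alpha mattes using a pretrained
# video diffusion prior; here we only blur mask edges so the example is runnable.
import numpy as np

def mask_to_matte(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Refine binary masks (T, H, W) into soft alpha mattes in [0, 1]."""
    mattes = masks.astype(np.float32)
    # Naive spatial edge softening as a stand-in for true matte estimation.
    for axis in (1, 2):
        mattes = (0.25 * np.roll(mattes, 1, axis)
                  + 0.5 * mattes
                  + 0.25 * np.roll(mattes, -1, axis))
    return np.clip(mattes, 0.0, 1.0)

def pseudo_label_video(frames: np.ndarray, coarse_masks: np.ndarray) -> np.ndarray:
    """Turn coarse segmentation masks for one clip into pseudo ground-truth mattes."""
    assert frames.shape[:3] == coarse_masks.shape  # both (T, H, W)
    return mask_to_matte(frames, coarse_masks)

if __name__ == "__main__":
    # Toy clip: 4 RGB frames of 64x64 with a square foreground mask.
    frames = np.random.rand(4, 64, 64, 3).astype(np.float32)
    masks = np.zeros((4, 64, 64), dtype=np.uint8)
    masks[:, 16:48, 16:48] = 1
    mattes = pseudo_label_video(frames, masks)
    print(mattes.shape, mattes.min(), mattes.max())  # (4, 64, 64) 0.0 1.0
```

In the paper's pipeline, mattes produced this way over a large video corpus (MA-V) then serve as training targets for a downstream matting model such as SAM2-Matte.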

Top-level tags: computer vision, video, model training
Detailed tags: video matting, diffusion models, pseudo-labeling, segmentation, dataset creation

VideoMaMa: Mask-Guided Video Matting via Generative Prior


1️⃣ One-Sentence Summary

The paper proposes VideoMaMa, a method that leverages pretrained video diffusion models to produce accurate alpha mattes from only coarse segmentation masks; although trained solely on synthetic data, it generalizes directly to real-world videos, and the authors use it to build a large-scale pseudo-labeled video matting dataset to advance research in the field.

Source: arXiv 2601.14255