arXiv submission date: 2026-04-06
📄 Abstract - Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision

We present Vanast, a unified framework that generates garment-transferred human animation videos directly from a single human image, garment images, and a pose guidance video. Conventional two-stage pipelines treat image-based virtual try-on and pose-driven animation as separate processes, which often results in identity drift, garment distortion, and front-back inconsistency. Our model addresses these issues by performing the entire process in a single unified step to achieve coherent synthesis. To enable this setting, we construct large-scale triplet supervision. Our data generation pipeline includes generating identity-preserving human images in alternative outfits that differ from garment catalog images, capturing full upper and lower garment triplets to overcome the single-garment-posed video pair limitation, and assembling diverse in-the-wild triplets without requiring garment catalog images. We further introduce a Dual Module architecture for video diffusion transformers to stabilize training, preserve pretrained generative quality, and improve garment accuracy, pose adherence, and identity preservation while supporting zero-shot garment interpolation. Together, these contributions allow Vanast to produce high-fidelity, identity-consistent animation across a wide range of garment types.
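The single-stage setting above hinges on triplet supervision: each training sample pairs a human image and garment images with a pose guidance video and a synthetic target video. A minimal sketch of what one such sample might look like as a data structure; all field names and file paths here are illustrative assumptions, not the paper's actual data schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TryOnTriplet:
    # Field names and paths are hypothetical placeholders for illustration.
    human_image: str           # single source human image (identity reference)
    garment_images: List[str]  # upper and/or lower garment images
    pose_video: str            # pose guidance video driving the animation
    target_video: str          # synthetic ground truth: the person wearing
                               # the garments, animated under the given poses

def is_complete(t: TryOnTriplet) -> bool:
    """A sample is usable only if every modality is present."""
    return bool(t.human_image and t.garment_images
                and t.pose_video and t.target_video)

sample = TryOnTriplet(
    human_image="person_001.jpg",
    garment_images=["top_123.jpg", "skirt_456.jpg"],
    pose_video="dance_clip.mp4",
    target_video="synthetic_gt.mp4",
)
print(is_complete(sample))  # True
```

Capturing both upper and lower garments in one sample is what lets the model move beyond the single-garment, single-posed-video pairs available in prior datasets.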

Top-level tags: computer vision multi-modal aigc
Detailed tags: virtual try-on human animation video generation diffusion transformers synthetic data

Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision


1️⃣ One-sentence summary

This paper proposes a unified framework called Vanast that generates a garment-transferred human animation video directly from a single person photo, garment images, and a pose guidance video. It addresses problems of conventional two-stage methods, such as identity drift and garment distortion, achieving coherent, high-quality video synthesis.

Source: arXiv 2604.04934