VersaVogue: Visual Expert Orchestration and Preference Alignment for Unified Fashion Synthesis

📄 Abstract

Diffusion models have driven remarkable advancements in fashion image generation, yet prior works usually treat garment generation and virtual dressing as separate problems, limiting their flexibility in real-world fashion workflows. Moreover, fashion image synthesis under multi-source heterogeneous conditions remains challenging, as existing methods typically rely on simple feature concatenation or static layer-wise injection, which often causes attribute entanglement and semantic interference. To address these issues, we propose VersaVogue, a unified framework for multi-condition controllable fashion synthesis that jointly supports garment generation and virtual dressing, corresponding to the design and showcase stages of the fashion lifecycle. Specifically, we introduce a trait-routing attention (TA) module that leverages a mixture-of-experts mechanism to dynamically route condition features to the most compatible experts and generative layers, enabling disentangled injection of visual attributes such as texture, shape, and color. To further improve realism and controllability, we develop an automated multi-perspective preference optimization (MPO) pipeline that constructs preference data without human annotation or task-specific reward models. By combining evaluators of content fidelity, textual alignment, and perceptual quality, MPO identifies reliable preference pairs, which are then used to optimize the model via direct preference optimization (DPO). Extensive experiments on both garment generation and virtual dressing benchmarks demonstrate that VersaVogue consistently outperforms existing methods in visual fidelity, semantic consistency, and fine-grained controllability.
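
The abstract describes trait-routing attention only at a high level, so the following is a minimal sketch of generic top-k mixture-of-experts routing, not the paper's TA module. The function names (`route_top_k`, `mix_experts`), the choice of `k`, and the list-based feature representation are all assumptions made for illustration. A gate scores each expert for a given condition feature, only the best-scoring experts process it, and their outputs are blended by renormalized gate weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts (hypothetical gate output).

    Returns {expert_index: weight}; weights are renormalized to sum to 1.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def mix_experts(feature, experts, gate_logits, k=2):
    """Blend the outputs of the selected experts for one condition feature."""
    weights = route_top_k(gate_logits, k)
    out = [0.0] * len(feature)
    for i, w in weights.items():
        y = experts[i](feature)  # each expert maps a feature to a same-length list
        out = [o + w * v for o, v in zip(out, y)]
    return out

# Toy experts standing in for, e.g., texture, shape, and color pathways.
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: list(x),
    lambda x: [0.0 for _ in x],
]
mixed = mix_experts([1.0, 2.0], experts, gate_logits=[0.0, 0.0, -100.0], k=2)
```

In this toy setup the third expert is effectively never selected, so the result is an even blend of the first two. How the paper's router additionally selects generative layers is not specified in the abstract and is not modeled here.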


1️⃣ One-Sentence Summary

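This paper proposes VersaVogue, a unified framework that uses dynamic routing of condition features and automated preference optimization to handle two major fashion tasks, garment generation and virtual dressing, at once, significantly improving the realism, semantic consistency, and fine-grained controllability of the generated images.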
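
The automated preference optimization mentioned above can be illustrated with a small sketch. It assumes each candidate image has already been scored in [0, 1] by three evaluators (content fidelity, textual alignment, perceptual quality), and it keeps a pair only when one candidate beats the other on every evaluator by a margin. The selection rule, the `margin` value, and the function names are assumptions, since the abstract does not state how MPO decides that a pair is reliable. The loss shown is the standard DPO objective on log-likelihoods; diffusion-model variants usually replace these terms with denoising errors, and the form used in the paper is not given here.

```python
import math

def select_preference_pair(candidates, margin=0.1):
    """Return (winner_id, loser_id), or None if no pair is reliable.

    candidates: list of {"id": str, "scores": {evaluator_name: float}}.
    A pair counts as reliable (an assumed rule) when the winner leads the
    loser by at least `margin` on every evaluator; among reliable pairs the
    one with the largest total gap is returned.
    """
    best = None
    for a in candidates:
        for b in candidates:
            if a is b:
                continue
            gaps = [a["scores"][k] - b["scores"][k] for k in a["scores"]]
            if min(gaps) >= margin and (best is None or sum(gaps) > best[0]):
                best = (sum(gaps), a["id"], b["id"])
    return None if best is None else (best[1], best[2])

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(z), where
    z = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))."""
    z = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Stable evaluation of log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

candidates = [
    {"id": "a", "scores": {"fidelity": 0.90, "text": 0.80, "quality": 0.85}},
    {"id": "b", "scores": {"fidelity": 0.60, "text": 0.50, "quality": 0.60}},
    {"id": "c", "scores": {"fidelity": 0.95, "text": 0.40, "quality": 0.70}},
]
pair = select_preference_pair(candidates)
```

Candidate `c` scores highest on fidelity but loses on textual alignment, so it forms no reliable pair under this rule. Requiring agreement across evaluators is one simple way to avoid trusting a single noisy metric.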
