ViPO: Visual Preference Optimization at Scale
1️⃣ One-sentence summary
This paper presents a systematic approach to improving preference optimization for visual generative models. On one side, it designs an adaptive algorithm (Poly-DPO) that learns robustly from existing datasets containing noise and conflicting preferences; on the other, it builds a high-quality, large-scale new dataset (ViPO) with millions of high-resolution image and video pairs. Together these show that data quality is the key to scaling visual preference optimization, while sophisticated optimization algorithms only pay off when the data is imperfect.
While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naively optimizing on such noisy datasets fails to learn preferences, hindering effective scaling. To enhance robustness against noise, we propose Poly-DPO, which extends the DPO objective with an additional polynomial term that dynamically adjusts model confidence based on dataset characteristics, enabling effective learning across diverse data distributions. Beyond biased patterns, existing datasets suffer from low resolution, limited prompt diversity, and imbalanced distributions. To facilitate large-scale visual preference optimization by tackling these data bottlenecks, we construct ViPO, a massive-scale preference dataset with 1M image pairs at 1024px across five categories and 300K video pairs at 720p+ across three categories. State-of-the-art generative models and diverse prompts ensure reliable preference signals with balanced distributions. Remarkably, when applying Poly-DPO to our high-quality dataset, the optimal configuration converges to standard DPO. This convergence validates both the dataset's quality and Poly-DPO's adaptive nature: sophisticated optimization becomes unnecessary given sufficient data quality, yet remains valuable for imperfect datasets. We validate our approach across visual generation models. On noisy datasets such as Pick-a-Pic V2, Poly-DPO achieves gains of 6.87 and 2.32 over Diffusion-DPO on GenEval for SD1.5 and SDXL, respectively. On ViPO, models achieve performance far exceeding that of models trained on existing open-source preference datasets. These results confirm that addressing both algorithmic adaptability and data quality is essential for scaling visual preference optimization.
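The abstract states that Poly-DPO augments the DPO objective with a polynomial term that modulates model confidence, and that the extension collapses to standard DPO at the optimal configuration on clean data. The exact form of the polynomial term is not given here, so the sketch below is a hypothetical PolyLoss-style (Poly-1) variant of the DPO logistic loss: the names `poly_dpo_loss`, `gamma`, and `p` are illustrative assumptions, not the paper's API. The inputs are scalar log-ratios `log pi(y)/pi_ref(y)` for the winner and loser.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logr_w: float, logr_l: float, beta: float = 0.1) -> float:
    """Standard DPO: -log sigma(beta * (winner log-ratio - loser log-ratio))."""
    margin = beta * (logr_w - logr_l)
    return -math.log(sigmoid(margin))

def poly_dpo_loss(logr_w: float, logr_l: float, beta: float = 0.1,
                  gamma: float = 0.0, p: int = 2) -> float:
    """Hypothetical polynomial extension of DPO (assumed form).

    Adds gamma * (1 - s)^p, where s = sigma(beta * margin), which tempers
    the gradient on hard (likely mislabeled) pairs when gamma < 0 and
    sharpens it when gamma > 0. With gamma = 0 this reduces exactly to
    standard DPO, mirroring the convergence behavior described in the text.
    """
    margin = beta * (logr_w - logr_l)
    s = sigmoid(margin)
    return -math.log(s) + gamma * (1.0 - s) ** p
```

With `gamma=0.0` the two losses coincide, and a pair whose winner has the larger log-ratio incurs a smaller loss than the reversed pair, as expected of a preference objective.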
Source: arXiv:2604.24953