Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes
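1️⃣ One-sentence summary
This paper proposes a new method that decomposes image quality into multiple positive and negative attributes organized in a tree structure, and designs a two-stage alignment framework so that diffusion models can generate higher-quality images according to complex, fine-grained human expert criteria rather than relying only on simple binary preferences or a single reward signal.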
Post-training alignment of diffusion models relies on simplified signals, such as scalar rewards or binary preferences. This limits alignment with complex human expertise, which is hierarchical and fine-grained. To address this, we first construct a set of hierarchical, fine-grained evaluation criteria with domain experts, which decompose image quality into multiple positive and negative attributes organized in a tree structure. Building on this, we propose a two-stage alignment framework. First, we inject domain knowledge into an auxiliary diffusion model via Supervised Fine-Tuning. Second, we introduce Complex Preference Optimization (CPO), which extends DPO to align the target diffusion model to our non-binary, hierarchical criteria. Specifically, we reformulate the alignment problem to simultaneously maximize the probability of positive attributes and minimize the probability of negative attributes using the auxiliary diffusion model. We instantiate our approach in the domain of painting generation and conduct CPO training on an annotated dataset of paintings with fine-grained attributes based on our criteria. Extensive experiments demonstrate that CPO significantly enhances generation quality and alignment with expertise, opening new avenues for fine-grained criteria alignment.
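The abstract does not give the CPO objective, so the sketch below only illustrates the pairwise baseline it says CPO extends: a Diffusion-DPO-style loss computed from squared denoising errors of the trained model and a frozen reference model. The function names, the `beta` parameter (which here absorbs any timestep weighting), and the example numbers are all assumptions made for illustration. One hypothetical reading of the positive/negative attribute idea would be to measure the "preferred" errors under positive-attribute conditioning and the "dispreferred" errors under negative-attribute conditioning, but that is a guess and should not be taken as the paper's formulation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_style_loss(
    err_pos_model: float,
    err_pos_ref: float,
    err_neg_model: float,
    err_neg_ref: float,
    beta: float = 1.0,
) -> float:
    """Diffusion-DPO-style loss for a single (preferred, dispreferred) pair.

    Each err_* value is a squared denoising error ||eps - eps_hat||^2:
      - *_model: error of the model being aligned
      - *_ref:   error of the frozen reference model
      - pos / neg: preferred vs. dispreferred sample (or, in the hypothetical
        attribute reading, positive vs. negative attribute conditioning)
    beta is an assumed scalar that folds in any timestep weighting.
    """
    # How much worse (positive) or better (negative) the model is than the
    # reference on each side of the pair.
    pos_gap = err_pos_model - err_pos_ref
    neg_gap = err_neg_model - err_neg_ref
    # The loss falls when the model improves on the preferred side and
    # degrades on the dispreferred side, relative to the reference.
    return -log_sigmoid(-beta * (pos_gap - neg_gap))


# Illustrative calls with made-up error values.
neutral = dpo_style_loss(1.0, 1.0, 1.0, 1.0)    # model identical to reference
aligned = dpo_style_loss(0.5, 1.5, 1.5, 0.5)    # better on pos, worse on neg
misaligned = dpo_style_loss(1.5, 0.5, 0.5, 1.5)  # the opposite
```

When the model matches the reference on both sides, the loss equals log 2; it drops below that as the model fits preferred samples better than the reference does and fits dispreferred samples worse.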
Source: arXiv: 2601.04300