arXiv submission date: 2026-04-21
📄 Abstract - Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

Reinforcement learning, particularly Group Relative Policy Optimization (GRPO), has emerged as an effective framework for post-training visual generative models with human preference signals. However, its effectiveness is fundamentally limited by coarse reward credit assignment. In modern visual generation, multiple reward models are often used to capture heterogeneous objectives, such as visual quality, motion consistency, and text alignment. Existing GRPO pipelines typically collapse these rewards into a single static scalar and propagate it uniformly across the entire diffusion trajectory. This design ignores the stage-specific roles of different denoising steps and produces mistimed or incompatible optimization signals. To address this issue, we propose Objective-aware Trajectory Credit Assignment (OTCA), a structured framework for fine-grained GRPO training. OTCA consists of two key components. Trajectory-Level Credit Decomposition estimates the relative importance of different denoising steps. Multi-Objective Credit Allocation adaptively weights and combines multiple reward signals throughout the denoising process. By jointly modeling temporal credit and objective-level credit, OTCA converts coarse reward supervision into a structured, timestep-aware training signal that better matches the iterative nature of diffusion-based generation. Extensive experiments show that OTCA consistently improves both image and video generation quality across evaluation metrics.
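The two components described above — group-relative advantages per reward model, reweighted by each denoising step's estimated importance — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual method: the function names are hypothetical, and the Gaussian stage-weighting (each objective "peaking" at a different point of the trajectory) is an invented stand-in for the learned importance estimates in Trajectory-Level Credit Decomposition.

```python
import math

def group_relative_advantage(rewards):
    """GRPO-style advantage: normalize one objective's rewards within the group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # epsilon avoids division by zero
    return [(r - mean) / std for r in rewards]

def timestep_weights(num_steps, peak, sharpness=10.0):
    """Hypothetical stage weighting: a Gaussian bump over the trajectory,
    where 0 = pure noise and 1 = clean sample, normalized to sum to 1."""
    w = [math.exp(-sharpness * (t / (num_steps - 1) - peak) ** 2)
         for t in range(num_steps)]
    total = sum(w)
    return [x / total for x in w]

def otca_credits(reward_table, peaks, num_steps):
    """Combine multiple reward objectives into per-sample, per-step credit.

    reward_table: {objective_name: [reward for each sample in the group]}
    peaks:        {objective_name: point in [0, 1] where it matters most}
    Returns credit[sample][step], replacing the single uniform scalar
    that a standard GRPO pipeline would broadcast over all steps.
    """
    group_size = len(next(iter(reward_table.values())))
    credit = [[0.0] * num_steps for _ in range(group_size)]
    for obj, rewards in reward_table.items():
        adv = group_relative_advantage(rewards)          # objective-level credit
        w = timestep_weights(num_steps, peaks[obj])      # temporal credit
        for i in range(group_size):
            for t in range(num_steps):
                # per-step signal = objective advantage x stage importance
                credit[i][t] += adv[i] * w[t]
    return credit

# Toy usage: 2 samples, 2 objectives; alignment is assumed to matter early
# in denoising, visual quality late.
table = {"quality": [0.2, 0.9], "text_align": [0.8, 0.3]}
peaks = {"quality": 0.9, "text_align": 0.2}
credits = otca_credits(table, peaks, num_steps=5)
```

The key contrast with vanilla GRPO is that `credits[i][t]` varies over `t`: an early denoising step is pushed mainly by the alignment advantage, a late step mainly by the quality advantage, rather than every step receiving the same collapsed scalar.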

Top-level tags: reinforcement learning, computer vision, video generation
Detailed tags: grpo, reward credit assignment, diffusion models, multi-objective optimization, visual generation

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation


1️⃣ One-sentence summary

This paper proposes a framework called OTCA that decomposes multiple reward signals (e.g., image quality, motion consistency) according to the importance of each denoising step and allocates them adaptively, so that reinforcement learning training guides visual generative models more precisely, substantially improving image and video generation quality.

Source: arXiv:2604.19234