arXiv submission date: 2026-04-21
📄 Abstract - Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

Reinforcement learning, particularly Group Relative Policy Optimization (GRPO), has emerged as an effective framework for post-training visual generative models with human preference signals. However, its effectiveness is fundamentally limited by coarse reward credit assignment. In modern visual generation, multiple reward models are often used to capture heterogeneous objectives, such as visual quality, motion consistency, and text alignment. Existing GRPO pipelines typically collapse these rewards into a single static scalar and propagate it uniformly across the entire diffusion trajectory. This design ignores the stage-specific roles of different denoising steps and produces mistimed or incompatible optimization signals. To address this issue, we propose Objective-aware Trajectory Credit Assignment (OTCA), a structured framework for fine-grained GRPO training. OTCA consists of two key components. Trajectory-Level Credit Decomposition estimates the relative importance of different denoising steps. Multi-Objective Credit Allocation adaptively weights and combines multiple reward signals throughout the denoising process. By jointly modeling temporal credit and objective-level credit, OTCA converts coarse reward supervision into a structured, timestep-aware training signal that better matches the iterative nature of diffusion-based generation. Extensive experiments show that OTCA consistently improves both image and video generation quality across evaluation metrics.
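The two components described above — group-relative advantages per reward model, reweighted by each denoising step's estimated importance — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual method: the function names are hypothetical, and the Gaussian stage-weighting (each objective "peaking" at a different point of the trajectory) is an invented stand-in for the learned importance estimates in Trajectory-Level Credit Decomposition.

```python
import math

def group_relative_advantage(rewards):
    """GRPO-style advantage: normalize one objective's rewards within the group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # epsilon avoids division by zero
    return [(r - mean) / std for r in rewards]

def timestep_weights(num_steps, peak, sharpness=10.0):
    """Hypothetical stage weighting: a Gaussian bump over the trajectory,
    where 0 = pure noise and 1 = clean sample, normalized to sum to 1."""
    w = [math.exp(-sharpness * (t / (num_steps - 1) - peak) ** 2)
         for t in range(num_steps)]
    total = sum(w)
    return [x / total for x in w]

def otca_credits(reward_table, peaks, num_steps):
    """Combine multiple reward objectives into per-sample, per-step credit.

    reward_table: {objective_name: [reward for each sample in the group]}
    peaks:        {objective_name: point in [0, 1] where it matters most}
    Returns credit[sample][step], replacing the single uniform scalar
    that a standard GRPO pipeline would broadcast over all steps.
    """
    group_size = len(next(iter(reward_table.values())))
    credit = [[0.0] * num_steps for _ in range(group_size)]
    for obj, rewards in reward_table.items():
        adv = group_relative_advantage(rewards)          # objective-level credit
        w = timestep_weights(num_steps, peaks[obj])      # temporal credit
        for i in range(group_size):
            for t in range(num_steps):
                # per-step signal = objective advantage x stage importance
                credit[i][t] += adv[i] * w[t]
    return credit

# Toy usage: 2 samples, 2 objectives; alignment is assumed to matter early
# in denoising, visual quality late.
table = {"quality": [0.2, 0.9], "text_align": [0.8, 0.3]}
peaks = {"quality": 0.9, "text_align": 0.2}
credits = otca_credits(table, peaks, num_steps=5)
```

The key contrast with vanilla GRPO is that `credits[i][t]` varies over `t`: an early denoising step is pushed mainly by the alignment advantage, a late step mainly by the quality advantage, rather than every step receiving the same collapsed scalar.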

Top-level tags: reinforcement learning, computer vision, video generation
Detailed tags: grpo, reward credit assignment, diffusion models, multi-objective optimization, visual generation

Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation


1️⃣ One-sentence summary

This paper proposes a framework called OTCA that decomposes multiple reward signals (e.g., image quality, motion consistency) according to the importance of each denoising step and allocates them adaptively, so that reinforcement learning training guides visual generative models more precisely, substantially improving image and video generation quality.

Source: arXiv:2604.19234