📄 Abstract - Video Generation Models Are Good Latent Reward Models

Reward feedback learning (ReFL) has proven effective for aligning image generation with human preferences. However, its extension to video generation faces significant challenges. Existing video reward models rely on vision-language models designed for pixel-space inputs, confining ReFL optimization to near-complete denoising steps after computationally expensive VAE decoding. This pixel-space approach incurs substantial memory overhead and increased training time, and its late-stage optimization lacks early-stage supervision, refining only visual quality rather than fundamental motion dynamics and structural coherence. In this work, we show that pre-trained video generation models are naturally suited for reward modeling in the noisy latent space, as they are explicitly designed to process noisy latent representations at arbitrary timesteps and inherently preserve temporal information through their sequential modeling capabilities. Accordingly, we propose Process Reward Feedback Learning (PRFL), a framework that conducts preference optimization entirely in latent space, enabling efficient gradient backpropagation throughout the full denoising chain without VAE decoding. Extensive experiments demonstrate that PRFL significantly improves alignment with human preferences, while achieving substantial reductions in memory consumption and training time compared to RGB ReFL.
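To make the core idea concrete, below is a minimal PyTorch sketch of a PRFL-style training step: a reward model scores noisy latents at intermediate denoising timesteps, and the negative reward is backpropagated through the whole latent denoising chain, with no VAE decoding to pixel space. All names here (ToyDenoiser, LatentRewardModel, prfl_step) and the toy update rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of latent-space reward feedback learning (PRFL-style).
import torch
import torch.nn as nn


class ToyDenoiser(nn.Module):
    """Stand-in for a video diffusion backbone operating on latents."""

    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.SiLU(), nn.Linear(64, latent_dim)
        )

    def forward(self, latents, t):
        return self.net(latents)


class LatentRewardModel(nn.Module):
    """Stand-in reward head that scores noisy latents directly,
    so no VAE decoding to pixel space is required."""

    def __init__(self, latent_dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.SiLU(), nn.Linear(64, 1)
        )

    def forward(self, latents, t):
        # A real model would also condition on the timestep t; omitted for brevity.
        return self.score(latents).squeeze(-1)


def prfl_step(denoiser, reward_model, latents, timesteps, optimizer):
    """One illustrative step: denoise in latent space, score the intermediate
    latents with the latent reward model, and backpropagate the negative
    reward through the full denoising chain (no pixel-space decoding)."""
    optimizer.zero_grad()
    total_reward = latents.new_zeros(())
    for t in timesteps:
        pred = denoiser(latents, t)       # predicted update direction
        latents = latents - 0.1 * pred    # toy denoising update
        total_reward = total_reward + reward_model(latents, t).mean()
    loss = -total_reward                  # maximize reward
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    dim = 16
    denoiser = ToyDenoiser(dim)
    reward_model = LatentRewardModel(dim)
    for p in reward_model.parameters():   # keep the reward model frozen
        p.requires_grad_(False)
    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
    z = torch.randn(4, dim)               # batch of noisy video latents
    print(prfl_step(denoiser, reward_model, z, range(10), opt))
```

Because the reward is computed on latents at every intermediate timestep, this kind of setup can in principle supervise early denoising steps (motion and structure) rather than only the near-final ones, which is the contrast the abstract draws with pixel-space ReFL.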

Top-level tags: video generation, model training, reinforcement learning
Detailed tags: latent reward modeling, preference optimization, video alignment, efficient training, denoising process

📄 Paper Summary

Video Generation Models Are Good Latent Reward Models


1️⃣ One-Sentence Summary

This work proposes a new method, PRFL, that performs preference optimization directly in the latent space of video generation, significantly reducing computational cost and memory consumption while better aligning generated videos with human preferences.


📄 Open the original PDF