ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing
1️⃣ One-Sentence Summary
This paper proposes a new method called ThinkRL-Edit, which uses reinforcement learning to make the model perform multi-step "thinking" and "verification" before editing an image, enabling it to more accurately complete image-editing tasks that require complex understanding and reasoning.
Instruction-driven image editing with unified multimodal generative models has advanced rapidly, yet the underlying visual reasoning of these models remains limited, leading to suboptimal performance on reasoning-centric edits. Reinforcement learning (RL) has been investigated for improving the quality of image editing, but it faces three key challenges: (1) reasoning exploration confined to denoising stochasticity, (2) biased reward fusion, and (3) unstable VLM-based instruction rewards. In this work, we propose ThinkRL-Edit, a reasoning-centric RL framework that decouples visual reasoning from image synthesis and expands reasoning exploration beyond denoising. To this end, we introduce Chain-of-Thought (CoT)-based reasoning sampling with planning and reflection stages prior to generation during online sampling, compelling the model to explore multiple semantic hypotheses and validate their plausibility before committing to a visual outcome. To avoid the failures of weighted aggregation, we propose an unbiased chain preference grouping strategy across multiple reward dimensions. Moreover, we replace interval-based VLM scores with a binary checklist, yielding more precise, lower-variance, and interpretable rewards for complex reasoning. Experiments show our method significantly outperforms prior work on reasoning-centric image editing, producing instruction-faithful, visually coherent, and semantically grounded edits.
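The abstract's replacement of interval-based VLM scores with a binary checklist can be illustrated with a minimal sketch. The function name and the example checks below are hypothetical, not from the paper; the idea is that each instruction is decomposed into yes/no questions a VLM answers, and the reward is the fraction of satisfied checks rather than a single continuous score:

```python
def checklist_reward(answers):
    """Aggregate binary checklist answers (yes=True, no=False) into a
    scalar reward in [0, 1]. Each answer is a VLM's yes/no verdict on
    one instruction-specific check for the edited image."""
    if not answers:
        raise ValueError("checklist must be non-empty")
    return sum(1 for a in answers if a) / len(answers)

# Example: five hypothetical checks for one edit, e.g.
# "target object removed?", "background preserved?", "shadows consistent?"
checks = [True, True, False, True, True]
reward = checklist_reward(checks)  # 4 of 5 checks pass -> 0.8
```

Because each check is binary, the aggregated reward has bounded, interpretable variance compared with asking a VLM for a free-form score on an interval.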
Source: arXiv:2601.03467