RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing
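1️⃣ One-sentence summary
This paper proposes RePlan, a plan-then-execute framework in which a vision-language model reasons step by step to decompose a complex instruction and ground it to specific image regions, and a diffusion model then carries out precise, parallel multi-region edits, addressing the difficulty existing models have with complex instructions and cluttered scenes.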
Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate instructions meet cluttered or ambiguous scenes. We introduce RePlan (Region-aligned Planning), a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions; the editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting. To strengthen planning, we apply GRPO-based reinforcement learning using 1K instruction-only examples, yielding substantial gains in reasoning fidelity and format reliability. We further present IV-Edit, a benchmark focused on fine-grained grounding and knowledge-intensive edits. Across IV-Complex settings, RePlan consistently outperforms strong baselines trained on far larger datasets, improving regional precision and overall fidelity.
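To make the plan-then-execute idea concrete, here is a minimal sketch of how a planner's output (target regions plus per-region edit hints) could be turned into per-region patch masks that an editor might use to restrict where each hint applies. This is not the paper's implementation: `RegionEdit`, `region_patch_mask`, `build_region_masks`, the pixel box convention, and the patch size of 16 are all assumptions made for illustration, and the abstract does not specify how the attention-region injection itself is computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive


@dataclass
class RegionEdit:
    """Hypothetical planner output: one target region and its edit hint."""
    box: Box
    hint: str


def region_patch_mask(box: Box, image_size: Tuple[int, int], patch: int) -> List[List[bool]]:
    """Return a row-major grid that is True where a patch overlaps the box."""
    width, height = image_size
    x0, y0, x1, y1 = box
    cols = -(-width // patch)   # ceiling division
    rows = -(-height // patch)
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            px0, py0 = c * patch, r * patch
            px1, py1 = px0 + patch, py0 + patch
            # Standard axis-aligned rectangle overlap test.
            row.append(px0 < x1 and x0 < px1 and py0 < y1 and y0 < py1)
        mask.append(row)
    return mask


def build_region_masks(plan: List[RegionEdit], image_size: Tuple[int, int],
                       patch: int = 16) -> Dict[str, List[List[bool]]]:
    """Map each hint to the patches it may influence (assumed mask layout)."""
    return {edit.hint: region_patch_mask(edit.box, image_size, patch) for edit in plan}


if __name__ == "__main__":
    plan = [
        RegionEdit(box=(0, 0, 32, 32), hint="make the mug red"),
        RegionEdit(box=(40, 40, 64, 64), hint="remove the sticker"),
    ]
    for hint, mask in build_region_masks(plan, image_size=(64, 64)).items():
        print(hint)
        for row in mask:
            print("".join("#" if cell else "." for cell in row))
```

Because each hint gets its own mask, all regions could in principle be handled in a single editing pass, which is the property the abstract contrasts with iterative inpainting.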
Source: arXiv: 2512.16864