arXiv submission date: 2026-01-08
📄 Abstract - Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

In-context image generation and editing (ICGE) enables users to specify visual concepts through interleaved image-text prompts, demanding precise understanding and faithful execution of user intent. Although recent unified multimodal models exhibit promising understanding capabilities, these strengths often fail to transfer effectively to image generation. We introduce Re-Align, a unified framework that bridges the gap between understanding and generation through structured reasoning-guided alignment. At its core lies the In-Context Chain-of-Thought (IC-CoT), a structured reasoning paradigm that decouples semantic guidance and reference association, providing a clear textual target and mitigating confusion among reference images. Furthermore, Re-Align introduces an effective RL training scheme that leverages a surrogate reward to measure the alignment between the structured reasoning text and the generated image, thereby improving the model's overall performance on ICGE tasks. Extensive experiments verify that Re-Align outperforms competitive methods of comparable model scale and resources on both in-context image generation and editing tasks.
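The abstract does not spell out how IC-CoT is structured or how the surrogate reward is computed. As a rough illustration of the idea only, the sketch below separates a reasoning text into "semantic guidance" and "reference association" fields and scores the alignment between the guidance text and a generated image with an off-the-shelf CLIP model. The field names, the choice of CLIP, and the reward function are assumptions made for illustration, not the paper's actual implementation.

```python
# Minimal sketch (not the paper's implementation): score how well a generated
# image matches the "semantic guidance" part of an IC-CoT-style reasoning text,
# and use that score as a surrogate reward for RL fine-tuning.
# Assumptions: CLIP as the scorer; the IC-CoT field names are illustrative.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical IC-CoT output: semantic guidance (what the target image should
# show) is kept separate from reference association (which reference image
# supplies which concept), mirroring the decoupling described in the abstract.
ic_cot = {
    "semantic_guidance": "a corgi wearing the red scarf from the second "
                         "reference image, sitting on a wooden bench",
    "reference_association": {"image_1": "subject (corgi)",
                              "image_2": "attribute (red scarf)"},
}

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def surrogate_reward(generated_image: Image.Image, reasoning: dict) -> float:
    """Cosine similarity between the semantic-guidance text and the image."""
    inputs = processor(text=[reasoning["semantic_guidance"]],
                       images=generated_image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
        text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return (text_emb @ img_emb.T).item()  # in [-1, 1]; higher = better aligned


# Usage: reward each rollout, then feed the rewards into any policy-gradient
# style update over the generator (e.g., advantage = reward - batch mean).
# img = Image.open("generated_sample.png")
# r = surrogate_reward(img, ic_cot)
```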

Top-level tags: multi-modal model training aigc
Detailed tags: in-context learning image generation image editing reasoning alignment reinforcement learning

Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing


1️⃣ One-Sentence Summary

This paper proposes a unified framework called Re-Align, which uses a structured reasoning approach to effectively bridge the gap between a model's ability to understand interleaved image-text instructions and its ability to generate images, yielding better results on image generation and editing tasks driven by in-context (multi-image, multi-text) prompts.

Source: arXiv: 2601.05124