Unified Thinker: A General Reasoning Modular Core for Image Generation

📄 Abstract - Unified Thinker: A General Reasoning Modular Core for Image Generation

Despite impressive progress in high-fidelity image synthesis, generative models still struggle with logic-intensive instruction following, exposing a persistent reasoning-execution gap. Meanwhile, closed-source systems (e.g., Nano Banana) have demonstrated strong reasoning-driven image generation, highlighting a substantial gap to current open-source models. We argue that closing this gap requires not merely better visual generators, but executable reasoning: decomposing high-level intents into grounded, verifiable plans that directly steer the generative process. To this end, we propose Unified Thinker, a task-agnostic reasoning architecture for general image generation, designed as a unified planning core that can plug into diverse generators and workflows. Unified Thinker decouples a dedicated Thinker from the image Generator, enabling modular upgrades of reasoning without retraining the entire generative model. We further introduce a two-stage training paradigm: we first build a structured planning interface for the Thinker, then apply reinforcement learning to ground its policy in pixel-level feedback, encouraging plans that optimize visual correctness over textual plausibility. Extensive experiments on text-to-image generation and image editing show that Unified Thinker substantially improves image reasoning and generation quality.

1️⃣ One-Sentence Summary

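This paper proposes a modular reasoning core called "Unified Thinker". It decomposes complex image generation instructions into executable, verifiable plans and is trained independently of the image generator, which significantly improves the logical reasoning and instruction-following ability of image generation models.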
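
The decoupling described above can be illustrated with a minimal sketch. This is not the paper's implementation: the names `PlanStep`, `Thinker`, `Generator`, `RuleThinker`, `EchoGenerator`, and `generate` are hypothetical, the planner is a toy rule rather than a trained model, and the generator returns text instead of pixels. It only shows how a planning core might sit behind a small interface so that it can be swapped without touching the generator.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class PlanStep:
    """One grounded instruction plus a condition that can be checked later.

    Hypothetical structure; the paper's actual plan format is not shown here.
    """
    instruction: str
    check: str


class Thinker(Protocol):
    """Anything that turns a high-level prompt into a list of plan steps."""
    def plan(self, prompt: str) -> List[PlanStep]: ...


class Generator(Protocol):
    """Anything that consumes plan steps and produces an output."""
    def render(self, steps: List[PlanStep]) -> str: ...


class RuleThinker:
    """Toy planner (assumption): one step per clause joined by ' and '."""
    def plan(self, prompt: str) -> List[PlanStep]:
        clauses = [c.strip() for c in prompt.split(" and ") if c.strip()]
        return [PlanStep(instruction=c, check=f"image shows: {c}") for c in clauses]


class EchoGenerator:
    """Stand-in generator (assumption): returns a description, not an image."""
    def render(self, steps: List[PlanStep]) -> str:
        return " | ".join(step.instruction for step in steps)


def generate(thinker: Thinker, generator: Generator, prompt: str) -> str:
    """Plan first, then render; either component can be replaced independently."""
    return generator.render(thinker.plan(prompt))


result = generate(RuleThinker(), EchoGenerator(), "a red cube and a blue sphere")
```

Because `generate` depends only on the two interfaces, a stronger planner could replace `RuleThinker` without any change to the generator, which is the kind of modular upgrade the abstract describes.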
