RS-Gen: A Multi-Stage Agentic Framework for Reasoning and Search-Augmented Image Generation

📄 Abstract

Recent years have seen remarkable progress in image generation and editing, particularly in instruction following and visual fidelity. However, when handling ambiguous intentions, logical reasoning, and out-of-distribution (OOD) knowledge, existing image models often yield sub-optimal results because they lack deep reasoning capabilities and access to real-time external information. Although emerging unified understanding-and-generation models attempt to bridge this gap, they remain constrained by their parameter scale and static knowledge. Inspired by agentic paradigms, we propose RS-Gen, a plug-and-play, training-free, multi-stage agentic framework for image generation and editing. RS-Gen introduces a "Questioning-and-Solving" closed-loop mechanism that identifies logical issues and knowledge gaps, then autonomously plans actions to fill in the missing information and carry out deep logical reasoning. Extensive experiments show that RS-Gen significantly extends the capabilities of foundational image generation and editing models. Specifically, RS-Gen yields absolute gains of 0.313 for Qwen-Image on WISE Verified and 19.70 for Qwen-Image-Edit-2511 on RISEBench, elevating both to the state-of-the-art (SOTA) level among open-source models.


1️⃣ One-Sentence Summary

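This paper proposes RS-Gen, a plug-and-play, multi-stage agentic framework that requires no additional training. By simulating a "question-and-solve" closed-loop mechanism, it lets image generation models actively look up external information and perform deep reasoning when they encounter ambiguous instructions, complex logic, or missing common-sense knowledge, which significantly improves generation quality and reaches a leading level among open-source models.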
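
To make the "Questioning-and-Solving" closed loop more concrete, here is a minimal Python sketch of how such a loop might be wired around an existing image model. It is not the authors' implementation: the abstract does not specify stages, prompts, or tools, so every name here (`Question`, `run_loop`, `toy_ask`, `toy_search`, `toy_reason`, `toy_generate`) is hypothetical, and the stubs stand in for a real questioner, search tool, reasoner, and image generator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Question:
    text: str
    kind: str  # hypothetical label: "knowledge" (needs search) or "reasoning"


@dataclass
class AgentState:
    prompt: str
    answers: Dict[str, str] = field(default_factory=dict)


def run_loop(
    prompt: str,
    ask: Callable[[AgentState], List[Question]],
    solvers: Dict[str, Callable[[str], str]],
    generate: Callable[[str], str],
    max_rounds: int = 3,
) -> str:
    """Ask what is missing, solve it, and repeat until nothing is left open."""
    state = AgentState(prompt)
    for _ in range(max_rounds):
        # Questioning: keep only the questions that have no answer yet.
        open_questions = [q for q in ask(state) if q.text not in state.answers]
        if not open_questions:
            break
        # Solving: route each question to a search tool or a reasoner.
        for q in open_questions:
            state.answers[q.text] = solvers[q.kind](q.text)
    # Append the collected answers to the prompt before calling the frozen model.
    enriched = state.prompt + "".join(f" [{a}]" for a in state.answers.values())
    return generate(enriched)


# Toy stubs (assumptions for illustration only).
def toy_ask(state: AgentState) -> List[Question]:
    if "tallest building" in state.prompt:
        return [Question("What is the tallest building?", "knowledge")]
    return []


def toy_search(question: str) -> str:
    return f"search result for '{question}'"


def toy_reason(question: str) -> str:
    return f"reasoned answer for '{question}'"


def toy_generate(enriched_prompt: str) -> str:
    return f"IMAGE<{enriched_prompt}>"


result = run_loop(
    "draw the tallest building",
    toy_ask,
    {"knowledge": toy_search, "reasoning": toy_reason},
    toy_generate,
)
print(result)
```

Because the image model is only called through `generate`, the loop stays training-free and plug-and-play in the sense the abstract describes: swapping in a different generator or editor would not change the questioning and solving steps.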
