arXiv submission date: 2026-06-22
📄 Abstract - RS-Gen: A Multi-Stage Agentic Framework for Reasoning and Search-Augmented Image Generation

Recent years have witnessed remarkable progress in image generation and editing, particularly in instruction following and visual fidelity. However, when handling ambiguous intentions, logical reasoning, and Out-of-Distribution (OOD) knowledge, existing image models often yield sub-optimal results because they lack deep reasoning capabilities and access to real-time external information. Although emerging unified understanding-and-generation models attempt to bridge this gap, they remain constrained by their intrinsic parameter scales and static knowledge. Inspired by agentic paradigms, we propose RS-Gen: a plug-and-play, training-free, multi-stage image agentic framework. RS-Gen introduces a "Questioning-and-Solving" closed-loop mechanism that accurately identifies logical issues and knowledge gaps, then autonomously plans actions to fill information deficits and carry out deep logical reasoning. Extensive experiments demonstrate that RS-Gen significantly expands the capability boundaries of foundational image generation and editing models. Specifically, RS-Gen yields absolute performance gains of 0.313 for Qwen-Image on WISE Verified and 19.70 for Qwen-Image-Edit-2511 on RISEBench, elevating both to the state-of-the-art (SOTA) level among open-source models.

Top-level tags: computer vision, agents, AIGC
Detailed tags: image generation, reasoning, search augmentation, multi-stage framework, agentic paradigm

RS-Gen: A Multi-Stage Agentic Framework for Reasoning and Search-Augmented Image Generation


1️⃣ One-Sentence Summary

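This paper proposes RS-Gen, a training-free, plug-and-play, multi-stage agentic framework that uses a closed-loop "Questioning-and-Solving" mechanism so that, when an image generation model faces ambiguous instructions, complex logic, or missing knowledge, it actively retrieves external information and performs deep reasoning, significantly improving generation quality and reaching the leading level among open-source models.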

Source: arXiv: 2606.23221