Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

📄 Abstract

Despite rapid advancements, current text-to-image (T2I) models predominantly rely on a single-step generation paradigm, which struggles with complex semantics and faces diminishing returns from parameter scaling. While recent multi-step reasoning approaches show promise, they are hindered by ungrounded planning hallucinations lacking verification, monolithic post-hoc reflection, long-context optimization instabilities, and prohibitive inference latency. To overcome these bottlenecks, we propose the Closed-Loop Visual Reasoning (CLVR) framework, a comprehensive system that deeply couples visual-language logical planning with pixel-level diffusion generation. CLVR introduces an automated data engine with step-level visual verification to synthesize reliable reasoning trajectories, and proposes Proxy Prompt Reinforcement Learning (PPRL) to resolve long-context optimization instabilities by distilling interleaved multimodal histories into explicit reward signals for accurate causal attribution. Furthermore, to mitigate the severe latency bottleneck caused by iterative denoising, we propose $\Delta$-Space Weight Merge (DSWM), a theoretically grounded method that fuses alignment weights with off-the-shelf distillation priors, reducing the per-step inference cost to just 4 NFEs without requiring expensive re-distillation. Extensive experiments demonstrate that CLVR outperforms existing open-source baselines across multiple benchmarks and approaches the performance of proprietary commercial models, unlocking general test-time scaling capabilities for complex visual generation.
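The abstract does not spell out the DSWM formula, so the following is only a minimal sketch of one common way to merge weights in delta space (task-arithmetic style). It assumes that the alignment-tuned and distilled checkpoints were both derived from the same base model. The names `merge_in_delta_space`, `alpha`, and `beta`, and the toy weight values, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def delta(finetuned: Weights, base: Weights) -> Weights:
    """Per-parameter difference between a fine-tuned checkpoint and its base."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_in_delta_space(
    base: Weights,
    aligned: Weights,
    distilled: Weights,
    alpha: float = 1.0,  # assumed scale for the alignment delta
    beta: float = 1.0,   # assumed scale for the distillation delta
) -> Weights:
    """Add both deltas back onto the shared base (an assumed merge rule)."""
    d_align = delta(aligned, base)
    d_distill = delta(distilled, base)
    return {
        name: [
            b + alpha * da + beta * dd
            for b, da, dd in zip(base[name], d_align[name], d_distill[name])
        ]
        for name in base
    }


# Toy values chosen only for illustration.
base = {"layer.weight": [1.0, 2.0, 3.0]}
aligned = {"layer.weight": [1.5, 2.0, 2.0]}
distilled = {"layer.weight": [1.0, 2.25, 3.5]}

merged = merge_in_delta_space(base, aligned, distilled)
print(merged)
```

With both scales set to 0 the merge returns the base weights unchanged, which is a quick sanity check that the two deltas are the only source of change. Whether CLVR uses fixed scales, per-layer scales, or a different rule entirely is not stated in the abstract.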


1️⃣ One-Sentence Summary

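This paper proposes CLVR, a new visual generation framework that tightly couples language-based logical reasoning with pixel-level diffusion generation. By adding an automated data engine with verification, reinforcement learning optimization, and weight-merging acceleration, it addresses the planning hallucinations, optimization instability, and slow inference that existing text-to-image models show on complex semantics, and it approaches the performance of commercial models on multiple benchmarks.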
