Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning
1️⃣ One-Sentence Summary
This paper proposes a new framework called PRCO that lets the module responsible for "seeing" and the module responsible for "thinking" coevolve and reinforce each other. This addresses a common failure mode of multimodal AI models on image reasoning tasks, where inaccurate "seeing" leads to incorrect "thinking," and significantly improves the model's overall reasoning accuracy.
Reinforcement learning with verifiable rewards (RLVR) has substantially enhanced the reasoning capabilities of multimodal large language models (MLLMs). However, existing RLVR approaches typically rely on outcome-driven optimization that updates both perception and reasoning using a shared reward based solely on the final answer. This shared reward blurs credit assignment, frequently improving reasoning patterns while failing to reliably enhance the accuracy of upstream visual evidence extraction. To address this perception bottleneck, we introduce PRCO (Perception-Reasoning Coevolution), a dual-role RLVR framework with a shared policy. PRCO consists of two cooperative roles: an Observer that generates an evidence caption tailored to the question, and a Solver that predicts the final answer based on this caption. Crucially, PRCO employs role-specific reward signals: the Solver is optimized using verifiable outcome rewards on the final answer, while the Observer receives a utility reward derived from the Solver's downstream success. Extensive experiments across eight challenging multimodal reasoning benchmarks demonstrate that PRCO yields consistent improvements across model scales, raising average accuracy by over 7 points compared to the base model and outperforming prior open-source RL-tuned baselines.
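The role-specific reward split described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's exact formulation: the function names, exact-match check, and the choice to reward the Observer with the mean Solver success rate per caption are all illustrative assumptions.

```python
def solver_reward(predicted_answer: str, gold_answer: str) -> float:
    """Verifiable outcome reward: 1.0 if the Solver's final answer
    matches the ground truth, else 0.0 (exact match as a stand-in
    for the paper's verifier)."""
    return 1.0 if predicted_answer.strip() == gold_answer.strip() else 0.0


def observer_reward(solver_rewards: list[float]) -> float:
    """Utility reward for the Observer: a caption is scored by how
    often the Solver succeeds when answering from that caption
    (here, the mean of the Solver's outcome rewards)."""
    if not solver_rewards:
        return 0.0
    return sum(solver_rewards) / len(solver_rewards)


# Example rollout: one Observer caption, several Solver attempts.
gold = "42"
solver_answers = ["42", "41", "42", "42"]
r_solver = [solver_reward(a, gold) for a in solver_answers]
r_observer = observer_reward(r_solver)
print(r_solver)    # [1.0, 0.0, 1.0, 1.0]
print(r_observer)  # 0.75
```

The key point the sketch captures is that the two roles are graded differently: the Solver is scored directly against the verifiable answer, while the Observer is scored only indirectly, through the usefulness of its caption to the Solver.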
Source: arXiv: 2603.28618