
arXiv submission date: 2026-02-04
📄 Abstract - Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision

Reinforcement Learning (RL) has emerged as a pivotal mechanism for enhancing the complex reasoning capabilities of Multimodal Large Language Models (MLLMs). However, prevailing paradigms typically rely on solitary rollout strategies where the model works alone. This lack of intermediate oversight renders the reasoning process susceptible to error propagation, where early logical deviations cascade into irreversible failures, resulting in noisy optimization signals. In this paper, we propose the Guided Verifier framework to address these structural limitations. Moving beyond passive terminal rewards, we introduce a dynamic verifier that actively co-solves tasks alongside the policy. During the rollout phase, this verifier interacts with the policy model in real time, detecting inconsistencies and providing directional signals to steer the model toward valid trajectories. To facilitate this, we develop a specialized data synthesis pipeline targeting multimodal hallucinations, constructing the CoRe dataset of process-level negatives and Correct-guide Reasoning trajectories to train the guided verifier. Extensive experiments on MathVista, MathVerse and MMMU indicate that by allocating compute to collaborative inference and dynamic verification, an 8B-parameter model can achieve strong performance.
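The collaborative rollout described above can be sketched as a simple loop: the policy proposes a reasoning step, the verifier checks it, and on an inconsistency the policy retries the step with a directional hint rather than letting the error propagate. This is a minimal illustration only; all names here (`guided_rollout`, `generate_step`, `verify_step`, the toy policy and verifier) are hypothetical and do not come from the paper, which this summary does not specify at that level of detail.

```python
def guided_rollout(policy, verifier, task, max_steps=8):
    """Hypothetical sketch of verifier-guided rollout: after each proposed
    step the verifier checks consistency; on failure it returns a
    directional hint and the policy retries that step."""
    trajectory = []
    hint = None
    for _ in range(max_steps):
        step = policy.generate_step(task, trajectory, hint)
        ok, hint = verifier.verify_step(task, trajectory, step)
        if ok:
            trajectory.append(step)
            hint = None
            if step.get("final"):
                break
        # if not ok: loop again; the retry sees the verifier's hint
    return trajectory


class ToyPolicy:
    """Stand-in policy: its first attempt at step 1 is wrong unless
    a verifier hint is present."""
    def generate_step(self, task, trajectory, hint):
        if len(trajectory) == 0 and hint is None:
            return {"text": "2 + 2 = 5", "final": False}  # early deviation
        if len(trajectory) == 0:
            return {"text": "2 + 2 = 4", "final": False}  # corrected retry
        return {"text": "answer: 4", "final": True}


class ToyVerifier:
    """Stand-in verifier: flags the arithmetic error and emits a hint."""
    def verify_step(self, task, trajectory, step):
        if "= 5" in step["text"]:
            return False, "re-check the addition"
        return True, None


trajectory = guided_rollout(ToyPolicy(), ToyVerifier(), task="compute 2+2")
# The flawed first step is caught and redone, so only valid steps remain.
```

The key design point the abstract emphasizes is that supervision happens *during* the rollout (a process-level signal) rather than as a single terminal reward, so an early deviation is corrected before it can cascade.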

Top-level tags: multi-modal model training agents
Detailed tags: multimodal reasoning reinforcement learning process supervision error correction collaborative inference

Guided Verifier: Collaborative Multimodal Reasoning via Dynamic Process Supervision


1️⃣ One-sentence summary

This paper proposes a new framework called the "Guided Verifier": a dedicated verifier model supervises and guides the main model in real time during reasoning, preventing error accumulation and thereby significantly improving the performance of multimodal large models on complex mathematical and reasoning tasks.

Source: arXiv 2602.04290