📄 Abstract - OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Recent advancements in large reasoning models have fueled growing interest in extending such capabilities to multimodal domains. However, despite notable progress in visual reasoning, the lack of transparent and reproducible data curation and training strategies remains a major barrier to scalable research. In this work, we introduce OpenMMReasoner, a fully transparent two-stage recipe for multimodal reasoning spanning supervised fine-tuning (SFT) and reinforcement learning (RL). In the SFT stage, we construct an 874K-sample cold-start dataset with rigorous step-by-step validation, providing a strong foundation for reasoning capabilities. The subsequent RL stage leverages a 74K-sample dataset across diverse domains to further sharpen and stabilize these abilities, resulting in a more robust and efficient learning process. Extensive evaluations demonstrate that our training recipe not only surpasses strong baselines but also highlights the critical role of data quality and training design in shaping multimodal reasoning performance. Notably, our method achieves an 11.6% improvement over the Qwen2.5-VL-7B-Instruct baseline across nine multimodal reasoning benchmarks, establishing a solid empirical foundation for future large-scale multimodal reasoning research. We open-source all our code, pipelines, and data at this https URL.
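To make the two-stage flow concrete, below is a minimal, hypothetical Python sketch of the SFT-then-RL recipe as described in the abstract. All names here (CuratedSample, verify_steps, run_sft, run_rl) are illustrative placeholders under assumed semantics, not the project's actual API; see the open-sourced code for the real implementation.

```python
# Hypothetical sketch of the two-stage OpenMMReasoner recipe.
# Every name below is an illustrative placeholder, not the project's API.

from dataclasses import dataclass
from typing import List


@dataclass
class CuratedSample:
    question: str
    reasoning_steps: List[str]  # step-by-step chain kept only if it validates
    answer: str


def verify_steps(sample: CuratedSample) -> bool:
    # Stand-in for the paper's rigorous step-by-step validation:
    # here we only check that a reasoning chain and final answer exist.
    return bool(sample.reasoning_steps) and bool(sample.answer)


def run_sft(model: dict, data: List[CuratedSample]) -> dict:
    # Stage 1: supervised fine-tuning on the validated cold-start set
    # (874K samples in the paper; a tiny toy list here).
    model["sft_samples"] = sum(1 for s in data if verify_steps(s))
    return model


def run_rl(model: dict, data: List[CuratedSample]) -> dict:
    # Stage 2: reinforcement learning on a smaller cross-domain set
    # (74K samples in the paper) to sharpen and stabilize reasoning.
    model["rl_samples"] = len(data)
    return model


if __name__ == "__main__":
    cold_start = [CuratedSample("2 + 2 = ?", ["add the operands"], "4")]
    rl_set = [CuratedSample("Area of a 3x4 rectangle?", ["multiply sides"], "12")]
    model = {"name": "toy-vlm"}
    model = run_sft(model, cold_start)  # stage 1: SFT cold start
    model = run_rl(model, rl_set)       # stage 2: RL refinement
    print(model)
```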

Top-level tags: multi-modal, model training, model evaluation
Detailed tags: multimodal reasoning, supervised fine-tuning, reinforcement learning, benchmark evaluation, data curation

📄 Paper Summary

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe


1️⃣ One-Sentence Summary

This work proposes a fully transparent two-stage training recipe that substantially improves multimodal reasoning through carefully curated datasets and reinforcement learning, achieving an 11.6% improvement over the Qwen2.5-VL-7B-Instruct baseline across nine multimodal reasoning benchmarks.


📄 Open Original PDF