arXiv submission date: 2026-03-26
📄 Abstract - MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models

Mixture-of-Experts (MoE) has emerged as an effective approach to reduce the computational overhead of Transformer architectures by sparsely activating a subset of parameters for each token while preserving high model capacity. This paradigm has recently been extended to Vision-Language Models (VLMs), enabling scalable multi-modal understanding with reduced computational cost. However, the widely adopted deterministic top-K routing mechanism may overlook more effective expert combinations and can lead to expert overfitting. To address this limitation and improve the diversity of expert selection, we propose MoE-GRPO, a reinforcement learning (RL)-based framework for optimizing expert routing in MoE-based VLMs. Specifically, we formulate expert selection as a sequential decision-making problem and optimize it using Group Relative Policy Optimization (GRPO), allowing the model to learn adaptive expert routing policies through exploration and reward-based feedback. Furthermore, we introduce a modality-aware router guidance mechanism that enhances training stability and efficiency by discouraging the router from exploring experts that are infrequently activated for a given modality. Extensive experiments on multi-modal image and video benchmarks show that MoE-GRPO consistently outperforms standard top-K routing and its variants by promoting more diverse expert selection, thereby mitigating expert overfitting and enabling task-level expert specialization.
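
To make the contrast concrete, here is a minimal sketch, in PyTorch, of the deterministic top-K routing the abstract critiques next to the stochastic expert sampling an RL-trained router relies on for exploration. This is not the authors' code; the `Router` class, tensor shapes, and helper names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class Router(torch.nn.Module):
    """Linear gate producing per-token logits over experts (assumed design)."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, d_model] -> logits: [tokens, n_experts]
        return self.gate(x)

def route_topk(logits: torch.Tensor, k: int):
    """Deterministic top-K: always the K highest-scoring experts per token."""
    weights, experts = torch.topk(F.softmax(logits, dim=-1), k, dim=-1)
    return experts, weights / weights.sum(dim=-1, keepdim=True)

def route_sample(logits: torch.Tensor, k: int):
    """Stochastic routing: sample K distinct experts per token, so the
    policy can visit combinations deterministic top-K never explores."""
    probs = F.softmax(logits, dim=-1)
    experts = torch.multinomial(probs, k, replacement=False)
    weights = probs.gather(-1, experts)
    return experts, weights / weights.sum(dim=-1, keepdim=True)
```

In an RL setup like the one the abstract describes, `route_sample` would generate the rollouts whose rewards feed GRPO, while inference can fall back to the policy's most probable experts.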

Top-level tags: multi-modal, model training, machine learning
Detailed tags: mixture-of-experts, reinforcement learning, vision-language models, expert routing, policy optimization

MoE-GRPO: Optimizing Mixture-of-Experts via Reinforcement Learning in Vision-Language Models


1️⃣ One-Sentence Summary

This paper proposes MoE-GRPO, a method that uses reinforcement learning to dynamically optimize the routing decisions of the Mixture-of-Experts module in vision-language models. This lets the model select and combine its expert sub-networks more flexibly and effectively across multi-modal tasks, improving performance while preventing over-reliance on a small set of experts.
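
As a hedged illustration of the training signal described above, the sketch below shows GRPO's group-relative advantage (rewards normalized within a group of sampled routings, with no learned critic) and one plausible reading of the modality-aware router guidance: masking experts rarely activated for a given modality. The activation-frequency statistic and threshold are assumptions, not the paper's exact design.

```python
import torch

def group_relative_advantage(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: [G] scalar rewards for G routing rollouts of the same input
    (G > 1). GRPO centers and scales rewards within the group instead of
    training a separate value network."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def modality_mask(activation_freq: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    """activation_freq: [n_modalities, n_experts] empirical activation rates.
    Experts almost never used for a modality are masked so exploration is
    not wasted on them (threshold value is an assumption)."""
    return activation_freq >= threshold

# Example: suppress masked experts' logits before sampling for an image token.
freq = torch.tensor([[0.40, 0.55, 0.001, 0.05],   # image modality
                     [0.02, 0.30, 0.600, 0.08]])  # text modality
logits = torch.randn(4)
guided = logits.masked_fill(~modality_mask(freq)[0], float("-inf"))
```

Masking before sampling keeps the routing policy's exploration inside the experts a modality actually uses, which is one way the guidance could stabilize and speed up training as the abstract claims.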

Source: arXiv 2603.24984