L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts
1️⃣ One-Sentence Summary
This paper proposes a new routing framework called L2R, which moves expert assignment into a shared low-dimensional space and introduces a smooth scoring mechanism. This design addresses routing instability and unclear expert specialization in Mixture-of-Experts models, yielding significant gains in model performance.
Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization. In this work, we propose Low-rank & Lipschitz-controlled Routing (L2R), a unified routing framework that reshapes both the routing space and scoring geometry. L2R performs expert assignment in a shared low-rank latent routing space and introduces Saturated Inner-Product Scoring (SIPS) to explicitly control the Lipschitz behavior of routing functions, yielding smoother and more stable routing geometry. In addition, L2R incorporates a parameter-efficient multi-anchor routing mechanism to enhance expert expressiveness. Extensive experiments on a large-scale language MoE model and a vision MoE setting on ImageNet demonstrate that L2R consistently improves routing stability, expert specialization, and overall model performance.
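To make the three routing ingredients concrete, below is a minimal PyTorch sketch of an L2R-style router. The abstract does not specify the exact form of SIPS or the multi-anchor mechanism, so this is an illustration under stated assumptions: the shared low-rank routing space is a learned linear down-projection, SIPS is modeled as a tanh saturation of the scaled inner product (which bounds both the score magnitude and the scoring function's Lipschitz constant), and each expert owns several learnable anchors whose best-matching score is used for gating. All names (`L2RRouter`, `rank`, `num_anchors`, `tau`) are hypothetical, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class L2RRouter(nn.Module):
    """Hypothetical sketch of an L2R-style router (not the authors' implementation).

    Assumptions: low-rank routing space via a shared linear down-projection,
    SIPS approximated as tanh saturation of the scaled inner product, and
    multi-anchor routing aggregated by a per-expert max over anchors.
    """

    def __init__(self, d_model, num_experts, rank=64, num_anchors=2, top_k=2, tau=1.0):
        super().__init__()
        # Shared projection into the low-rank latent routing space.
        self.down_proj = nn.Linear(d_model, rank, bias=False)
        # Each expert holds `num_anchors` learnable anchor vectors in that space.
        self.anchors = nn.Parameter(torch.randn(num_experts, num_anchors, rank) * rank ** -0.5)
        self.top_k = top_k
        self.tau = tau  # saturation scale: scores are bounded in (-tau, tau)

    def forward(self, x):
        # x: (num_tokens, d_model) -> latent routing representation (num_tokens, rank)
        z = self.down_proj(x)
        # Raw inner products against every anchor: (num_tokens, num_experts, num_anchors)
        raw = torch.einsum("tr,ear->tea", z, self.anchors)
        # Saturated Inner-Product Scoring (assumed tanh form): bounded and smooth,
        # so the score is Lipschitz-controlled w.r.t. the raw inner product.
        scores = self.tau * torch.tanh(raw / self.tau)
        # Multi-anchor aggregation: keep the best-matching anchor per expert.
        expert_scores, _ = scores.max(dim=-1)  # (num_tokens, num_experts)
        gate = F.softmax(expert_scores, dim=-1)
        top_w, top_idx = gate.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize selected weights
        return top_idx, top_w
```

Under this assumed tanh form, the derivative of `tau * tanh(raw / tau)` with respect to the raw inner product never exceeds 1 and the score magnitude never exceeds `tau`, which is one simple way to realize the "scale-insensitive, Lipschitz-controlled scoring" described in the abstract; the paper's actual SIPS formulation may differ.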
Source: arXiv: 2601.21349