Less is MoE: Trimming Experts in Domain-Specialist Language Models
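1️⃣ One-sentence summary
This paper finds that the key capabilities of Mixture-of-Experts models are concentrated in a very small number of FFN intermediate dimensions, and proposes using Fisher importance to locate those dimensions and prune the low-importance ones, substantially reducing model size and improving inference speed while preserving model performance.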
Mixture-of-Experts (MoE) models achieve strong performance through conditional computation, but their large parameter footprint poses deployment challenges. Prior MoE compression approaches fail catastrophically when evaluated on general-purpose benchmarks beyond commonsense reasoning. We trace this failure to the granularity of compression: important capabilities are distributed across experts but concentrated in a sparse set of FFN intermediate dimensions. To identify these dimensions, we use Fisher importance, which outperforms activation-, router-score-, and magnitude-based alternatives and identifies tiny sets of task-critical dimensions: in Qwen1.5-MoE, removing as few as 12 of the 1.35M routed-FFN intermediate dimensions collapses GSM8K accuracy while largely preserving factual-knowledge performance. Building on this, we propose Fisher-MoE, which operates within each FFN to remove intermediate dimensions ranked by Fisher importance. At the same 50% MoE compression ratio as prior approaches, Fisher-MoE preserves model capability while reducing weight memory by ~45% and improving inference throughput by 21%. These findings suggest that the intermediate dimension is an effective unit of granularity both for compressing MoE models and for locating where their capability concentrates.
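The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the general idea, under stated assumptions: a single ungated ReLU FFN stands in for one expert, the loss is a squared error, and the score of an intermediate dimension is the empirical diagonal Fisher (mean squared gradient) summed over the weights attached to that dimension. The names `fisher_importance` and `prune_intermediate_dims` are hypothetical and are not taken from the paper.

```python
import numpy as np

def fisher_importance(w_up, w_down, xs, targets):
    """Empirical diagonal Fisher score per intermediate dimension.

    Assumed toy model: y = w_down @ relu(w_up @ x), loss = 0.5 * ||y - t||^2.
    w_up has shape (d_ff, d_model); w_down has shape (d_model, d_ff).
    """
    scores = np.zeros(w_up.shape[0])
    for x, t in zip(xs, targets):
        pre = w_up @ x                     # pre-activations, shape (d_ff,)
        h = np.maximum(pre, 0.0)           # ReLU activations
        err = w_down @ h - t               # dL/dy for the squared-error loss
        g_down = np.outer(err, h)          # dL/dw_down, shape (d_model, d_ff)
        g_pre = (w_down.T @ err) * (pre > 0)
        g_up = np.outer(g_pre, x)          # dL/dw_up, shape (d_ff, d_model)
        # Dimension j owns row j of w_up and column j of w_down.
        scores += (g_up ** 2).sum(axis=1) + (g_down ** 2).sum(axis=0)
    return scores / len(xs)

def prune_intermediate_dims(w_up, w_down, scores, keep_ratio):
    """Keep the highest-scoring intermediate dimensions and drop the rest."""
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return w_up[keep], w_down[:, keep], keep

# Toy usage with random weights and calibration data (illustrative only).
rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
w_up = rng.normal(size=(d_ff, d_model))
w_down = rng.normal(size=(d_model, d_ff))
xs = rng.normal(size=(32, d_model))
targets = rng.normal(size=(32, d_model))

scores = fisher_importance(w_up, w_down, xs, targets)
w_up_small, w_down_small, kept = prune_intermediate_dims(w_up, w_down, scores, 0.5)
print(w_up_small.shape, w_down_small.shape)
```

Pruning at this granularity removes matching rows and columns of the two projection matrices, so the smaller FFN stays a dense matrix product and needs no sparse kernels. In a real MoE model the gradients would come from the language-modeling loss on calibration data and the experts would typically be gated FFNs, which this sketch does not model.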
Source: arXiv:2606.05538