arXiv submission date: 2026-06-21
📄 Abstract - Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search

We present an automated large-scale search pipeline for heterogeneous 4-Expert Mixture-of-Experts (MoE4) architectures within the LEMUR neural network dataset ecosystem. Building on a hand-crafted heterogeneous MoE reference model, we replace manual design with a deterministic code-assembly generator that systematically combines base architecture families drawn from the LEMUR database into MoE4 ensembles, each governed by a convolutional gating network with temperature scaling, mixup augmentation, and cosine-annealed learning-rate scheduling. Over a 28-day campaign on an NVIDIA RTX 4090, the pipeline generated 4,463 candidate models across 197 batches, of which 1,021 were evaluated successfully. A critical finding emerged from the campaign: due to alphabetical enumeration via this http URL, the entire explored search space (4.8% of the theoretical 23,751 possible 4-family combinations) is anchored to a single family, AirNet. We characterise this coverage bias precisely, identify the root cause in the generator, and propose a stratified random sampling fix. Within the AirNet-anchored scope, ShuffleNet and MobileNetV3 consistently co-produce the highest-accuracy ensembles (mean accuracy up to 0.632), while FractalNet and MNASNet are identified as low-yield families warranting exclusion in future campaigns. The pipeline, analysis artefacts, and corrected generator are released as part of the open-source NNGPT project at this https URL.

Top-level tags: machine learning, model training
Detailed tags: mixture of experts, heterogeneous architecture, pipeline search, network architecture search, lemur dataset

Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search


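1️⃣ One-sentence summary

This paper presents an automated pipeline that systematically searches, within the LEMUR neural network dataset, for Mixture-of-Experts models built from four different experts (MoE4). By assembling model code from combinations of architecture families and training and evaluating the results automatically, it finds that the entire explored search space is anchored to the AirNet architecture family, that ensembles combining ShuffleNet and MobileNetV3 reach the best accuracy (mean accuracy up to 0.632), and that FractalNet and MNASNet perform poorly and should be excluded from future campaigns.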

Source: arXiv: 2606.23739