arXiv submission date: 2026-06-28
📄 Abstract - TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation

Recent advances in speech separation (SS) have led to compact front-end models with small parameter sizes, yet their high computational cost remains a major barrier for deployment on edge devices. To address this, we propose TF-MoE, a sparse Mixture-of-Experts (MoE) framework that enhances model capacity with almost no increase in inference cost. Our method introduces dynamic expert specialization in time and frequency dimensions through alternating time-wise and frequency-wise MoE modules, each dynamically selecting experts per frame or mel band. Built upon a mel-band-splitting Conformer backbone, TF-MoE achieves strong performance on SS tasks under low-compute settings. Experimental results demonstrate that TF-MoE consistently improves separation performance under computation cost constraints, outperforming BSRNN by +3.8 dB SDR on Libri2Mix with comparable 4.1 GMACs/s inference cost. This positions TF-MoE as a promising candidate for edge-device deployment.
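The alternating routing described in the abstract can be illustrated with a small, stdlib-only sketch. Everything beyond "one expert choice per frame, then one per band" is an assumption made for illustration: top-1 routing, mean pooling to form the router input, single-linear-layer experts with a residual connection, and the names `SparseMoE`, `time_wise_moe`, and `freq_wise_moe` are all hypothetical. The mel-band-splitting Conformer backbone is not modeled here.

```python
import random

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class SparseMoE:
    """Top-1 routed experts; each expert is one linear map (assumption)."""

    def __init__(self, dim, num_experts, rng):
        self.router = rand_matrix(num_experts, dim, rng)
        self.experts = [rand_matrix(dim, dim, rng) for _ in range(num_experts)]

    def apply(self, group):
        """Route a group of D-dim vectors that share one expert choice."""
        n, dim = len(group), len(group[0])
        # Router input: mean of the group's vectors (assumed pooling).
        summary = [sum(v[d] for v in group) / n for d in range(dim)]
        logits = matvec(self.router, summary)
        k = max(range(len(logits)), key=logits.__getitem__)
        # Only the selected expert runs; its output is added residually.
        expert = self.experts[k]
        out = [[a + b for a, b in zip(v, matvec(expert, v))] for v in group]
        return out, k

def time_wise_moe(x, moe):
    """x[t][f] is a D-dim vector; one expert is chosen per frame t."""
    out, picks = [], []
    for frame in x:
        y, k = moe.apply(frame)
        out.append(y)
        picks.append(k)
    return out, picks

def freq_wise_moe(x, moe):
    """One expert is chosen per band f, pooling over all frames."""
    num_frames, num_bands = len(x), len(x[0])
    out = [[None] * num_bands for _ in range(num_frames)]
    picks = []
    for f in range(num_bands):
        y, k = moe.apply([x[t][f] for t in range(num_frames)])
        for t in range(num_frames):
            out[t][f] = y[t]
        picks.append(k)
    return out, picks

# Toy input: T frames, F bands, D channels, E experts per module.
rng = random.Random(0)
T, F, D, E = 6, 4, 8, 3
x = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(F)] for _ in range(T)]

time_moe = SparseMoE(D, E, rng)
freq_moe = SparseMoE(D, E, rng)
h, frame_picks = time_wise_moe(x, time_moe)   # expert index per frame
y, band_picks = freq_wise_moe(h, freq_moe)    # expert index per band
```

In this sketch each vector passes through exactly one expert per module, so the expert compute per vector does not grow with `E`; only the small router does. That is the general property of top-1 sparse routing that makes added capacity cheap at inference, though the paper's actual routing scheme may differ.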

Top-level tags: audio model training
Detailed tags: speech separation mixture-of-experts time-frequency edge deployment low-compute

TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation


1️⃣ One-sentence summary

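This paper proposes TF-MoE, a sparse Mixture-of-Experts framework that alternately activates different expert modules along the time and frequency dimensions, substantially improving speech separation performance with almost no increase in computational cost and making the model better suited to running on edge devices such as phones and earbuds.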

Source: arXiv: 2606.29575