Sparse Mixture of Experts
Sparse Mixture-of-Experts (SMoE) models aim to improve the efficiency and scalability of large neural networks by activating only a subset of their parameters (experts) for each input. Current research focuses on developing more efficient routing mechanisms (e.g., cosine routers, product key techniques), exploring novel architectures like Block Tensor-Train MoE (BTT-MoE), and optimizing expert pruning strategies (e.g., evolutionary algorithms) to reduce computational costs and improve performance. This approach holds significant promise for deploying extremely large models on resource-constrained devices and improving the performance-compute trade-off in various applications, including natural language processing and computer vision.
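To make the core idea concrete, below is a minimal sketch of a sparse MoE layer with top-k routing, assuming PyTorch; the class and parameter names (SparseMoE, num_experts, top_k) are illustrative and not taken from any of the cited papers. Each token's router scores all experts, but only the top-k experts are actually executed, which is what keeps the active parameter count small.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Illustrative sparse MoE layer: each token runs only its top-k experts."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: produces one routing logit per expert for each token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Experts: independent feed-forward networks.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.GELU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                        # (n_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the selected experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Find the (token, slot) pairs routed to expert e.
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # expert is inactive for this batch: no compute spent on it
            out[token_idx] += weights[token_idx, slot_idx, None] * expert(tokens[token_idx])

        return out.reshape_as(x)


if __name__ == "__main__":
    layer = SparseMoE(d_model=64, d_hidden=256)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

This sketch uses a plain linear router with a softmax over the selected logits; the routing variants mentioned above (e.g., cosine routers or product-key techniques) replace that scoring step while keeping the same sparse dispatch pattern.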
Papers
Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation
Nadezhda Chirkova, Vassilina Nikoulina, Jean-Luc Meunier, Alexandre Bérard
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang