Sparsely-Gated Mixture of Experts

Sparsely-gated Mixture-of-Experts (MoE) models improve the efficiency and scalability of large neural networks by dividing the computational workload among specialized "expert" sub-networks, only a small number of which are activated for each input. Current research focuses on adapting MoE architectures to tasks in natural language processing, computer vision, and speech recognition, exploring techniques such as shortcut connections for faster training and adversarial training methods for improved robustness. Because only a fraction of the parameters is active per input, this approach can substantially reduce computational cost while maintaining or improving accuracy, making it attractive for large-scale models as well as resource-constrained deployments.
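As a rough illustration of the sparse-activation idea described above, the sketch below implements a top-k gated MoE layer in PyTorch. The class name `SparseMoE`, the parameters `num_experts` and `top_k`, and the feed-forward expert design are illustrative assumptions, not the implementation of any particular paper listed here.

```python
# Minimal sketch of a sparsely-gated MoE layer with top-k routing (assumed design,
# not a reference implementation from any specific paper).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating network: one logit per expert for each token.
        self.gate = nn.Linear(d_model, num_experts)
        # Expert sub-networks: simple feed-forward blocks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model) -- one token per row for simplicity.
        logits = self.gate(x)                                  # (batch, num_experts)
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)  # keep only the k best experts
        weights = F.softmax(topk_vals, dim=-1)                 # renormalize over selected experts
        out = torch.zeros_like(x)
        # Only the selected experts are evaluated for each token (sparse activation).
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Example: route a batch of 4 token embeddings through 8 experts, 2 per token.
moe = SparseMoE(d_model=16, d_hidden=64)
y = moe(torch.randn(4, 16))
print(y.shape)  # torch.Size([4, 16])
```

In practice, production MoE layers add load-balancing losses and capacity limits so that tokens are spread evenly across experts; the loop over experts above is written for clarity rather than speed.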

Papers