Mixture of Experts
Mixture-of-Experts (MoE) models aim to improve the efficiency and scalability of large language models and other neural networks by using multiple specialized "expert" networks, each handling a subset of the input data. Current research focuses on improving routing algorithms that efficiently assign inputs to experts, developing heterogeneous MoE architectures with experts of varying sizes and capabilities, and optimizing training methods to address challenges like load imbalance and gradient conflicts. This approach holds significant promise for building larger, more capable models at reduced computational cost, with impact across natural language processing, computer vision, robotics, and scientific discovery.
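To make the routing idea concrete, the sketch below shows a minimal top-k MoE layer: a small linear "router" scores each input against every expert, only the top-k experts are evaluated, and their outputs are combined with the renormalized router weights. This is an illustrative sketch in PyTorch under assumed names (SimpleMoE, num_experts, top_k); it is not the implementation from any of the papers listed below, and it omits the load-balancing losses and capacity limits that production MoE layers typically add.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Illustrative top-k mixture-of-experts layer (names and sizes are assumptions)."""
    def __init__(self, d_model=64, d_hidden=128, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: small independent feed-forward networks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (batch, d_model)
        logits = self.router(x)                          # (batch, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)             # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(8, 64)
print(SimpleMoE()(x).shape)  # torch.Size([8, 64])

Because each token activates only top_k of num_experts experts, parameter count can grow with the number of experts while per-token compute stays roughly constant; the load-imbalance and gradient-conflict issues mentioned above arise from exactly this sparse, learned routing step.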
Papers
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
Higher Layers Need More LoRA Experts
Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu, Daiyi Peng, Yawen Zhang, Xiaoyuan Guo, Jie Yang, VS Subrahmanian
Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts
Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Takashi Furuya, Marc T. Law
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, Suchi Saria
On Least Square Estimation in Softmax Gating Mixture of Experts
Huy Nguyen, Nhat Ho, Alessandro Rinaldo