Mixture-of-Experts Architecture

Mixture-of-Experts (MoE) architectures aim to improve the efficiency and adaptability of large language models by distributing computation across specialized expert networks, with a learned router activating only a small subset of experts for each input. Current research focuses on improving routing mechanisms for efficient expert selection, developing methods for adding or adapting experts to new tasks with minimal retraining, and applying MoEs in diverse areas such as summarization, robotics, and secure machine learning. Because only the selected experts run for a given token, model capacity can grow much faster than per-token compute, making MoEs an efficient alternative to simply scaling up dense models and yielding improved performance and resource utilization across applications.
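
The sketch below illustrates the core idea described above: a router scores all experts for each token, only the top-k experts are executed, and their outputs are combined using the normalized router weights. It is a minimal illustration assuming PyTorch; the class name `TopKMoELayer`, the feed-forward expert design, and all hyperparameters are illustrative choices, not a specific published implementation.

```python
# Minimal sketch of a top-k routed MoE layer (assumes PyTorch; names are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router: one logit per expert for every token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                    # (num_tokens, num_experts)
        weights, indices = logits.topk(self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the selected experts

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Find the (token, slot) pairs routed to this expert.
            token_idx, slot_idx = (indices == expert_id).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # no tokens routed here; this expert costs nothing this step
            # Weight the expert's output by its normalized router score.
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape(x.shape)


# Example: 4 tokens of dimension 16, 8 experts, 2 active per token.
layer = TopKMoELayer(d_model=16, d_hidden=64, num_experts=8, k=2)
print(layer(torch.randn(1, 4, 16)).shape)  # torch.Size([1, 4, 16])
```

With 8 experts and k = 2, each token pays the compute cost of only 2 expert networks even though the layer holds the parameters of all 8, which is the efficiency trade-off the summary refers to.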

Papers