Mixture-of-Experts Architecture
Mixture-of-Experts (MoE) architectures aim to improve the efficiency and adaptability of large language models by routing each input to a small subset of specialized expert networks, so only a fraction of the model's parameters is active on any given forward pass. Current research focuses on enhancing routing mechanisms for efficient expert selection, developing methods for adding or adapting experts to new tasks with minimal retraining, and exploring applications of MoE in areas such as summarization, robotics, and secure machine learning. The approach offers a parameter-efficient alternative to simply scaling up dense model size, improving performance and resource utilization across a range of applications.
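To make the routing idea concrete, below is a minimal sketch of a top-k routed MoE layer in PyTorch. It is not the implementation from any particular paper; the class name, parameters (num_experts, top_k), and the simple per-expert loop are illustrative assumptions chosen for clarity rather than efficiency.

```python
# Minimal top-k routed Mixture-of-Experts layer (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten to (tokens, d_model)
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                      # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the selected experts

        out = torch.zeros_like(tokens)
        for expert_id, expert in enumerate(self.experts):
            # Which (token, slot) pairs selected this expert?
            token_idx, slot_idx = (indices == expert_id).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # this expert receives no tokens in the batch
            expert_out = expert(tokens[token_idx])
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert_out
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = layer(torch.randn(4, 16, 64))
    print(y.shape)  # torch.Size([4, 16, 64])
```

Because only top_k of the num_experts feed-forward networks run per token, compute per token stays roughly constant while total parameter count grows with the number of experts, which is the parameter-efficiency argument made above.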