Mixture of Experts
Mixture-of-Experts (MoE) models aim to improve the efficiency and scalability of large language models and other large-scale architectures by routing each input to a small subset of specialized "expert" networks, so only a fraction of the model's parameters is active per input. Current research focuses on improving the routing algorithms that assign inputs to experts, developing heterogeneous MoE architectures with experts of varying sizes and capabilities, and optimizing training methods to address challenges such as load imbalance and gradient conflicts. This approach holds significant promise for building larger, more capable models at reduced computational cost, with applications ranging from natural language processing and computer vision to robotics and scientific discovery.
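To make the routing idea concrete, below is a minimal sketch of a top-k routed MoE layer in PyTorch. All names here (SimpleMoE, num_experts, top_k, and so on) are illustrative assumptions, not taken from any specific paper or library: a learned router scores each token against every expert, the top-k experts process the token, and their outputs are combined with the normalized router weights.

```python
# Minimal top-k routed Mixture-of-Experts layer (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router produces one score per expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                              # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize weights over the chosen experts

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SimpleMoE(d_model=32, d_hidden=64)
    tokens = torch.randn(10, 32)
    print(layer(tokens).shape)  # torch.Size([10, 32])
```

In practice, production MoE layers also add an auxiliary load-balancing loss so that the router spreads tokens roughly evenly across experts, which addresses the load-imbalance issue mentioned above; that term is omitted here to keep the sketch short.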