Mixture of Experts
Mixture-of-Experts (MoE) models aim to improve the efficiency and scalability of large language models and other architectures by using multiple specialized "expert" networks, each handling a subset of the input data. Current research focuses on improving routing algorithms that assign inputs to experts, developing heterogeneous MoE architectures with experts of varying sizes and capabilities, and optimizing training methods to address challenges such as load imbalance and gradient conflicts. This approach holds significant promise for building larger, more capable models at reduced computational cost, with impact across fields ranging from natural language processing and computer vision to robotics and scientific discovery.
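To make the routing idea concrete, below is a minimal sketch of an MoE layer with top-k gating, assuming PyTorch; the class and parameter names (SimpleMoE, num_experts, top_k) are illustrative rather than taken from any specific paper. Each token is scored by a small router, sent to its highest-scoring experts, and the expert outputs are combined with the normalized routing weights.

```python
# Minimal sketch of a Mixture-of-Experts layer with top-k routing (PyTorch).
# Names and hyperparameters here are illustrative assumptions, not a specific
# published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                      # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the chosen experts

        out = torch.zeros_like(tokens)
        # Dispatch each token only to its selected experts (sparse activation).
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


# Usage: only top_k experts run per token, so compute grows sub-linearly
# with the number of experts.
layer = SimpleMoE(d_model=64, d_hidden=256, num_experts=4, top_k=2)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

In practice, systems built on this pattern typically add an auxiliary load-balancing loss so the router does not collapse onto a few experts, which is one form of the load-imbalance problem mentioned above.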