Token Expert Combination
Token-expert combination is a key aspect of Mixture-of-Experts (MoE) models: each input token is routed to a small subset of specialized "expert" networks and its output is assembled as a weighted combination of those experts, spreading computational load across the experts instead of through a single dense network. Current research focuses on improving routing algorithms, for example recurrent routers that reuse routing information across layers, and on more stable routing strategies that mitigate training instability and improve efficiency. These advances matter because they allow model capacity to grow without a proportional increase in computational cost, improving performance on natural language processing and other tasks.
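The sketch below illustrates the basic idea in PyTorch: a learned gate scores each token against every expert, the top-k experts are selected, and their outputs are combined with the renormalized gate weights. Names such as `TopKRouter`, `MoELayer`, `n_experts`, and `k` are illustrative assumptions, not the method of any particular paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKRouter(nn.Module):
    """Minimal top-k token-to-expert router (illustrative sketch)."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x):
        logits = self.gate(x)                          # (tokens, n_experts)
        weights, experts = logits.topk(self.k, dim=-1)  # k best experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen experts
        return weights, experts


class MoELayer(nn.Module):
    """Combine the outputs of the k selected experts for each token."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int = 2):
        super().__init__()
        self.router = TopKRouter(d_model, n_experts, k)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        weights, experts = self.router(x)              # both (tokens, k)
        out = torch.zeros_like(x)
        for slot in range(weights.shape[-1]):          # each of the k routing slots
            idx = experts[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):  # dispatch tokens to their chosen expert
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoELayer(d_model=64, d_hidden=256, n_experts=8, k=2)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

Only k of the n_experts feed-forward blocks run for any given token, which is why the parameter count can grow with the number of experts while per-token compute stays roughly constant.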