Expert Parallelism
Expert parallelism aims to accelerate the training and inference of large-scale machine learning models, particularly those with Mixture-of-Experts (MoE) architectures, by distributing the experts across multiple processing units so that each device computes only the tokens routed to its local experts. Current research focuses on reducing the communication overhead of these parallel systems, exploring techniques such as improved scheduling algorithms, adaptive expert placement, and optimized all-to-all communication patterns. These advances are crucial for training and deploying increasingly complex models, affecting both the scalability of research on large language models and the performance of real-world applications that require high-throughput inference.
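To make the dispatch/combine pattern behind expert parallelism concrete, the following is a minimal single-process sketch. It simulates the core steps (routing, all-to-all dispatch, local expert computation, and combine) rather than using a real distributed runtime; the device count, the random top-1 router, and all variable names are illustrative assumptions, not drawn from any specific system or paper.

```python
# Conceptual sketch of expert-parallel MoE dispatch (single-process simulation).
# Each simulated "device" owns one expert; tokens are routed by a gating step,
# exchanged all-to-all, processed by the owning expert, and returned to source.
import numpy as np

rng = np.random.default_rng(0)

num_devices = 4            # one expert per device in this toy setup (assumption)
tokens_per_device = 8
d_model = 16

# Per-device token batches and per-device expert weights (the "experts").
tokens = [rng.normal(size=(tokens_per_device, d_model)) for _ in range(num_devices)]
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_devices)]

# 1) Gating: each token picks its top-1 expert (a random router stands in here).
assignments = [rng.integers(0, num_devices, size=tokens_per_device)
               for _ in range(num_devices)]

# 2) Dispatch (first all-to-all): group tokens by destination expert/device.
#    inbox[e] collects (source_device, source_index, token) for expert e.
inbox = [[] for _ in range(num_devices)]
for src, (batch, assign) in enumerate(zip(tokens, assignments)):
    for i, e in enumerate(assign):
        inbox[e].append((src, i, batch[i]))

# 3) Local expert computation: each device applies its expert to what it received.
outbox = [[] for _ in range(num_devices)]
for e, items in enumerate(inbox):
    for src, i, tok in items:
        outbox[src].append((i, tok @ experts[e]))   # expert e's linear map

# 4) Combine (second all-to-all): scatter results back into source token order.
outputs = [np.empty((tokens_per_device, d_model)) for _ in range(num_devices)]
for dst, items in enumerate(outbox):
    for i, out in items:
        outputs[dst][i] = out

print(outputs[0].shape)  # (8, 16): every token received its chosen expert's output
```

In a real system, steps 2 and 4 are the communication-heavy all-to-all exchanges that the scheduling, placement, and communication-optimization work mentioned above targets; the local expert computation in step 3 is what expert parallelism spreads across devices.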