Parallel Training
Parallel training speeds up the computationally intensive work of training large machine learning models by distributing it across multiple processors or devices. Current research optimizes this process for model architectures such as large language models (LLMs) and convolutional neural networks (CNNs), using data parallelism (each device trains a full model replica on a different slice of the data) and model parallelism (the model itself is split across devices), together with strategies to mitigate communication bottlenecks and tolerate hardware failures. Efficient parallel training makes larger, more capable models practical to develop and deploy for diverse applications while reducing training time and cost.
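To make the data-parallel idea concrete, here is a minimal sketch in plain Python. It assumes a toy one-parameter model (y ≈ w * x with mean squared error) and simulates the workers in a single process. The helper names (`gradient`, `shard`, `all_reduce_mean`, `data_parallel_step`) are hypothetical and chosen for illustration only; they are not taken from any paper or library. A real system would run each shard on its own device and replace `all_reduce_mean` with a collective communication call.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(batch: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split the batch into equal-sized contiguous shards, one per worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    return sum(values) / len(values)


def data_parallel_step(w: float, batch: List[Sample],
                       num_workers: int, lr: float) -> float:
    """One SGD step where each simulated worker handles one shard."""
    # Each worker holds the same weight w and sees only its own shard.
    local_grads = [gradient(w, s) for s in shard(batch, num_workers)]
    # Averaging equal-sized shard gradients reproduces the full-batch gradient.
    global_grad = all_reduce_mean(local_grads)
    return w - lr * global_grad


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    for workers in (1, 2, 4):
        print(workers, data_parallel_step(0.0, data, workers, lr=0.01))
```

Because the shards are equal in size, the averaged gradient matches the single-device gradient, so the update is the same regardless of worker count. The averaging step is where the communication cost mentioned above comes from: every worker has to exchange gradients before any of them can update.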