Parallel Training
Parallel training accelerates the computationally intensive process of training large machine learning models by distributing the workload across multiple processors or devices. Current research focuses on optimizing this process for architectures such as large language models (LLMs) and convolutional neural networks (CNNs), using data and model parallelism together with strategies that mitigate communication bottlenecks and tolerate hardware failures. Efficient parallel training is crucial for advancing AI systems: it enables the development and deployment of larger, more capable models while reducing training time and cost.
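As a minimal sketch of the data-parallel side of this picture, the example below replicates a small model in each process and lets PyTorch's DistributedDataParallel average gradients across replicas during the backward pass. The model, the random data, and the assumed `torchrun` launch are illustrative placeholders, not drawn from any specific paper listed here.

```python
# Minimal data-parallel training sketch using torch.distributed.
# Assumes a launch such as `torchrun --nproc_per_node=4 train.py`, which sets
# the RANK / WORLD_SIZE environment variables that init_process_group reads.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group(backend="gloo")  # use "nccl" on multi-GPU nodes
    rank = dist.get_rank()

    # Each process holds a full replica of the model (data parallelism).
    model = torch.nn.Linear(32, 10)
    ddp_model = DDP(model)  # wraps the replica; gradients are averaged across ranks
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(10):
        # In practice each rank would read a different shard of the dataset;
        # random tensors stand in for that shard here.
        inputs = torch.randn(16, 32)
        targets = torch.randint(0, 10, (16,))

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()   # all-reduce of gradients happens inside this call
        optimizer.step()  # every replica applies the same averaged update

        if rank == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Model parallelism, by contrast, splits a single model's layers or tensors across devices; it is typically combined with data parallelism when the model no longer fits on one accelerator.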
Papers
Thirteen papers, dated November 10, 2021 through November 6, 2022.