Training Pipeline
Training pipelines for machine learning models, particularly large language models and other deep neural networks, are being actively optimized for efficiency and performance. Current research focuses on mitigating bottlenecks such as data loading, on asynchronous pipeline parallelism (e.g., 1F1B schedules and weight prediction), and on efficient resource utilization across heterogeneous hardware (GPUs, CPUs, SSDs). These advances aim to reduce training time and cost, enabling the development and deployment of larger, more accurate models across applications ranging from natural language processing to computer vision and robotics.
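To make the 1F1B idea concrete, below is a minimal sketch (not any specific paper's or framework's implementation) of how a one-forward-one-backward pipeline-parallel schedule orders the forward and backward passes of micro-batches on a single pipeline stage; the function name, stage/micro-batch parameters, and the "F"/"B" step labels are illustrative assumptions.

def one_f_one_b_schedule(stage: int, num_stages: int, num_microbatches: int):
    """Return the ordered list of ("F", i) / ("B", i) steps for one stage."""
    # Warm-up: earlier stages run extra forwards before their first backward.
    warmup = min(num_stages - stage - 1, num_microbatches)
    steps = []
    fwd = bwd = 0
    for _ in range(warmup):
        steps.append(("F", fwd))
        fwd += 1
    # Steady state: alternate one forward with one backward (hence "1F1B"),
    # which caps the activations held in memory at roughly warmup + 1.
    while fwd < num_microbatches:
        steps.append(("F", fwd))
        fwd += 1
        steps.append(("B", bwd))
        bwd += 1
    # Cool-down: drain the remaining backwards.
    while bwd < num_microbatches:
        steps.append(("B", bwd))
        bwd += 1
    return steps

if __name__ == "__main__":
    # Example: 4 pipeline stages, 8 micro-batches per step.
    for stage in range(4):
        print(f"stage {stage}:", one_f_one_b_schedule(stage, 4, 8))

The point of the steady-state alternation is that, unlike an all-forwards-then-all-backwards (GPipe-style) schedule, each stage only ever holds a bounded number of in-flight activations, which is what makes the schedule memory-efficient for long pipelines.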