Training Throughput
Work on training throughput in deep learning aims to maximize the speed and efficiency of model training, primarily to reduce training time and cost. Current research emphasizes optimizing data loading and transfer, mitigating hardware failures (especially in large-scale distributed training with pipeline parallelism), and improving the efficiency of model architectures such as transformers and GNNs through techniques like quantization and memory optimization. These advances are crucial for making deep learning more accessible and cost-effective, enabling faster development and deployment of sophisticated models across a wide range of applications.
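As a minimal sketch of the input-pipeline and memory-oriented levers mentioned above, the following PyTorch snippet combines background data loading, pinned-memory transfers, and mixed-precision training; the synthetic dataset, batch size, and model are illustrative assumptions, not drawn from any specific paper in this collection.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for a real dataset (assumption: any map-style dataset works here).
dataset = TensorDataset(torch.randn(4096, 1024), torch.randint(0, 10, (4096,)))

# Common throughput levers on the input pipeline:
#   num_workers        - load and collate batches in background worker processes
#   pin_memory         - page-locked host buffers enable asynchronous H2D copies
#   persistent_workers - keep workers alive across epochs to avoid respawn overhead
loader = DataLoader(dataset, batch_size=256, shuffle=True,
                    num_workers=4, pin_memory=True, persistent_workers=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for x, y in loader:
    # non_blocking=True overlaps the host-to-device copy with compute
    # when the source tensor lives in pinned memory.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    opt.zero_grad(set_to_none=True)
    # Mixed precision cuts memory traffic and raises effective math throughput.
    with torch.autocast(device_type=device, dtype=torch.float16,
                        enabled=(device == "cuda")):
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```

These knobs address only the single-device data path; the distributed-training concerns above (pipeline parallelism, failure recovery) require framework-level machinery beyond this sketch.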