Large Batch
Large batch training processes many training examples per optimization step, with the goal of improving efficiency and scalability in machine learning. Current research focuses on mitigating the difficulties that large batches introduce, notably degraded generalization and unstable convergence at the high learning rates they require, through algorithmic innovations such as adaptive gradient methods (e.g., Adagrad, LARS, LAMB) and variance reduction techniques. These advances are crucial for training increasingly complex models, such as large language models and deep neural networks used in computer vision, and directly affect the speed and cost-effectiveness of model development and deployment across applications.
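To make the layer-wise adaptation idea behind LARS concrete, the sketch below implements one LARS-style update step for a single layer in NumPy. The function name `lars_update`, its default hyperparameters, and the handling of weight decay (folded into the gradient, as in common implementations) are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def lars_update(w, grad, velocity, lr=0.1, momentum=0.9,
                weight_decay=1e-4, trust_coeff=0.001, eps=1e-8):
    # One LARS-style step for a single layer (a minimal sketch).
    # LARS rescales each layer's update by the ratio of the weight norm
    # to the gradient norm (the "trust ratio"), so no layer's step is out
    # of proportion to its own weights when the global learning rate is
    # large -- the regime large-batch training operates in.
    g = grad + weight_decay * w                  # L2 term folded into the gradient
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coeff * w_norm / (g_norm + eps)
    else:
        local_lr = 1.0                           # fall back to a plain SGD step
    velocity = momentum * velocity + lr * local_lr * g
    return w - velocity, velocity

# Usage: state is kept per layer; velocity starts at zero with w's shape.
w = np.random.randn(256, 128).astype(np.float32)
v = np.zeros_like(w)
g = np.random.randn(*w.shape).astype(np.float32)
w, v = lars_update(w, g, v)
```

LAMB applies the same trust-ratio idea on top of an Adam-style update rather than momentum SGD; the per-layer rescaling shown here is the part both methods share.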