Large Batch
Large batch training, in which each optimization step processes a large number of training examples at once, aims to improve the efficiency and scalability of machine learning. Current research focuses on mitigating the challenges that large batches introduce, such as degraded generalization and slower per-sample convergence, through algorithmic innovations like layer-wise adaptive optimizers (e.g., LARS and LAMB, building on adaptive gradient methods such as Adagrad) and variance reduction techniques. These advances are crucial for training increasingly complex models, including large language models and the deep neural networks used in computer vision, and they affect both the speed and cost-effectiveness of model development and deployment across a wide range of applications.
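To make the layer-wise adaptive idea behind optimizers such as LARS concrete, here is a minimal NumPy sketch of one update step. It is an illustrative sketch only: the function name, hyperparameters, and default values are assumptions for exposition, not taken from any of the listed papers or from a specific library's API.

```python
import numpy as np

def lars_style_step(weights, grads, lr=0.1, weight_decay=1e-4,
                    trust_coef=0.001, eps=1e-9):
    """One LARS-style update: scale each layer's step by a layer-wise trust ratio.

    `weights` and `grads` are lists of per-layer NumPy arrays.
    """
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        # The trust ratio keeps each layer's step small relative to its weight norm,
        # which is what allows stable training with very large batches and learning rates.
        local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
        update = lr * local_lr * (g + weight_decay * w)
        new_weights.append(w - update)
    return new_weights
```

In practice this rule is combined with momentum and a learning-rate warmup schedule, but the core mechanism, normalizing each layer's update by the ratio of its weight norm to its gradient norm, is what the sketch shows.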
Papers
Optimal Rates for $O(1)$-Smooth DP-SCO with a Single Epoch and Large Batches
Christopher A. Choquette-Choo, Arun Ganesh, Abhradeep Thakurta
Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling
Arthur Müller, Felix Grumbach, Matthia Sabatelli