Batch Size
Batch size, the hyperparameter determining how many samples contribute to each gradient update during neural network training, significantly impacts model performance and training efficiency. Current research focuses on optimizing batch size schedules, particularly the interplay between batch size and learning rate across algorithms such as stochastic gradient descent (SGD) and Adam, and across settings ranging from language models to reinforcement learning agents. Findings consistently show that the optimal batch size is not a universal constant but depends on factors such as model architecture, dataset characteristics, and computational resources, affecting both training speed and generalization performance. This research is crucial for improving the scalability and efficiency of training large-scale neural networks across diverse applications.
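To make the batch size/learning rate interplay concrete, the sketch below sweeps a few batch sizes while scaling the learning rate linearly with batch size. This is a minimal illustration assuming PyTorch, a synthetic regression dataset, and the common linear-scaling heuristic; it is not the method of any paper listed below, and the constants (BASE_BATCH, BASE_LR) are illustrative choices.

```python
# Minimal sketch (PyTorch assumed): sweep batch size, scaling the learning
# rate linearly with batch size -- a common heuristic, not a rule taken from
# the papers listed below.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic regression data; any dataset of the same shape would do.
X = torch.randn(2048, 16)
y = X @ torch.randn(16, 1) + 0.1 * torch.randn(2048, 1)
dataset = TensorDataset(X, y)

BASE_BATCH = 32   # reference batch size at which the base learning rate was chosen
BASE_LR = 1e-2    # base learning rate (illustrative value)

def train(batch_size: int, epochs: int = 5) -> float:
    """Train a small model at the given batch size, scaling lr linearly."""
    lr = BASE_LR * batch_size / BASE_BATCH  # linear scaling heuristic
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            optimizer.step()
    return loss.item()

# Larger batches average the gradient over more samples, so each epoch takes
# fewer (but less noisy) update steps.
for bs in (16, 64, 256):
    print(f"batch_size={bs:4d}  final loss={train(bs):.4f}")
```

The linear-scaling heuristic is only a starting point; as the summary above notes, the best batch size and learning rate pair ultimately depends on the architecture, dataset, and compute budget.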
Papers
Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling
Arthur Müller, Felix Grumbach, Matthia Sabatelli
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan