SGD Step
Stochastic Gradient Descent (SGD) steps are fundamental to training large machine learning models, aiming to efficiently find model parameters minimizing a loss function. Current research focuses on improving SGD's efficiency and robustness, particularly through variance reduction techniques, adaptive step size strategies (including cyclic and randomized approaches), and asynchronous or federated implementations for distributed training. These advancements address challenges like memory limitations in large language model fine-tuning, hyperparameter sensitivity, and communication overhead in federated learning, ultimately leading to faster and more scalable model training across diverse applications.
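To make the basic update concrete, below is a minimal sketch of a single SGD step on a randomly drawn minibatch. It is illustrative only: the least-squares loss, the `grad_fn` helper, and the learning-rate value are hypothetical choices, not taken from any of the papers summarized here.

```python
import numpy as np

def sgd_step(params, grad_fn, minibatch, lr=0.1):
    """Apply one SGD update: params <- params - lr * stochastic gradient."""
    grad = grad_fn(params, minibatch)     # gradient estimate from a minibatch
    return params - lr * grad             # move along the negative gradient

# Hypothetical example: least-squares loss on a random minibatch.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 3)), rng.normal(size=32)

def grad_fn(w, batch):
    Xb, yb = batch
    residual = Xb @ w - yb
    return Xb.T @ residual / len(yb)      # gradient of 0.5 * mean squared error

w = np.zeros(3)
w = sgd_step(w, grad_fn, (X, y), lr=0.1)  # one stochastic step
```

The techniques surveyed above (variance reduction, adaptive or cyclic step sizes, asynchronous and federated variants) modify how `grad` is estimated or how `lr` evolves across steps, but all build on this single-step update.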