SGD Style
Stochastic gradient descent (SGD) and SGD-style optimization remain a cornerstone of deep learning, despite the popularity of adaptive optimizers such as Adam. Current research focuses on improving SGD's efficiency and robustness in settings such as distributed and federated learning, through techniques like gradient compression, asynchronous updates, and novel sampling schemes. These advances aim to address communication bottlenecks in large-scale training, heterogeneous computing environments, and the need for privacy-preserving mechanisms, ultimately yielding faster and more resource-efficient model training across diverse applications.
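As a concrete illustration, the Python sketch below shows the basic SGD-with-momentum update together with a simple top-k gradient sparsification step of the kind used to cut communication costs in distributed training. It is a minimal sketch under stated assumptions, not an implementation from any of the work surveyed here; the function names, hyperparameters, and toy objective are all illustrative.

import numpy as np

def sgd_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum*v + g; w <- w - lr*v."""
    velocity = momentum * velocity + grads
    params = params - lr * velocity
    return params, velocity

def topk_compress(grads, k):
    """Keep only the k largest-magnitude gradient entries (top-k sparsification).
    In a distributed setting, workers would transmit just these values and
    their indices instead of the full dense gradient."""
    flat = grads.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grads.shape)

# Toy usage on the quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0, 3.0, -4.0])
v = np.zeros_like(w)
for _ in range(100):
    g = w                      # gradient of 0.5 * ||w||^2
    g = topk_compress(g, k=2)  # communicate only 2 of the 4 coordinates
    w, v = sgd_step(w, g, v, lr=0.1)
print(w)  # approaches the minimizer at the origin

In practice, top-k compression of this kind is usually paired with error feedback, which accumulates the dropped gradient mass locally so that it is applied in later steps rather than lost.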