Local SGD
Local SGD (local stochastic gradient descent) is a distributed optimization algorithm for efficiently training machine learning models across multiple devices or clusters: each worker takes several gradient steps on its local data between synchronizations, and the models are averaged only periodically rather than after every step, which reduces communication overhead compared to traditional centralized, fully synchronous methods. Current research focuses on understanding its convergence properties under various data heterogeneity assumptions, applying it to large language model training and other complex architectures, and developing strategies to mitigate challenges such as stragglers and asynchronous updates. These advances are significant for scaling machine learning to massive datasets and diverse computing environments, improving both training speed and model performance.
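To make the "local steps, then periodic averaging" pattern concrete, below is a minimal sketch of Local SGD on a synthetic least-squares problem. It is an illustrative assumption, not code from any of the referenced papers: the function name local_sgd, the two-worker synthetic data, and parameters such as local_steps and num_rounds are all hypothetical choices for the example.

```python
import numpy as np

def local_sgd(X_shards, y_shards, num_rounds=50, local_steps=8, lr=0.01, seed=0):
    """Sketch of Local SGD for least-squares regression.

    Each worker runs `local_steps` SGD steps on its own data shard,
    then the workers' models are averaged (one communication per round).
    """
    rng = np.random.default_rng(seed)
    d = X_shards[0].shape[1]
    w_global = np.zeros(d)

    for _ in range(num_rounds):
        local_models = []
        for X, y in zip(X_shards, y_shards):
            w = w_global.copy()                   # start from the shared model
            for _ in range(local_steps):          # local updates, no communication
                i = rng.integers(len(X))          # sample one local example
                grad = (X[i] @ w - y[i]) * X[i]   # least-squares gradient
                w -= lr * grad
            local_models.append(w)
        w_global = np.mean(local_models, axis=0)  # communicate: average the models
    return w_global

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_true = rng.normal(size=5)
    # Two workers with differently distributed (heterogeneous) data shards.
    X_shards = [rng.normal(size=(200, 5)), rng.normal(loc=0.5, size=(200, 5))]
    y_shards = [X @ w_true + 0.01 * rng.normal(size=200) for X in X_shards]
    w_hat = local_sgd(X_shards, y_shards)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```

With local_steps = 1 the scheme reduces to fully synchronous mini-batch SGD; larger values cut communication rounds at the cost of some drift between workers, which is exactly the trade-off the convergence analyses under data heterogeneity study.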