Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for finding the minimum of a function. It is especially useful in machine learning for training large models, where computing the exact gradient over the full dataset is computationally prohibitive. Current research focuses on improving SGD's efficiency and convergence, for example through variants such as Adam, techniques like momentum, adaptive learning rates, and line-search methods, and on analyzing its behavior in high-dimensional and non-convex settings. These advances are crucial for training complex models such as deep neural networks, with impact on applications ranging from natural language processing to healthcare.
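To make the basic update rule concrete, here is a minimal sketch of mini-batch SGD with momentum on a synthetic least-squares problem. It is purely illustrative: the data, model, and hyperparameters (learning rate, momentum, batch size) are assumptions for the example, not settings taken from any of the papers listed below.

```python
# Minimal sketch of mini-batch SGD with momentum on a toy least-squares problem.
# All data, model, and hyperparameter choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))               # synthetic features
w_true = rng.normal(size=10)                  # ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=1000)  # noisy targets

w = np.zeros(10)   # parameters to learn
v = np.zeros(10)   # momentum buffer
lr, momentum, batch_size = 0.05, 0.9, 32

for step in range(500):
    # Sample a random mini-batch instead of using the full dataset.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # Stochastic gradient of the mean-squared-error loss on the mini-batch.
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size
    # Momentum accumulation followed by the parameter update.
    v = momentum * v + grad
    w -= lr * v

print("estimation error:", np.linalg.norm(w - w_true))
```

Each step uses only a small random subset of the data to estimate the gradient, which is what makes SGD scalable; the momentum term smooths the noisy updates and typically speeds up convergence.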
Papers
Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm
Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
Zirui Liu, Guanchu Wang, Shaochen Zhong, Zhaozhuo Xu, Daochen Zha, Ruixiang Tang, Zhimeng Jiang, Kaixiong Zhou, Vipin Chaudhary, Shuai Xu, Xia Hu
Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function
Linxuan Pan, Shenghui Song
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy, Rachid Guerraoui, Ljiljana Dolamic
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
Sahil Tyagi, Martin Swany
Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent
Lingjiong Zhu, Mert Gurbuzbalaban, Anant Raj, Umut Simsekli