Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for finding the minimum of a function. It is particularly useful in machine learning for training large models, where computing the exact gradient over the full dataset is computationally prohibitive. Current research focuses on improving SGD's efficiency and convergence properties: exploring variants such as Adam, incorporating techniques like momentum, adaptive learning rates, and line search, and analyzing its behavior in high-dimensional and non-convex settings. These advances are crucial for training complex models such as deep neural networks and for improving the performance of machine learning applications in fields ranging from natural language processing to healthcare. A minimal sketch of the mini-batch SGD update with momentum is given below.
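To make the update rule concrete, the following sketch implements mini-batch SGD with momentum on a synthetic least-squares problem. The data, learning rate, momentum coefficient, batch size, and epoch count are illustrative assumptions chosen for this example and are not taken from any of the papers listed here.

```python
# Minimal sketch of mini-batch SGD with momentum on least squares.
# All hyperparameters and the synthetic data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data: y = X @ w_true + noise
n_samples, n_features = 1000, 20
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def sgd_momentum(X, y, lr=0.01, beta=0.9, batch_size=32, epochs=20):
    """Minimize mean squared error with mini-batch SGD plus momentum."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)  # velocity (momentum buffer)
    for _ in range(epochs):
        perm = rng.permutation(n)  # full shuffle at each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of 0.5 * mean((Xb @ w - yb)**2) with respect to w
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            v = beta * v + grad       # accumulate momentum
            w = w - lr * v            # parameter update
    return w

w_hat = sgd_momentum(X, y)
print("parameter error:", np.linalg.norm(w_hat - w_true))
```

The per-epoch full shuffle shown here is the standard baseline; several of the papers below study departures from it, such as training without a full data shuffle or analyzing how batch size and momentum interact in high dimensions.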
Papers
Stochastic Gradient Descent without Full Data Shuffle
Lijie Xu, Shuang Qiu, Binhang Yuan, Jiawei Jiang, Cedric Renggli, Shaoduo Gan, Kaan Kara, Guoliang Li, Ji Liu, Wentao Wu, Jieping Ye, Ce Zhang
Characterizing the Implicit Bias of Regularized SGD in Rank Minimization
Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio
Few-Shot Learning by Dimensionality Reduction in Gradient Space
Martin Gauch, Maximilian Beck, Thomas Adler, Dmytro Kotsur, Stefan Fiel, Hamid Eghbal-zadeh, Johannes Brandstetter, Johannes Kofler, Markus Holzleitner, Werner Zellinger, Daniel Klotz, Sepp Hochreiter, Sebastian Lehner
Generalization Error Bounds for Deep Neural Networks Trained by SGD
Mingze Wang, Chao Ma
Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Anant Raj, Melih Barsbey, Mert Gürbüzbalaban, Lingjiong Zhu, Umut Şimşekli
Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions
Ning Yang, Chao Tang, Yuhai Tu
Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions
Kiwon Lee, Andrew N. Cheng, Courtney Paquette, Elliot Paquette