Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm that minimizes an objective function by updating parameters with gradient estimates computed on randomly sampled data points or minibatches. It is particularly useful in machine learning for training large models, where computing the exact gradient over the full dataset is computationally prohibitive. Current research focuses on improving SGD's efficiency and convergence properties: exploring variants such as Adam, incorporating techniques such as momentum, adaptive learning rates, and line search, and analyzing the algorithm's behavior in high-dimensional and non-convex settings. These advances matter for training complex models such as deep neural networks and for improving machine learning applications in fields ranging from natural language processing to healthcare.
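To make the update rule concrete, here is a minimal sketch of minibatch SGD with heavy-ball momentum, fitting a one-dimensional linear model by least squares. It is an illustration only and is not taken from any of the papers listed on this page. The function name `sgd_linear_regression`, the hyperparameter defaults, and the synthetic data are all assumptions chosen for the example.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, momentum=0.9,
                          batch_size=8, epochs=200, seed=0):
    """Fit y ~ w*x + b by minibatch SGD with heavy-ball momentum.

    Hypothetical helper for illustration; all defaults are assumptions.
    Setting momentum=0.0 recovers plain minibatch SGD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0        # model parameters
    vw, vb = 0.0, 0.0      # momentum (velocity) terms
    indices = list(range(len(xs)))

    for _ in range(epochs):
        rng.shuffle(indices)  # a new random pass over the data each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]

            # Gradient of the mean squared error on this minibatch only.
            # This is a noisy estimate of the full-data gradient.
            gw, gb = 0.0, 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            gw /= len(batch)
            gb /= len(batch)

            # Heavy-ball update: decay the old velocity, add the new step.
            vw = momentum * vw - lr * gw
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
    return w, b


# Synthetic data (assumed for the example): y = 3x + 0.5 plus small noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x + 0.5 + data_rng.gauss(0.0, 0.01) for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Each step touches only `batch_size` examples, so the cost per update does not grow with the size of the dataset; the price is noise in the gradient estimate, which momentum and adaptive learning-rate schemes aim to dampen.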
Papers
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
Markus Gross, Arne P. Raulf, Christoph Räth
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release
Jie Fu, Qingqing Ye, Haibo Hu, Zhili Chen, Lulu Wang, Kuncan Wang, Xun Ran
Bandit-Driven Batch Selection for Robust Learning under Label Noise
Michal Lisicki, Mihai Nica, Graham W. Taylor
Stochastic Gradient Descent for Gaussian Processes Done Right
Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz