Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for finding the minimum of a function. It is particularly useful in machine learning for training large models, where computing the exact gradient over the full dataset is computationally prohibitive; instead, each update uses the gradient estimated on a small random subset of the data. Current research focuses on improving SGD's efficiency and convergence properties: adaptive variants such as Adam, techniques such as momentum, adaptive learning rates, and line search, and analyses of SGD's behavior in high-dimensional and non-convex settings. These advances are crucial for training complex models such as deep neural networks and improve the performance of machine learning applications in fields ranging from natural language processing to healthcare.
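To make the update rule concrete, here is a minimal sketch of mini-batch SGD with classical (heavy-ball) momentum on a synthetic least-squares problem. The data, the hyperparameters (lr, momentum, batch_size), and the variable names are illustrative assumptions, not taken from any of the papers listed below.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n_samples, n_features = 1000, 5
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_features)          # parameters to learn
velocity = np.zeros(n_features)   # momentum buffer
lr, momentum, batch_size = 0.05, 0.9, 32

for epoch in range(20):
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Stochastic gradient of the mean-squared error on the mini-batch
        grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
        # Heavy-ball momentum update: accumulate velocity, then step
        velocity = momentum * velocity - lr * grad
        w += velocity

print("estimation error:", np.linalg.norm(w - w_true))

Setting momentum to 0 recovers plain mini-batch SGD; adaptive methods such as Adam additionally rescale the step per coordinate using running gradient statistics.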
Papers
Statistical Learning and Inverse Problems: A Stochastic Gradient Approach
Yuri R. Fonseca, Yuri F. Saporito
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer
Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu
Computational Complexity of Sub-Linear Convergent Algorithms
Hilal AlQuabeh, Farha AlBreiki, Dilshod Azizov