Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm that minimizes an objective function by following noisy gradient estimates computed on small, randomly sampled batches of data. This makes it especially useful in machine learning for training large models, where computing the exact gradient over the full dataset is computationally prohibitive. Current research focuses on improving SGD's efficiency and convergence properties: exploring variants such as Adam, incorporating techniques such as momentum, adaptive learning rates, and line search, and analyzing its behavior in high-dimensional and non-convex settings. These advances are crucial for training complex models such as deep neural networks and for improving the performance of machine learning applications in fields ranging from natural language processing to healthcare.
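To make the basic update rule concrete, below is a minimal NumPy sketch of mini-batch SGD with momentum on a least-squares objective. The function name, hyperparameters, and data are illustrative assumptions for exposition, not drawn from the papers listed under this topic.

import numpy as np

def sgd_momentum(X, y, lr=0.01, momentum=0.9, batch_size=32, epochs=50, seed=0):
    # Mini-batch SGD with momentum (illustrative sketch):
    #   v <- momentum * v - lr * g_batch
    #   w <- w + v
    # where g_batch is the gradient estimated on a random mini-batch.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)   # parameters
    v = np.zeros(d)   # velocity (momentum buffer)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Stochastic gradient of 0.5 * ||Xb @ w - yb||^2 averaged over the batch
            grad = Xb.T @ (Xb @ w - yb) / len(batch)
            v = momentum * v - lr * grad
            w = w + v
    return w

# Example usage: recover a planted linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)
w_hat = sgd_momentum(X, y)
print(np.linalg.norm(w_hat - w_true))  # should be small

Each update uses only a mini-batch rather than the full dataset, which is exactly what makes the gradient estimate stochastic and the per-step cost independent of the dataset size.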
Papers
Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent
Mohit Kumar, Alexander Valentinitsch, Magdalena Fuchs, Mathias Brucker, Juliana Bowles, Adnan Husakovic, Ali Abbas, Bernhard A. Moser
Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials
August Y. Chen, Ayush Sekhari, Karthik Sridharan
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Noah Marshall, Ke Liang Xiao, Atish Agarwala, Elliot Paquette
Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework
Siyuan Yu, Wei Chen, H. Vincent Poor
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano, Andrea Pinto, Tomaso Poggio