Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for minimizing a function. In non-convex problems it typically finds a local minimum or stationary point rather than the global minimum. It is widely used in machine learning to train large models. Computing the exact gradient of the loss over the entire dataset at every step is computationally prohibitive at that scale, so SGD instead estimates the gradient from a single example or a small random minibatch and moves the parameters a small step against that estimate. Current research focuses on improving SGD's efficiency and convergence properties in three main ways: adding momentum, using adaptive learning rates (as in popular variants such as Adam), and applying line search methods. Researchers also analyze how SGD behaves in high-dimensional and non-convex settings. These advances are important for training complex models such as deep neural networks, and they affect applications in fields ranging from natural language processing to healthcare.
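To make the minibatch idea and two of the variants mentioned above concrete, here is a minimal sketch in pure Python. It assumes a synthetic linear least-squares problem. The helper names (make_data, minibatch_grad, train) and every hyperparameter value are hypothetical illustrations and are not taken from any of the papers listed below. The code compares three update rules: plain SGD, heavy-ball momentum (written in the common "velocity buffer" form), and Adam with its standard bias-corrected moment estimates. Line search is not shown.

```python
import math
import random

# Hypothetical illustration: synthetic least-squares data and hand-picked hyperparameters.


def make_data(n=200, w_true=(2.0, -3.0), b_true=0.5, noise=0.01, seed=0):
    """Synthetic regression data: y = w_true . x + b_true + small Gaussian noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        y = sum(w * xi for w, xi in zip(w_true, x)) + b_true + rng.gauss(0.0, noise)
        xs.append(x)
        ys.append(y)
    return xs, ys


def minibatch_grad(params, xs, ys, idx):
    """Gradient of the mean loss 0.5 * (pred - y)**2 over the examples in idx.

    params holds [w_1, ..., w_d, b]. Only len(idx) examples are touched, which is
    what makes each step cheap compared with a full-batch gradient.
    """
    d = len(params) - 1
    g = [0.0] * (d + 1)
    for i in idx:
        err = sum(params[j] * xs[i][j] for j in range(d)) + params[d] - ys[i]
        for j in range(d):
            g[j] += err * xs[i][j]
        g[d] += err
    return [gj / len(idx) for gj in g]


def train(xs, ys, optimizer="sgd", lr=0.1, momentum=0.9, beta1=0.9, beta2=0.999,
          eps=1e-8, batch_size=16, epochs=50, seed=1):
    """Run minibatch SGD, SGD with heavy-ball momentum, or Adam; return [w..., b]."""
    rng = random.Random(seed)
    params = [0.0] * (len(xs[0]) + 1)
    buf = [0.0] * len(params)  # momentum buffer (SGD+momentum) or first moment (Adam)
    sq = [0.0] * len(params)   # Adam second-moment estimate
    order = list(range(len(xs)))
    t = 0  # global step counter, used by Adam's bias correction
    for _ in range(epochs):
        rng.shuffle(order)  # fresh random pass over the data each epoch
        for start in range(0, len(order), batch_size):
            g = minibatch_grad(params, xs, ys, order[start:start + batch_size])
            t += 1
            for j, gj in enumerate(g):
                if optimizer == "sgd":
                    params[j] -= lr * gj
                elif optimizer == "momentum":
                    buf[j] = momentum * buf[j] + gj
                    params[j] -= lr * buf[j]
                elif optimizer == "adam":
                    buf[j] = beta1 * buf[j] + (1 - beta1) * gj
                    sq[j] = beta2 * sq[j] + (1 - beta2) * gj * gj
                    m_hat = buf[j] / (1 - beta1 ** t)  # bias-corrected first moment
                    v_hat = sq[j] / (1 - beta2 ** t)   # bias-corrected second moment
                    params[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
                else:
                    raise ValueError(f"unknown optimizer: {optimizer}")
    return params


if __name__ == "__main__":
    xs, ys = make_data()
    for name, lr in [("sgd", 0.1), ("momentum", 0.1), ("adam", 0.05)]:
        w1, w2, b = train(xs, ys, optimizer=name, lr=lr)
        print(f"{name:>8}: w=({w1:.3f}, {w2:.3f}) b={b:.3f}")
```

On this well-conditioned toy problem, all three runs should land close to the generating weights (2, -3) and bias 0.5. The differences between the optimizers matter far more on ill-conditioned or non-convex losses, which are the settings the research above focuses on.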
Papers
Optimal Rates for $O(1)$-Smooth DP-SCO with a Single Epoch and Large Batches
Christopher A. Choquette-Choo, Arun Ganesh, Abhradeep Thakurta
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
Understanding Stochastic Natural Gradient Variational Inference
Kaiwen Wu, Jacob R. Gardner
SGD method for entropy error function with smoothing l0 regularization for neural networks
Trong-Tuan Nguyen, Van-Dat Thang, Nguyen Van Thin, Phuong T. Nguyen
A Hessian-Aware Stochastic Differential Equation for Modelling SGD
Xiang Li, Zebang Shen, Liang Zhang, Niao He
Improving Linear System Solvers for Hyperparameter Optimisation in Iterative Gaussian Processes
Jihao Andreas Lin, Shreyas Padhy, Bruno Mlodozeniec, Javier Antorán, José Miguel Hernández-Lobato
Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training
Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli
Adaptive debiased SGD in high-dimensional GLMs with streaming data
Ruijian Han, Lan Luo, Yuanhang Luo, Yuanyuan Lin, Jian Huang