Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for minimizing a function. It is widely used in machine learning to train large models, where computing the exact gradient over the full dataset is computationally prohibitive. Instead of the full gradient, each update moves the parameters against a gradient estimated from a single randomly chosen example or a small mini-batch. Current research aims to improve SGD's efficiency and convergence in two ways. One is through variants such as Adam and techniques such as momentum, adaptive learning rates, and line search. The other is through analysis of its behavior in high-dimensional and non-convex settings. These advances are crucial for training complex models such as deep neural networks, with impact in fields ranging from natural language processing to healthcare.
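The mini-batch update described above can be sketched in a few lines. This is a minimal illustration on a toy problem. It assumes a 1-D linear regression with squared-error loss, and the function name `sgd_linear_regression`, the data, and the hyperparameters are hypothetical choices, not taken from any of the papers listed below. Each step uses the gradient of a small random mini-batch rather than the full dataset. An optional heavy-ball momentum term accumulates past gradients, and setting `momentum=0.0` recovers plain mini-batch SGD. Adaptive methods such as Adam are not shown.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, momentum=0.9, batch_size=4,
                          epochs=200, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD plus optional heavy-ball momentum.

    Hypothetical illustration: toy data, squared-error loss, fixed step size.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0          # parameters
    vw, vb = 0.0, 0.0        # momentum buffers (velocity)
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)     # new random mini-batch order each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only,
            # a cheap stochastic estimate of the full-data gradient.
            gw = gb = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            gw /= len(batch)
            gb /= len(batch)
            # Heavy-ball momentum: v <- beta * v + g, then theta <- theta - lr * v.
            vw = momentum * vw + gw
            vb = momentum * vb + gb
            w -= lr * vw
            b -= lr * vb
    return w, b


if __name__ == "__main__":
    # Noise-free toy data generated from y = 3x + 1 (illustrative assumption).
    xs = [i / 10 for i in range(20)]
    ys = [3 * x + 1 for x in xs]
    print("plain SGD:    ", sgd_linear_regression(xs, ys, momentum=0.0))
    print("SGD+momentum: ", sgd_linear_regression(xs, ys, momentum=0.9))
```

On this noise-free data, both settings should recover values close to w = 3 and b = 1. With noisy data, a fixed step size would instead leave the iterates fluctuating around the minimum. This is one reason decaying or adaptive learning rates are studied.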
Papers
Instance-Dependent Generalization Bounds via Optimal Transport
Songyan Hou, Parnian Kassraie, Anastasis Kratsios, Andreas Krause, Jonas Rothfuss
Accelerating Parallel Stochastic Gradient Descent via Non-blocking Mini-batches
Haoze He, Parijat Dube
RCD-SGD: Resource-Constrained Distributed SGD in Heterogeneous Environment via Submodular Partitioning
Haoze He, Parijat Dube
Rigorous dynamical mean field theory for stochastic gradient descent methods
Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova
AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs
Shiv Ram Dubey, Satish Kumar Singh, Bidyut Baran Chaudhuri
Momentum Aggregation for Private Non-convex ERM
Hoang Tran, Ashok Cutkosky