Stochastic Gradient
Stochastic gradient methods are fundamental algorithms for optimizing objective functions in large-scale machine learning: they seek optimal model parameters by iteratively updating them with noisy gradient estimates. Current research focuses on improving the convergence rates and robustness of these methods, particularly for non-convex objectives and in distributed settings, exploring algorithms such as Adam, SGHMC, and variance-reduced techniques, and addressing challenges posed by heavy-tailed noise and unbounded smoothness. These advances have significant implications for training complex models such as deep neural networks and for accelerating progress in applications including natural language processing, computer vision, and reinforcement learning.
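As a minimal illustration of the basic update these methods build on, here is a sketch of mini-batch stochastic gradient descent on a least-squares objective; all names, data, and hyperparameters below are illustrative assumptions, not taken from any of the listed papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise (illustrative)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd(X, y, lr=0.01, epochs=20, batch=32):
    """Plain SGD on mean-squared error using mini-batch gradient estimates."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            # Noisy (mini-batch) estimate of the full gradient of the MSE loss
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad                # stochastic gradient step
    return w

w_hat = sgd(X, y)
print(np.linalg.norm(w_hat - w_true))  # small: the estimate is close to w_true
```

Each update direction is an unbiased but noisy estimate of the true gradient; the methods surveyed in the papers below (Adam, SGHMC, variance reduction, nonlinear/clipped SGD) modify this basic step to improve convergence or robustness, e.g. under heavy-tailed gradient noise.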
Papers
Limit Theorems for Stochastic Gradient Descent with Infinite Variance
Jose Blanchet, Aleksandar Mijatović, Wenhao Yang
Large Deviations and Improved Mean-squared Error Rates of Nonlinear SGD: Heavy-tailed Noise and Power of Symmetry
Aleksandar Armacki, Shuhua Yu, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks
Luxu Liang, Ariel Neufeld, Ying Zhang
Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks
Duong Thuy Anh Nguyen, Su Wang, Duong Tung Nguyen, Angelia Nedich, H. Vincent Poor