Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used to find the minimum of a function, particularly useful in machine learning for training large models where computing the exact gradient is computationally prohibitive. Current research focuses on improving SGD's efficiency and convergence properties, exploring variations like Adam, incorporating techniques such as momentum, adaptive learning rates, and line search methods, and analyzing its behavior in high-dimensional and non-convex settings. These advancements are crucial for training complex models like deep neural networks and improving the performance of various machine learning applications, impacting fields ranging from natural language processing to healthcare.

507papers

Papers - Page 14

October 31, 2023

Stochastic Gradient Descent for Gaussian Processes Done Right
Stochastic Gradient Descent Variational Gaussian Process Gaussian Process Regression Bayesian Optimisation Gaussian Process

October 29, 2023

Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression
Saddle Point Heterogeneous Federated Learning Stochastic Gradient Descent Communication Compression

October 28, 2023

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise
High Probability Convergence Convex Cost Stochastic Nonlinear System Heavy Tailed Noise Stochastic Gradient Descent

October 27, 2023

Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
Heavy Tail Action Free Offline Heavy Tailed Stochastic Gradient Descent Offline Training

October 26, 2023

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates
Large Learning Rate Generalization Property Non Converging Artificial Oscillation Stochastic Gradient Descent

October 24, 2023

Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling
Subgraph Sampling Decentralized Stochastic Decentralized Learning Wireless Network Communication Constraint Gradient Descent Stochastic Gradient Descent

October 19, 2023

October 17, 2023

October 16, 2023

AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Memory Efficient Stochastic Gradient Descent Learning Rate Large Language Model Adam Optimizer

October 11, 2023

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?
Sharpness Aware Minimization Two Layer ReLU Strong Generalization Stochastic Gradient Descent Neural Network

October 4, 2023

Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Stochastic Gradient Descent Gradient Descent Trajectory Gradient Descent

October 3, 2023

October 1, 2023

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Stochastic Gradient Descent Minibatch Stochastic Gradient Descent Subspace Projection Noise Modeling Theoretical Analysis

September 29, 2023

Robust Stochastic Optimization via Gradient Quantile Clipping
Gradient Clipping Stochastic Gradient Descent Stochastic Robust Optimization Heavy Tailed Gradient Descent Gradient Norm Convex Objective

September 26, 2023

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Stochastic Gradient Descent Minibatch Stochastic Gradient Descent Sample Complexity Ground Truth Two Layer Neural Network XOR Gate

September 21, 2023

Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Stochastic Gradient Descent Deep Learning Online Merging Model Merging Neural Network Local Optimum

September 19, 2023

On the different regimes of Stochastic Gradient Descent
Chaotic Regime Stochastic Gradient Descent Gradient Descent Stochastic Way

Stochastic Gradient Descent

Papers - Page 14

Stochastic Gradient Descent for Gaussian Processes Done Right

Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression

High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise

Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling

Demystifying the Myths and Legends of Nonconvex Convergence of SGD

LASER: Linear Compression in Wireless Distributed Optimization

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty Quantification

AdaLomo: Low-memory Optimization with Adaptive Learning Rate

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

Spectral alignment of stochastic gradient descent for high-dimensional classification tasks

On the Parallel Complexity of Multilevel Monte Carlo in Stochastic Gradient Descent

Stochastic Gradient Descent with Preconditioned Polyak Step-size

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent

Robust Stochastic Optimization via Gradient Quantile Clipping

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance

On the different regimes of Stochastic Gradient Descent