Gradient Noise

Gradient noise is inherent in the stochastic gradient descent (SGD) methods used to train deep neural networks, and it is a central focus of current research on both optimization and generalization. Researchers are examining the statistical properties of this noise, including its distribution (e.g., Gaussian vs. heavy-tailed), its interaction with optimization algorithms such as SGD with momentum, Adam, and Sharpness-Aware Minimization (SAM), and its behavior under variance reduction techniques. Understanding gradient noise is crucial for improving the efficiency and robustness of deep learning training, particularly in distributed settings such as federated learning, and for developing theoretically sound optimization strategies.
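
As a concrete illustration of the kind of statistics studied in this line of work, the following is a minimal sketch (not taken from any specific paper) of how gradient noise can be measured empirically in PyTorch: minibatch gradients are compared against a full-batch reference gradient, and the spread and excess kurtosis of the resulting deviations give a rough indication of noise magnitude and tail heaviness. The synthetic data, linear model, and batch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression problem and a small model (illustrative choices).
X = torch.randn(2048, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(2048, 1)
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()

def flat_grad(inputs, targets):
    """Return the loss gradient over the given samples as a flat vector."""
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Full-batch gradient serves as the (approximately) noise-free reference.
full_grad = flat_grad(X, y)

# Gradient noise for each minibatch = minibatch gradient minus full-batch gradient.
batch_size = 32
noise_norms = []
for start in range(0, len(X), batch_size):
    g = flat_grad(X[start:start + batch_size], y[start:start + batch_size])
    noise_norms.append((g - full_grad).norm().item())

noise_norms = torch.tensor(noise_norms)
# Mean/std of the noise norm; large excess kurtosis hints at heavy-tailed noise.
mean, std = noise_norms.mean(), noise_norms.std()
excess_kurtosis = ((noise_norms - mean) ** 4).mean() / std**4 - 3
print(f"noise norm: mean={mean:.4f}, std={std:.4f}, excess kurtosis={excess_kurtosis:.2f}")
```

In practice, studies of heavy-tailed gradient noise typically analyze the noise vectors themselves (or per-coordinate statistics) rather than only their norms; the norm-based summary above is just a lightweight proxy.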

Papers