Clipped Stochastic Gradient Descent

Clipped stochastic gradient descent (SGD) is a technique used in machine learning to improve the robustness and privacy of training algorithms, particularly when dealing with noisy or heavy-tailed data distributions. The core idea is to rescale any stochastic gradient whose norm exceeds a fixed threshold before applying the update, which bounds the influence of any single sample or noisy step. Current research focuses on analyzing its convergence properties in various settings, including non-convex optimization and differentially private training, often employing adaptive step-size methods like AdaGrad and Adam, or exploring alternatives like SignSGD. This work is significant because it addresses challenges in training large models, enhances privacy guarantees, and improves the reliability of optimization algorithms across diverse applications, from natural language processing to computer vision.
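
Below is a minimal sketch of norm-clipped SGD, using a least-squares objective with heavy-tailed noise purely for illustration; the step size `lr` and threshold `clip_norm` are assumed, illustrative values rather than settings from any particular paper.

```python
# A minimal sketch of norm-clipped SGD on a least-squares problem.
# The objective, step size, and clipping threshold are illustrative assumptions.
import numpy as np

def clip_by_norm(g, clip_norm):
    """Rescale g so its Euclidean norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    if norm > clip_norm:
        g = g * (clip_norm / norm)
    return g

def clipped_sgd(X, y, lr=0.1, clip_norm=1.0, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            # Stochastic gradient of 0.5 * (x_i @ w - y_i)^2 with respect to w.
            grad = (X[i] @ w - y[i]) * X[i]
            # Clip before the update so each step is bounded, limiting the
            # influence of outliers or heavy-tailed gradient noise.
            w -= lr * clip_by_norm(grad, clip_norm)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + rng.standard_t(df=2, size=200)  # heavy-tailed noise
    print(clipped_sgd(X, y))
```

The same per-step clipping is what bounds each sample's contribution in differentially private SGD, where calibrated noise is then added to the clipped gradients.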

Papers