Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for finding the minimum of a function. It is particularly useful in machine learning for training large models, where computing the exact gradient over the full dataset is computationally prohibitive; instead, SGD estimates the gradient from small random mini-batches of data. Current research focuses on improving SGD's efficiency and convergence properties: exploring variants such as Adam, incorporating techniques like momentum, adaptive learning rates, and line search, and analyzing its behavior in high-dimensional and non-convex settings. These advances are crucial for training complex models such as deep neural networks and for improving the performance of machine learning applications in fields ranging from natural language processing to healthcare.
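To make the basic idea concrete, below is a minimal sketch of mini-batch SGD with momentum applied to a simple least-squares objective, written in NumPy. The function name, learning rate, batch size, and momentum coefficient are illustrative choices for this sketch, not values or methods taken from the papers listed here.

```python
import numpy as np

def sgd_momentum(X, y, lr=0.01, momentum=0.9, batch_size=32, epochs=50, seed=0):
    """Mini-batch SGD with momentum for linear least squares: min_w ||Xw - y||^2 / (2n)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)   # parameters
    v = np.zeros(d)   # velocity (momentum buffer)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Stochastic gradient: an unbiased estimate of the full gradient
            # computed from one mini-batch instead of the whole dataset.
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            v = momentum * v - lr * grad   # momentum update
            w = w + v                      # parameter step
    return w

# Illustrative usage: recover a known weight vector from noisy linear observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=1000)
w_hat = sgd_momentum(X, y)
print(np.round(w_hat - w_true, 3))  # entries should be close to zero
```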
Papers
Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence
Yichuan Deng, Zhao Song, Chiwun Yang
Emergence of heavy tails in homogenized stochastic gradient descent
Zhe Jiao, Martin Keller-Ressel
Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion
Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu
Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning
Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song
Truncated Non-Uniform Quantization for Distributed SGD
Guangfeng Yan, Tan Li, Yuanzhang Xiao, Congduan Li, Linqi Song