Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm for minimizing an objective function, and is particularly useful in machine learning for training large models where computing the exact gradient over the full dataset is prohibitively expensive. Instead of the full gradient, each step follows an unbiased estimate computed from a random mini-batch of data, trading per-step accuracy for a much lower per-step cost. Current research focuses on improving SGD's efficiency and convergence properties: variants such as Adam, techniques such as momentum, adaptive learning rates, and line search, and analyses of its behavior in high-dimensional, non-convex settings. These advances are crucial for training complex models such as deep neural networks and for improving performance across machine learning applications, from natural language processing to healthcare.
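To make the basic update and the momentum technique mentioned above concrete, here is a minimal sketch of mini-batch SGD with heavy-ball momentum on a synthetic least-squares problem. The problem setup, batch size, step size, and momentum coefficient are illustrative assumptions, not taken from any of the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem (assumed for illustration):
# minimize f(w) = (1/2n) * ||X w - y||^2.
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def minibatch_grad(w, batch_size=32):
    """Unbiased gradient estimate from a random mini-batch."""
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

# SGD with heavy-ball momentum; lr and beta are illustrative choices.
w = np.zeros(d)
velocity = np.zeros(d)
lr, beta = 0.05, 0.9
for step in range(2000):
    g = minibatch_grad(w)
    velocity = beta * velocity - lr * g  # decaying average of past gradients
    w = w + velocity                     # move along the smoothed direction

print("distance to w_true:", np.linalg.norm(w - w_true))
```

Adam extends this idea by additionally rescaling each coordinate of the step using a running estimate of the gradient's second moment, which is one form of the adaptive learning rates mentioned above.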
Papers
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
Tianjian Li, Haoran Xu, Weiting Tan, Dongwei Jiang, Kenton Murray, Daniel Khashabi
A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD
Ruinan Jin, Xiao Li, Yaoliang Yu, Baoxiang Wang
Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression
Kai Tan, Pierre C. Bellec
Learning K-U-Net with constant complexity: An Application to time series forecasting
Jiang You, Arben Cela, René Natowicz, Jacob Ouanounou, Patrick Siarry
Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks
Ke Chen, Chugang Yi, Haizhao Yang
Universality in Transfer Learning for Linear Models
Reza Ghane, Danil Akhtiamov, Babak Hassibi
Review Non-convex Optimization Method for Machine Learning
Greg B Fotopoulos, Paul Popovich, Nicholas Hall Papadopoulos
Truncated Kernel Stochastic Gradient Descent on Spheres
JinHui Bai, Lei Shi
On the SAGA algorithm with decreasing step
Luis Fredes (IMB), Bernard Bercu (IMB), Eméric Gbaguidi (IMB)
Stochastic Gradient Descent with Adaptive Data
Ethan Che, Jing Dong, Xin T. Tong