Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm that minimizes a function by following noisy gradient estimates computed on small samples of data. It is especially useful in machine learning for training large models, where computing the exact gradient over the full dataset is prohibitively expensive. Current research focuses on improving SGD's efficiency and convergence properties: exploring variants such as Adam, incorporating techniques such as momentum, adaptive learning rates, and line search, and analyzing its behavior in high-dimensional and non-convex settings. These advances are central to training complex models such as deep neural networks and improving the performance of machine learning applications in fields ranging from natural language processing to healthcare.
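To make the core update concrete, here is a minimal sketch of mini-batch SGD with heavy-ball momentum on a synthetic least-squares problem, written in plain NumPy. The problem setup, variable names (`lr`, `beta`, `batch_size`), and hyperparameter values are illustrative assumptions, not taken from any of the papers listed below.

```python
# Minimal sketch: mini-batch SGD with momentum on linear regression (NumPy only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n, d = 1000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def minibatch_gradient(w, batch_idx):
    """Gradient of the mean squared error on a sampled mini-batch."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return 2.0 / len(batch_idx) * Xb.T @ (Xb @ w - yb)

w = np.zeros(d)                        # parameters
v = np.zeros(d)                        # momentum buffer
lr, beta, batch_size = 0.01, 0.9, 32   # illustrative hyperparameters

for step in range(2000):
    idx = rng.choice(n, size=batch_size, replace=False)  # random mini-batch
    g = minibatch_gradient(w, idx)
    v = beta * v + g                   # heavy-ball momentum accumulation
    w = w - lr * v                     # parameter update

print("distance to w_true:", np.linalg.norm(w - w_true))
```

Setting `beta = 0` recovers plain SGD; adaptive methods such as Adam additionally rescale each coordinate of the update using running estimates of the squared gradients.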
Papers
Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function
Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
Learning from time-dependent streaming data with online stochastic algorithms
Antoine Godichon-Baggioni, Nicklas Werge, Olivier Wintenberger
RankNEAT: Outperforming Stochastic Gradient Search in Preference Learning Tasks
Kosmas Pinitas, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis
Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression
Feijie Wu, Shiqi He, Song Guo, Zhihao Qu, Haozhao Wang, Weihua Zhuang, Jie Zhang