Gradient Descent
Gradient descent is an iterative optimization algorithm that seeks a local minimum of a differentiable function by repeatedly taking steps proportional to the negative of the gradient, with the step size set by a learning rate. Current research focuses on improving its efficiency and robustness, particularly in high-dimensional spaces and on non-convex functions. Variants under study include stochastic gradient descent, proximal methods, and natural gradient descent, often in the context of deep learning models and other complex architectures. These advances matter for training increasingly complex machine learning models and improving their performance in applications ranging from image recognition to scientific simulations. A key area of investigation is understanding and mitigating vanishing and exploding gradients, overfitting, and the effect of data characteristics on convergence.
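As a rough illustration of the basic update rule, the sketch below runs plain gradient descent on a simple convex quadratic. The function names (`gradient_descent`, `quadratic_grad`), the test function, the learning rate, and the step count are all assumptions chosen for this example and do not come from any of the papers listed here.

```python
def gradient_descent(grad, x0, learning_rate=0.1, num_steps=100):
    """Repeatedly step against the gradient, starting from x0.

    grad: callable returning the gradient (a list of floats) at a point.
    x0: starting point as a list of floats.
    """
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        # Update rule: x <- x - learning_rate * gradient
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def quadratic_grad(x):
    # Gradient of the illustrative function f(x, y) = (x - 3)^2 + 2 * (y + 1)^2,
    # whose unique minimum is at (3, -1).
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)]


minimum = gradient_descent(quadratic_grad, [0.0, 0.0])
print(minimum)  # close to [3.0, -1.0] for this convex example
```

On a convex quadratic like this one, a small fixed learning rate is enough for convergence. On non-convex objectives the same loop may stall at a saddle point or settle in a local minimum, which is part of what motivates the variants mentioned above.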
Papers
State-space models can learn in-context by gradient descent
Neeraj Mohan Sushma, Yudou Tian, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney
Bypassing the Exponential Dependency: Looped Transformers Efficiently Learn In-context by Multi-step Gradient Descent
Bo Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar
Phase retrieval: Global convergence of gradient descent with optimal sample complexity
Théodore Fougereux, Cédric Josz, Xiaopeng Li
Stability and Sharper Risk Bounds with Convergence Rate $O(1/n^2)$
Bowei Zhu, Shaojie Li, Yong Liu
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi, Stefanie Jegelka, Sanjiv Kumar
Optimal Transportation by Orthogonal Coupling Dynamics
Mohsen Sadr, Peyman Mohajerin Esfehani, Hossein Gorji
On the Convergence of (Stochastic) Gradient Descent for Kolmogorov–Arnold Networks
Yihang Gao, Vincent Y. F. Tan