Gradient Descent
Gradient descent is an iterative optimization algorithm that finds a minimum of a function by repeatedly taking steps proportional to the negative of its gradient. Current research focuses on improving its efficiency and robustness, particularly in high-dimensional spaces and for non-convex functions, through variants such as stochastic gradient descent, proximal methods, and natural gradient descent, often in the context of deep learning models and other complex architectures. These advances are crucial for training increasingly complex machine learning models and improving their performance in applications ranging from image recognition to scientific simulations. A key line of investigation is understanding and mitigating issues such as vanishing and exploding gradients, overfitting, and the influence of data characteristics on convergence.
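The update rule described above can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional quadratic; the objective, step size, and iteration count are illustrative choices, not taken from any of the papers listed below.

```python
def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad_f(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

# Example: minimize f(x) = (x - 3)**2, whose gradient is 2*(x - 3).
# Starting from x0 = 0, the iterates converge toward the minimizer x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

For this quadratic the error shrinks by a constant factor each step (here, each iterate's distance to 3 is multiplied by 0.8), which is the geometric convergence that the step-size condition is meant to guarantee; stochastic variants replace the exact gradient with a noisy estimate computed on a data subsample.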
Papers
Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang
Thermodynamic Natural Gradient Descent
Kaelan Donatella, Samuel Duffield, Maxwell Aifer, Denis Melanson, Gavin Crooks, Patrick J. Coles
Almost sure convergence rates of stochastic gradient methods under gradient domination
Simon Weissmann, Sara Klein, Waïss Azizian, Leif Döring
Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion, Lénaïc Chizat