Stochastic Polyak

Stochastic Polyak step size (SPS) methods are a class of adaptive learning-rate rules for stochastic gradient descent (SGD) and related optimizers, designed to make training machine learning models more efficient and robust. At each iteration, the step size is computed from the current mini-batch loss and the squared norm of its gradient (the gap between the loss and its minimum, divided by the squared gradient norm), so no learning-rate schedule needs to be tuned by hand. Current research focuses on extending SPS with momentum, monotone and non-monotone line searches, and proximal operators for regularized objectives, as well as on second-order variants for faster convergence. These developments address limitations of plain SGD, notably its sensitivity to hyperparameter tuning, and offer improved convergence guarantees and practical performance across a range of problem settings, including over-parameterized models and bi-level optimization.
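
For concreteness, below is a minimal NumPy sketch of one SGD step with the capped stochastic Polyak step size (often written SPS_max). The names sps_step, loss_fn, grad_fn, c, gamma_max, and f_star are illustrative rather than taken from any particular library, and the default f_star = 0 assumes a non-negative loss in the interpolation regime.

```python
import numpy as np

def sps_step(x, loss_fn, grad_fn, batch, c=0.5, gamma_max=1.0, f_star=0.0):
    """One SGD step with the capped stochastic Polyak step size:

    gamma_t = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max)
    """
    loss = loss_fn(x, batch)
    grad = grad_fn(x, batch)
    grad_sq = float(np.dot(grad, grad))
    if grad_sq == 0.0:  # stationary for this mini-batch; nothing to do
        return x
    gamma = min((loss - f_star) / (c * grad_sq), gamma_max)
    return x - gamma * grad

# Usage: mini-batch least squares on a consistent linear system, where the
# interpolation assumption (per-batch minimum f_i* = 0) actually holds.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
b = A @ rng.normal(size=10)  # consistent by construction

def loss_fn(x, idx):
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2)

def grad_fn(x, idx):
    return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)

x = np.zeros(10)
for _ in range(500):
    idx = rng.choice(100, size=10, replace=False)
    x = sps_step(x, loss_fn, grad_fn, idx, c=0.5, gamma_max=10.0)
```

The cap gamma_max guards against very large steps when the gradient is small relative to the loss gap, and the constant c (commonly 1/2) trades off step aggressiveness against the strength of the convergence guarantees.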

Papers