Nesterov Momentum

Nesterov's accelerated gradient (NAG) methods aim to speed up the convergence of gradient-based optimization by incorporating momentum: the gradient is evaluated at a look-ahead point extrapolated along the accumulated velocity, so past gradient information guides the search toward the optimum. Current research focuses on extending NAG's benefits to more complex settings such as stochastic optimization, federated learning, and deep neural network training, often combining it with techniques such as Hessian sketching and adaptive momentum adjustments. These advances matter because they improve the efficiency and scalability of optimization algorithms across a range of machine learning applications, yielding faster training and better model performance. Novel algorithms built on NAG principles, such as Adan and MSAM, further demonstrate the ongoing impact of this foundational optimization technique.
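As a minimal sketch of the classical Nesterov update described above (the function and variable names `nag_step`, `params`, `velocity`, `lr`, and `momentum` are illustrative, not taken from any particular paper):

```python
import numpy as np

def nag_step(params, velocity, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov accelerated gradient step.

    Unlike classical (heavy-ball) momentum, the gradient is evaluated
    at the look-ahead point params + momentum * velocity.
    """
    lookahead = params + momentum * velocity
    grad = grad_fn(lookahead)
    velocity = momentum * velocity - lr * grad
    return params + velocity, velocity

# Toy example: minimize the quadratic f(x) = 0.5 * x^T A x
A = np.diag([1.0, 10.0])
grad_fn = lambda x: A @ x

x = np.array([5.0, 5.0])
v = np.zeros_like(x)
for _ in range(100):
    x, v = nag_step(x, v, grad_fn)
print(x)  # should approach the minimizer [0, 0]
```

The only difference from plain momentum is the look-ahead gradient evaluation, which is what yields NAG's accelerated convergence rate on smooth convex problems.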

Papers