Residual Momentum
Residual momentum is a technique for enhancing optimization algorithms such as stochastic gradient descent (SGD), with the goal of improving convergence speed and generalization performance across machine learning tasks. Current research focuses on adapting momentum methods to federated learning, addressing the challenges of asynchronous settings and non-convex optimization, and studying momentum's effect on architectures ranging from linear networks to deep neural networks and transformers. These advances are significant because they enable more efficient training of large-scale models and improve performance in applications such as image segmentation, natural language processing, and robotic control.
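The methods above build on the classical heavy-ball momentum update for SGD, where a velocity term accumulates an exponentially decayed sum of past gradients: v ← βv + g, x ← x − ηv. The sketch below is a minimal NumPy illustration of that baseline update, not an implementation from any of the listed papers; the function name, signature, and hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def sgd_momentum(grad_fn, x0, lr=0.01, beta=0.9, n_steps=100):
    """Minimize a function via SGD with (heavy-ball) momentum.

    grad_fn: callable returning a (possibly stochastic) gradient at x.
    x0: initial parameter vector.
    lr: learning rate; beta: momentum coefficient in [0, 1).
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)      # velocity: decayed sum of past gradients
    for _ in range(n_steps):
        g = grad_fn(x)        # (stochastic) gradient estimate at x
        v = beta * v + g      # accumulate momentum
        x = x - lr * v        # step along the velocity direction
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x_star = sgd_momentum(lambda x: x, x0=np.array([5.0, -3.0]))
print(x_star)  # approaches the minimizer at the origin
```

Setting beta = 0 recovers plain SGD; larger beta values smooth the stochastic gradients and can accelerate convergence on ill-conditioned problems, which is the property the momentum variants studied in these papers refine.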
Papers
Guaranteeing Conservation Laws with Projection in Physics-Informed Neural Networks
Anthony Baez, Wang Zhang, Ziwen Ma, Subhro Das, Lam M. Nguyen, Luca Daniel
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum
Sarit Khirirat, Abdurakhmon Sadiev, Artem Riabinin, Eduard Gorbunov, Peter Richtárik
Error estimates between SGD with momentum and underdamped Langevin diffusion
Arnaud Guillin (LMBP), Yu Wang, Lihu Xu, Haoran Yang