Full-Matrix AdaGrad
Full-matrix AdaGrad is an adaptive optimization algorithm that aims to improve the efficiency and stability of training deep learning models by adapting the learning rate with the full outer-product matrix of accumulated past gradients, rather than only its diagonal. Current research focuses on addressing challenges such as heavy-tailed gradient noise, developing parameter-free variants that eliminate hyperparameter tuning, and improving computational efficiency through techniques such as Kronecker-factored approximations and gradient compression. These advances matter because they improve the robustness and scalability of large-scale model training, with impact on both the theoretical understanding of optimization and practical applications across diverse fields.
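To make the preconditioning idea concrete, the sketch below implements the textbook full-matrix AdaGrad update on a toy problem: it accumulates the outer-product matrix of past gradients and scales each step by its inverse matrix square root. This is a minimal illustration of the general technique, not the method of any particular paper; the function names, step size, and toy objective are illustrative assumptions.

```python
import numpy as np

def full_matrix_adagrad(grad_fn, x0, lr=0.1, eps=1e-8, steps=200):
    """Minimal full-matrix AdaGrad sketch (illustrative, not from a specific paper).

    Maintains G_t = sum_i g_i g_i^T and preconditions each step with
    (G_t + eps * I)^{-1/2}.
    """
    x = x0.astype(float).copy()
    d = x.size
    G = np.zeros((d, d))  # accumulated gradient outer products
    for _ in range(steps):
        g = grad_fn(x)
        G += np.outer(g, g)
        # Inverse matrix square root via eigendecomposition: O(d^3) per step,
        # which is why diagonal AdaGrad or Kronecker-factored variants are
        # preferred at scale.
        w, V = np.linalg.eigh(G + eps * np.eye(d))
        precond = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
        x -= lr * precond @ g
    return x

# Usage on an ill-conditioned quadratic f(x) = 0.5 * x^T A x (hypothetical example):
A = np.diag([100.0, 1.0])
x_final = full_matrix_adagrad(lambda x: A @ x, x0=np.array([1.0, 1.0]))
print(x_final)  # approaches the minimizer at the origin
```

The O(d^2) memory and O(d^3) per-step cost of the full preconditioner is exactly what motivates the Kronecker approximations and gradient compression mentioned above, which approximate the same curvature-aware scaling at a fraction of the cost.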