Learning Rate

Learning rate, a crucial hyperparameter in training neural networks, dictates the step size during optimization. Current research focuses on developing adaptive learning rate schedules, such as warmup-stable-decay and learning rate path switching, to improve training efficiency and generalization, particularly for large language models and other deep learning architectures. These advancements aim to address challenges like finding optimal learning rates across varying model sizes, datasets, and training durations, ultimately leading to faster convergence and better model performance. The impact extends to various applications, from natural language processing and computer vision to scientific computing and reinforcement learning.

Papers

March 8, 2024

Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato, Stavroula Mougiakakou
Strong Generalization Fine Tuning Deep Network Learning Rate External Validation Better Generalization Hyper Tune Weight Decay Training Set Feed Forward

March 1, 2024

Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
Shinji Ito, Taira Tsuchiya, Junya Honda
Multi Armed Bandit Contextual Bandit Learning Rate Linear Bandit Follow the Regularized Leader Best of Both World Algorithm Competitive Ratio

February 29, 2024

Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar, Rebekka Burkholz
Learning Rate Magnitude Pruning Medical Mask Parameter Learning Cultural Sign Mask Learning

February 27, 2024

February 23, 2024

Iteration and Stochastic First-order Oracle Complexities of Stochastic Gradient Descent using Constant and Decaying Learning Rates
Kento Imaizumi, Hideaki Iiduka
Stochastic Gradient Descent Learning Rate First Order Nonconvex Optimization Stochastic First Order Oracle Constant Measure

February 15, 2024

Tracking Changing Probabilities via Dynamic Learners
Omid Madani
Learning Rate Relative Probability Dynamic Learning Deodata Predictor Efficient Prediction Class Prediction

February 8, 2024

On the Convergence of Zeroth-Order Federated Tuning for Large Language Models
Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Yaliang Li, Ying Shen
Large Language Model Early Stage Convergence Learning Rate Zeroth Order Optimization Federated Tuning

February 5, 2024

February 3, 2024

Probing Critical Learning Dynamics of PLMs for Hate Speech Detection
Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty
Language Model Large Corpus Learning Rate Hate Speech Detection Domain Specific Model Pre Trained Language

January 29, 2024

CMA-ES with Learning Rate Adaptation
Masahiro Nomura, Youhei Akimoto, Isao Ono
Learning Rate Covariance Matrix Adaptation Evolution Strategy CMA E

January 21, 2024

Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren, Chao Ma, Lexing Ying
Learning Rate Better Generalization Large Learning Rate Optimal Generalization Generalization Benefit

December 31, 2023

Effect of Optimizer, Initializer, and Architecture of Hypernetworks on Continual Learning from Demonstration
Sayantan Auddy, Sebastian Bergner, Justus Piater
Continual LEArning Learning Rate Textual Demonstration Attention Hypernetworks Superior Optimizer Trajectory Model Continual Learning Performance

December 7, 2023

Unnatural Algorithms in Machine Learning
Christian Goodbrake
Machine Learning Practical Algorithm Learning Rate Natural Gradient Descent Training Robustness Reparameterization Model

November 27, 2023

November 23, 2023

Locally Optimal Descent for Dynamic Stepsize Scheduling
Gilad Yehudai, Alon Cohen, Amit Daniely, Yoel Drori, Tomer Koren, Mariano Schain
Learning Rate Stochastic Gradient Step Size Local Optimality Smooth Stochastic Convex Optimization Adaptive Stepsizes Learning Rate Scheduler

November 19, 2023

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov
Neural Network Strong Generalization Learning Rate Large Learning Rate

November 6, 2023

Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao, Guiyuan Fu, Ying Li, Yu Zhang, Dazhou Li, Rui Yu
Stochastic Gradient Descent Learning Rate Signal Processing Residual Momentum Slow Convergence