Transfer Hyperparameter Optimization
Transfer hyperparameter optimization aims to reuse hyperparameters tuned on smaller or simpler models to train larger, more complex ones efficiently, avoiding the substantial computational cost of exhaustive hyperparameter searches at full scale. Current research focuses on identifying scaling laws that describe how optimal hyperparameters such as the learning rate shift with model width, depth, and dataset size, and how they depend on the choice of optimizer and parameterization, particularly for large language models and residual networks. This work matters because it accelerates the development and deployment of increasingly sophisticated deep learning models, enabling more efficient training and potentially better final performance.
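A minimal sketch of the idea, assuming a muP-style power-law transfer rule for the learning rate (the toy model, the search grid, and the exponent below are illustrative assumptions, not a method from any particular paper): tune the learning rate on a cheap, narrow proxy model, then rescale it for a wider target model instead of searching again at full size.

```python
import numpy as np

def train_loss(width, lr, steps=200, seed=0):
    """Train a tiny one-hidden-layer regression net of the given width
    and return its final loss. Stands in for an expensive real run."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 16))
    y = X @ rng.normal(size=16)                  # linear teacher targets
    W1 = rng.normal(size=(16, width)) / np.sqrt(16)
    W2 = rng.normal(size=width) / np.sqrt(width)
    for _ in range(steps):
        h = np.tanh(X @ W1)                      # hidden activations
        err = h @ W2 - y
        g_pred = 2 * err / len(y)                # d(mean sq. error)/d(pred)
        gW2 = h.T @ g_pred
        gh = np.outer(g_pred, W2) * (1 - h ** 2) # backprop through tanh
        W1 -= lr * (X.T @ gh)
        W2 -= lr * gW2
    return np.mean(err ** 2)

# 1) Grid-search the learning rate on the cheap, narrow proxy model.
proxy_width = 32
grid = [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]
best_lr = min(grid, key=lambda lr: train_loss(proxy_width, lr))

# 2) Transfer to a wider target model with an assumed power-law rule:
#    lr(width) = best_lr * (proxy_width / width) ** alpha.
#    alpha = 1 mimics a muP-style 1/width rule; the value is an assumption.
alpha = 1.0
target_width = 512
transferred_lr = best_lr * (proxy_width / target_width) ** alpha
print(f"proxy best lr: {best_lr:.4g}, transferred lr: {transferred_lr:.4g}")
print(f"target loss: {train_loss(target_width, transferred_lr):.4g}")
```

In practice the exponent alpha is either fixed by the parameterization (as in muP) or fitted empirically by sweeping several proxy widths and checking that the fitted rule extrapolates.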