Transfer Hyperparameter Optimisation

Transfer hyperparameter optimisation aims to reuse hyperparameters tuned on smaller or simpler models to train larger, more complex ones efficiently, avoiding the substantial computational cost of exhaustive hyperparameter searches at full scale. Current research focuses on identifying scaling laws for hyperparameters across model widths, depths, dataset sizes, optimizers, and parameterizations, particularly for large language models and residual networks. This work is important for accelerating the development and deployment of increasingly sophisticated deep learning models: it enables cheaper training runs and can improve final model performance.
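As an illustration of the kind of scaling rule this research studies, the sketch below transfers a learning rate tuned on a small proxy model to a wider target model using a 1/width rule (in the style of muP-like parameterizations). The function name, the specific widths, and the exact rule are illustrative assumptions; real transfer rules depend on the parameterization, the layer type, and the optimizer.

```python
def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Scale a hidden-layer learning rate found on a small proxy model
    to a wider target model via a muP-style 1/width heuristic.

    Illustrative sketch only: the correct rule varies with the
    parameterization, optimizer, and which layer the rate applies to.
    """
    return base_lr * base_width / target_width

# Learning rate tuned cheaply on a width-256 proxy model...
proxy_lr = 1e-3
# ...transferred to a width-4096 target model without a new search.
target_lr = transfer_lr(proxy_lr, base_width=256, target_width=4096)
print(target_lr)  # 6.25e-05
```

The appeal of such rules is that the expensive search happens only once, on the proxy; the scaled value is then applied directly at the target size.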

Papers