Masking Rate
Masking rate, the fraction of input data hidden during training, is a crucial hyperparameter for masked language modeling and multi-modal learning. Recent research shows that the long-standing default of 15% is not universally optimal: larger models often benefit from substantially higher rates, and some work schedules the masking rate dynamically over the course of training. These findings improve model robustness and can accelerate training, with impact across applications such as natural language processing, autonomous driving, and speech enhancement.
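To make the hyperparameter concrete, below is a minimal PyTorch sketch of BERT-style token masking with a configurable rate, plus one possible dynamic schedule. The function names (mask_tokens, scheduled_masking_rate) and the illustrative 30% to 15% decay are assumptions for demonstration, not the method of any specific paper listed here.

```python
import torch

def mask_tokens(input_ids, mask_token_id, masking_rate, special_token_ids=()):
    """Randomly replace a fraction of tokens with the mask token.

    Returns corrupted inputs and labels: original ids at masked positions,
    -100 elsewhere, so cross-entropy loss ignores unmasked tokens.
    """
    labels = input_ids.clone()
    # Sample a Bernoulli mask at the chosen rate for every position.
    probability_matrix = torch.full(input_ids.shape, masking_rate)
    # Never mask special tokens such as [CLS], [SEP], or padding.
    for token_id in special_token_ids:
        probability_matrix[input_ids == token_id] = 0.0
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # loss computed only on masked positions
    corrupted = input_ids.clone()
    corrupted[masked_indices] = mask_token_id
    return corrupted, labels

def scheduled_masking_rate(step, total_steps, start_rate=0.30, end_rate=0.15):
    """Linearly decay the masking rate over training (one possible schedule)."""
    progress = min(step / max(total_steps, 1), 1.0)
    return start_rate + (end_rate - start_rate) * progress
```

A training loop would call scheduled_masking_rate(step, total_steps) each step and pass the result to mask_tokens, so early training sees a higher corruption level than late training.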