Masking Rate

Masking rate, the fraction of input data hidden from the model during training, is a crucial hyperparameter influencing the performance of many machine learning models, particularly in masked language modeling and multi-modal learning. Recent research shows that the previously standard 15% rate is not universally optimal: larger models often benefit from significantly higher rates, and dynamically scheduling the masking rate over the course of training can help as well. By improving model robustness and potentially accelerating training, this line of work affects model efficiency and performance across diverse applications, including natural language processing, autonomous driving, and speech enhancement.
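To make the two ideas above concrete, here is a minimal sketch of (a) masking a configurable fraction of tokens and (b) a simple linear schedule that changes the masking rate over training. All names (`mask_tokens`, `scheduled_rate`) and the specific schedule endpoints are illustrative assumptions, not taken from any particular paper.

```python
import random

def mask_tokens(tokens, mask_rate, mask_id=0, seed=None):
    """Replace roughly `mask_rate` of the tokens with `mask_id`.

    Returns the masked sequence and the list of masked positions
    (the positions a masked-prediction objective would train on).
    """
    rng = random.Random(seed)
    masked = list(tokens)
    positions = [i for i in range(len(tokens)) if rng.random() < mask_rate]
    for i in positions:
        masked[i] = mask_id
    return masked, positions

def scheduled_rate(step, total_steps, start=0.30, end=0.15):
    """One possible dynamic schedule: linearly decay the masking rate
    from `start` to `end` over training (endpoints are arbitrary here)."""
    frac = step / max(1, total_steps)
    return start + (end - start) * frac
```

For example, `scheduled_rate(0, 1000)` gives the initial rate of 0.30 and `scheduled_rate(1000, 1000)` gives the final rate of 0.15; each training step would then call `mask_tokens` with the current scheduled rate.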

Papers