Masked Pre-Training

Masked pre-training is a self-supervised learning technique in which a model learns by predicting masked or missing parts of its input data (images, video, speech, or text). Current research focuses on optimizing the prediction targets, comparing masked pre-training with autoregressive methods, and evaluating its effectiveness across architectures such as Vision Transformers and across tasks including image segmentation, video action recognition, and topic modeling. Because it leverages unlabeled data to improve performance on downstream tasks, the approach has impact in fields ranging from computer vision and natural language processing to medical image analysis and robotics.
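
To make the idea concrete, the sketch below shows a minimal masked pre-training loop in the BERT-style masked-token setting: a fraction of tokens is hidden, and the model is trained to reconstruct the original tokens at the masked positions only. All names (MaskedModel, MASK_ID, mask_ratio) and the random toy data are illustrative assumptions, not taken from any specific paper listed here.

```python
# Minimal sketch of masked pre-training (masked token prediction).
# Assumes a toy Transformer encoder and random token IDs as stand-in data.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, mask_ratio = 1000, 128, 32, 0.15
MASK_ID = 0  # reserve token 0 as the [MASK] symbol

class MaskedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)  # predicts original token IDs

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

model = MaskedModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for step in range(100):
    tokens = torch.randint(1, vocab_size, (8, seq_len))   # unlabeled "data"
    mask = torch.rand(tokens.shape) < mask_ratio           # choose positions to hide
    corrupted = tokens.masked_fill(mask, MASK_ID)          # replace them with [MASK]

    logits = model(corrupted)
    # The loss is computed only at masked positions: predict the hidden tokens.
    loss = criterion(logits[mask], tokens[mask])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The same recipe carries over to other modalities by changing what gets masked and reconstructed, for example image patches instead of tokens in masked image modeling, where the target may be raw pixels or features rather than a vocabulary of discrete IDs.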

Papers