Masked Image Modeling
Masked Image Modeling (MIM) is a self-supervised learning technique for computer vision that trains a model to reconstruct masked portions of an image, so that it learns robust visual representations from unlabeled data. Current research focuses on improving MIM's efficiency and effectiveness through architectural innovations such as hybrid Transformer-CNN models and through refined masking strategies (e.g., saliency-based, symmetric, or structured knowledge-guided masking), often in combination with contrastive learning or knowledge distillation. Representations pre-trained this way transfer well to downstream tasks such as image classification, object detection, and semantic segmentation, and are particularly useful in data-scarce domains like remote sensing and medical imaging.
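The mask-and-reconstruct objective can be illustrated with a minimal NumPy sketch. It is not the method of any particular paper: the patch size, the 0.75 mask ratio, uniform random masking, the pixel-level mean-squared-error loss, and the helper names (patchify, random_mask, masked_reconstruction_loss) are all assumptions chosen for illustration, and a trivial "mean of visible patches" predictor stands in for a trained network.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into rows of flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, ps, ps, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector where True marks a masked (hidden) patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


def masked_reconstruction_loss(target, prediction, mask):
    """Mean squared error computed over the masked patches only."""
    per_patch = ((prediction - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # hypothetical unlabeled image
patches = patchify(image, patch_size=8)  # 16 patches of 8*8*3 = 192 values
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Stand-in for a model (assumption): predict the mean visible patch everywhere.
visible_mean = patches[~mask].mean(axis=0)
prediction = np.tile(visible_mean, (len(patches), 1))

loss = masked_reconstruction_loss(patches, prediction, mask)
print(f"masked {mask.sum()} of {len(patches)} patches, loss = {loss:.4f}")
```

In an actual MIM pipeline the stand-in predictor would be replaced by an encoder-decoder network, and the masking strategy or reconstruction target (raw pixels, tokens, or features) is where the approaches surveyed here differ.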
Papers
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud
A Unified View of Masked Image Modeling
Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei