Masked Image Modeling
Masked Image Modeling (MIM) is a self-supervised learning technique for computer vision that trains models to reconstruct masked portions of images, learning robust visual representations from unlabeled data. Current research focuses on improving MIM's efficiency and effectiveness through architectural innovations such as hybrid Transformer-CNN models and refined masking strategies (e.g., saliency-based, symmetric, or structured knowledge-guided masking), often incorporating contrastive learning or knowledge distillation. This approach significantly advances self-supervised learning, enabling high performance on downstream tasks such as image classification, object detection, and semantic segmentation, particularly in data-scarce domains like remote sensing and medical imaging.
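To make the core objective concrete, below is a minimal sketch of MAE-style MIM pre-training: patches are randomly masked, only the visible patches are encoded, and a lightweight decoder reconstructs the pixels of the masked patches. The `ToyMIM` module, the 75% mask ratio, and all layer sizes are illustrative assumptions for this sketch, not the specific masking strategies or architectures of the papers listed below.

```python
# Minimal MIM sketch (MAE-style random patch masking + pixel reconstruction).
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


def patchify(imgs, patch):
    # (B, C, H, W) -> (B, N, patch*patch*C) non-overlapping patches.
    B, C, H, W = imgs.shape
    h, w = H // patch, W // patch
    x = imgs.reshape(B, C, h, patch, w, patch)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(B, h * w, patch * patch * C)


def random_masking(x, mask_ratio=0.75):
    # Keep a random subset of tokens; return kept tokens, binary mask, restore order.
    B, N, D = x.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=x.device)
    ids_shuffle = noise.argsort(dim=1)
    ids_restore = ids_shuffle.argsort(dim=1)
    ids_keep = ids_shuffle[:, :n_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=x.device)  # 1 = masked, 0 = visible
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_kept, mask, ids_restore


class ToyMIM(nn.Module):
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=128):
        super().__init__()
        num_patches = (img_size // patch) ** 2
        patch_dim = patch * patch * in_ch
        self.patch = patch
        self.embed = nn.Linear(patch_dim, dim)
        self.pos_emb = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.Linear(dim, patch_dim)  # predicts raw pixels per patch

    def forward(self, imgs, mask_ratio=0.75):
        target = patchify(imgs, self.patch)            # pixel targets (B, N, P*P*C)
        tokens = self.embed(target) + self.pos_emb     # patch tokens with position info
        x_kept, mask, ids_restore = random_masking(tokens, mask_ratio)
        z = self.encoder(x_kept)                       # encode visible patches only
        # Re-insert mask tokens at masked positions, restore order, predict pixels.
        B, N = mask.shape
        mask_tokens = self.mask_token.expand(B, N - z.shape[1], -1)
        z_full = torch.cat([z, mask_tokens], dim=1)
        z_full = torch.gather(
            z_full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, z_full.shape[-1]))
        pred = self.decoder(z_full)
        # Reconstruction loss is averaged over the masked patches only.
        loss = ((pred - target) ** 2).mean(dim=-1)     # per-patch MSE
        return (loss * mask).sum() / mask.sum()


model = ToyMIM()
loss = model(torch.randn(2, 3, 224, 224))  # scalar pre-training loss
```

After pre-training with an objective of this form, the encoder is typically kept and fine-tuned (or probed) on downstream tasks; the papers below vary the masking strategy, targets, and architecture around this same reconstruction principle.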
Papers
Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements
Robin Weiler, Matthias Brucklacher, Cyriel M. A. Pennartz, Sander M. Bohté
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung, Hyejin Park, Jiyeong Kim, Dongbo Min
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Hyesong Choi, Hyejin Park, Kwang Moo Yi, Sungmin Cha, Dongbo Min