Masked Image Modeling
Masked Image Modeling (MIM) is a self-supervised learning technique for computer vision in which a model is trained to reconstruct masked portions of an image, learning robust visual representations from unlabeled data. Current research aims to make MIM more efficient and effective in two main ways. The first is architectural, for example hybrid Transformer-CNN models. The second is refined masking strategies, such as saliency-based, symmetric, or structured knowledge-guided masking. These are often combined with contrastive learning or knowledge distillation. The approach has considerably advanced self-supervised learning: MIM-pretrained models achieve high performance on downstream tasks such as image classification, object detection, and semantic segmentation. It is particularly valuable in domains where labeled data is scarce, such as remote sensing and medical imaging.
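The core training signal can be illustrated with a minimal sketch in the style of masked autoencoders. It rests on several assumptions: a single grayscale image given as a list of rows, uniform random patch masking at a 75% ratio (the simplest strategy, not one of the refined variants listed above), and a trivial placeholder predictor standing in for a real encoder-decoder network. The function names `patchify`, `random_mask`, and `masked_mse` are hypothetical helpers, not any library's API. The key idea is that the reconstruction loss is computed only over the patches the model could not see.

```python
import random


def patchify(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened, non-overlapping square patches."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "image must divide evenly into patches"
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Return a list of booleans; True marks a patch hidden from the model (uniform random masking)."""
    num_masked = int(round(num_patches * mask_ratio))
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def masked_mse(pred_patches, target_patches, mask):
    """Mean squared error averaged over the pixels of masked patches only."""
    total, count = 0.0, 0
    for pred, target, is_masked in zip(pred_patches, target_patches, mask):
        if not is_masked:
            continue  # visible patches contribute nothing to the loss
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Usage: an 8x8 toy image split into four 4x4 patches, three of which are masked.
rng = random.Random(0)
image = [[float((r * 8 + c) % 5) for c in range(8)] for r in range(8)]
patches = patchify(image, patch_size=4)
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = [p for p, m in zip(patches, mask) if not m]

# Hypothetical placeholder "model": predicts the mean visible pixel value for every pixel.
# A real MIM model would encode the visible patches and decode predictions for the masked ones.
mean_visible = sum(sum(p) for p in visible) / sum(len(p) for p in visible)
pred = [[mean_visible] * len(p) for p in patches]
loss = masked_mse(pred, patches, mask)
print(f"masked patches: {sum(mask)} of {len(patches)}, reconstruction loss: {loss:.3f}")
```

Restricting the loss to masked patches is what forces the model to infer missing content from context rather than copy its input. Saliency-based or structured masking would replace `random_mask` with a strategy that chooses which patches to hide.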