Masked Image

Masked image modeling (MIM) is a self-supervised learning technique that trains computer vision models to reconstruct masked portions of images, so that they learn robust feature representations from unlabeled data. Current research focuses on making MIM more efficient and effective by incorporating structured knowledge, interactive masking strategies, and multi-modal data fusion, usually within transformer or convolutional neural network frameworks. Because pre-training requires no labels, the approach is especially promising for computer vision tasks in domains where labeled data is scarce, such as medical image analysis and remote sensing.
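The masking-and-reconstruction objective described above can be illustrated with a minimal NumPy sketch. It is not taken from any particular paper: the patch size of 8, the mask ratio of 0.75, the function names (`patchify`, `random_mask`, `masked_mse`), and the choice to score only the masked patches are all assumptions made for illustration, and a trivial mean-patch predictor stands in for the transformer or convolutional network that would normally be trained.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector that is True where a patch is hidden."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))           # stand-in for an unlabeled image
patches = patchify(image, patch_size=8)   # 16 patches of 8*8*3 = 192 values
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# What the model would see: masked patches are zeroed out (assumed scheme).
visible = np.where(mask[:, None], 0.0, patches)

# Placeholder "model": predict the mean visible patch for every position.
prediction = np.tile(patches[~mask].mean(axis=0), (len(patches), 1))

# The training signal: reconstruction error on the hidden patches.
loss = masked_mse(prediction, patches, mask)
print(f"masked patches: {mask.sum()} of {len(patches)}, loss: {loss:.4f}")
```

In a real pre-training run the placeholder predictor would be replaced by a network whose parameters are updated to lower this loss, and the encoder learned that way would then be fine-tuned on the downstream task.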

Papers