Masked Autoencoders
Masked autoencoders (MAEs) are a self-supervised learning technique that learns robust image representations by reconstructing masked portions of an image. Current research focuses on adapting MAEs for various data modalities (images, point clouds, audio, 3D data) and downstream tasks (classification, segmentation, object detection), often incorporating architectural enhancements like Vision Transformers and exploring different masking strategies beyond random masking to improve efficiency and performance. The resulting pre-trained models offer significant advantages in scenarios with limited labeled data, impacting fields like Earth observation, medical image analysis, and robotics through improved accuracy and reduced computational demands.
Papers
R-MAE: Regions Meet Masked Autoencoders
Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Lingjing Kong, Martin Q. Ma, Guangyi Chen, Eric P. Xing, Yuejie Chi, Louis-Philippe Morency, Kun Zhang