Masked Autoencoders
Masked autoencoders (MAEs) are self-supervised models that learn robust image representations by reconstructing masked-out portions of an input image. Current research focuses on adapting MAEs to new data modalities (images, point clouds, audio, and other 3D data) and downstream tasks (classification, segmentation, object detection), often building on Vision Transformer backbones and exploring masking strategies beyond random masking to improve efficiency and performance. The resulting pre-trained models are especially valuable when labeled data is scarce, improving accuracy and reducing computational demands in fields such as Earth observation, medical image analysis, and robotics.
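To make the core idea concrete, below is a minimal sketch of MAE-style random masking in PyTorch, following the commonly used shuffle-and-keep formulation. The function name `random_masking` and the shapes are illustrative assumptions, not code from any of the papers listed here: patch embeddings are randomly permuted, only a small visible subset is passed to the encoder, and the inverse permutation lets a decoder later re-insert mask tokens at the right positions.

import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """Randomly drop a fraction of patch tokens (MAE-style sketch).

    patches: (batch, num_patches, dim) tensor of patch embeddings.
    Returns the visible patches, a binary mask (1 = masked), and the
    indices needed to restore the original patch ordering.
    """
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))

    # A random score per patch; sorting it gives a random permutation.
    noise = torch.rand(B, N, device=patches.device)
    ids_shuffle = torch.argsort(noise, dim=1)        # random order
    ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

    # Keep the first num_keep patches of the random order (the visible set).
    ids_keep = ids_shuffle[:, :num_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    # Binary mask in the original ordering: 0 = kept, 1 = masked.
    mask = torch.ones(B, N, device=patches.device)
    mask[:, :num_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)

    return visible, mask, ids_restore

# Example: 2 images, 196 patches (14x14 grid), 768-dim embeddings.
x = torch.randn(2, 196, 768)
visible, mask, ids_restore = random_masking(x, mask_ratio=0.75)
print(visible.shape)    # torch.Size([2, 49, 768]) -- only 25% of patches
print(mask.sum(dim=1))  # 147 masked patches per image

Because the encoder processes only the visible 25% of patches, pre-training is substantially cheaper than running a full Vision Transformer over every token; the masking strategies surveyed above (object-wise, multi-scale, supervised) can be seen as replacing or reweighting the uniform `noise` scores in this step.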
Papers
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
Feng Liang, Yangguang Li, Diana Marculescu
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao
Object-wise Masked Autoencoders for Fast Pre-training
Jiantao Wu, Shentong Mo