Masked Autoencoders
Masked autoencoders (MAEs) learn robust image representations in a self-supervised manner by reconstructing masked-out portions of the input. Current research adapts MAEs to additional data modalities (point clouds, audio, 3D and remote-sensing data) and downstream tasks (classification, segmentation, object detection), typically building on Vision Transformer backbones and exploring masking strategies beyond uniform random masking to improve efficiency and performance. The resulting pre-trained models are especially valuable when labeled data are scarce, benefiting fields such as Earth observation, medical image analysis, and robotics through improved accuracy and reduced computational demands. A minimal sketch of the core idea follows.
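The sketch below illustrates the MAE pre-training recipe in a PyTorch setting: randomly drop most patch tokens, encode only the visible ones, pad with a learned mask token, and compute the reconstruction loss only on the masked positions. The class name `TinyMAE` and all hyperparameters are illustrative assumptions, not taken from any of the papers listed here.

```python
# Minimal masked-autoencoder sketch (illustrative, not from the listed papers).
import torch
import torch.nn as nn


class TinyMAE(nn.Module):
    """Toy MAE: encode visible patches, reconstruct the masked ones."""

    def __init__(self, patch_dim=768, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        layer = nn.TransformerEncoderLayer(patch_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(patch_dim, patch_dim)  # stand-in for a real decoder
        self.mask_token = nn.Parameter(torch.zeros(1, 1, patch_dim))

    def random_mask(self, x):
        # Keep a random subset of patches; MAE typically masks ~75% of them.
        B, N, D = x.shape
        n_keep = int(N * (1 - self.mask_ratio))
        ids_shuffle = torch.rand(B, N, device=x.device).argsort(dim=1)
        ids_keep = ids_shuffle[:, :n_keep]
        x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
        mask = torch.ones(B, N, device=x.device)  # 1 = masked, 0 = visible
        mask.scatter_(1, ids_keep, 0)
        return x_visible, mask, ids_shuffle

    def forward(self, patches):
        B, N, D = patches.shape
        x_visible, mask, ids_shuffle = self.random_mask(patches)
        latent = self.encoder(x_visible)
        # Pad back to full length with mask tokens, then restore patch order.
        pad = self.mask_token.expand(B, N - latent.shape[1], D)
        full = torch.cat([latent, pad], dim=1)
        ids_restore = ids_shuffle.argsort(dim=1)
        full = torch.gather(full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, D))
        pred = self.decoder(full)
        # Reconstruction loss is computed only on the masked patches.
        loss = (((pred - patches) ** 2).mean(dim=-1) * mask).sum() / mask.sum()
        return loss


mae = TinyMAE()
patches = torch.randn(2, 196, 768)  # e.g. 14x14 patch embeddings per image
loss = mae(patches)
loss.backward()
```

Because the encoder only ever sees the ~25% of visible patches, pre-training is far cheaper than processing full images, which is the efficiency argument behind several of the papers below.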
Papers
A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
Lixian Zhang, Yi Zhao, Runmin Dong, Jinxiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu
Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing
Sina Tayebati, Theja Tulabandhula, Amit R. Trivedi
Residual Connections Harm Generative Representation Learning
Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Dominique Beaini, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw