Masked Autoencoders
Masked autoencoders (MAEs) are a self-supervised learning approach that learns robust image representations by reconstructing masked portions of an image. Current research focuses on adapting MAEs to various data modalities (images, video, point clouds, audio, 3D data) and downstream tasks (classification, segmentation, object detection), typically building on Vision Transformer backbones and exploring masking strategies beyond uniform random masking to improve efficiency and performance. The resulting pre-trained models offer significant advantages when labeled data is scarce, benefiting fields such as Earth observation, medical image analysis, and robotics through improved accuracy and reduced computational demands.
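To make the core idea concrete, the sketch below shows one MAE pre-training step: patches are randomly masked, the encoder sees only the visible patches, the decoder fills masked positions with a learned mask token and predicts pixels, and the loss is computed on masked patches only. This is a minimal illustration, not any paper's implementation: the tiny linear encoder/decoder, the shapes, and the hyperparameters are hypothetical stand-ins for the ViT blocks and positional embeddings a real MAE uses.

```python
import torch
import torch.nn as nn

def random_masking(patches, mask_ratio=0.75):
    """Shuffle patches per image, keep a fraction, and remember how to restore order."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                      # one random score per patch
    ids_shuffle = noise.argsort(dim=1)            # ascending: lowest scores are kept
    ids_restore = ids_shuffle.argsort(dim=1)      # inverse permutation
    ids_keep = ids_shuffle[:, :n_keep]
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)
    mask.scatter_(1, ids_keep, 0.0)               # 1 = masked, 0 = visible
    return kept, mask, ids_restore

# Toy stand-ins for the ViT encoder/decoder; positional embeddings omitted.
patch_dim, latent_dim = 16 * 16 * 3, 128
encoder = nn.Sequential(nn.Linear(patch_dim, latent_dim), nn.GELU())
decoder = nn.Linear(latent_dim, patch_dim)
mask_token = nn.Parameter(torch.zeros(1, 1, latent_dim))

patches = torch.randn(8, 196, patch_dim)          # batch of 14x14 grids of 16x16 RGB patches
kept, mask, ids_restore = random_masking(patches)

latent = encoder(kept)                            # encoder processes visible patches only
B, n_keep, _ = latent.shape
tokens = torch.cat(
    [latent, mask_token.expand(B, patches.shape[1] - n_keep, -1)], dim=1)
tokens = torch.gather(                            # undo the shuffle: original patch order
    tokens, 1, ids_restore.unsqueeze(-1).expand(-1, -1, latent_dim))
pred = decoder(tokens)                            # predict pixels for every patch position

# Mean squared error averaged over masked patches only.
loss = (((pred - patches) ** 2).mean(dim=-1) * mask).sum() / mask.sum()
loss.backward()
```

Note the design choice this makes visible: with a 75% mask ratio the encoder attends over only a quarter of the patches, which is the main source of the reduced pre-training cost mentioned above.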
Papers
Extending Video Masked Autoencoders to 128 frames
Nitesh Bharadwaj Gundavarapu, Luke Friedman, Raghav Goyal, Chaitra Hegde, Eirikur Agustsson, Sagar M. Waghmare, Mikhail Sirotenko, Ming-Hsuan Yang, Tobias Weyand, Boqing Gong, Leonid Sigal
Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation
Yuexing Ding, Jun Wang, Hongbing Lyu
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
Bing Cao, Quanhao Lu, Jiekang Feng, Pengfei Zhu, Qinghua Hu, Qilong Wang