Masked Autoencoder
Masked Autoencoders (MAEs) are a self-supervised learning technique in which a model reconstructs masked portions of its input, learning robust, generalizable representations without labeled data. Current research extends MAEs beyond images to modalities such as video, point clouds, and text, often combining them with contrastive learning or geometrically informed masking strategies to improve efficiency and performance. The approach has proven highly impactful, providing effective pre-trained models for downstream tasks in fields including 3D scene generation, gaze estimation, anomaly detection, and autonomous driving.
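The core recipe above (hide a large fraction of input patches, then score reconstruction only on the hidden ones) can be sketched in a few lines. This is a minimal NumPy illustration, not any paper's implementation; the patch count, 75% mask ratio, and function names are illustrative assumptions, and a random array stands in for the decoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(num_patches: int, mask_ratio: float, rng) -> np.ndarray:
    """Return a boolean mask: True = patch is hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def mae_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared reconstruction error, averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # one error per patch
    return float((per_patch * mask).sum() / mask.sum())

patches = rng.normal(size=(16, 64))   # 16 patches, 64-dim features each
mask = random_mask(16, 0.75, rng)     # MAE-style high mask ratio (~75%)
pred = rng.normal(size=patches.shape) # placeholder for a decoder's output
loss = mae_loss(pred, patches, mask)
```

Scoring only the masked patches is what makes the pretext task non-trivial: the visible patches are available to the encoder, so reconstructing them would reward trivial copying rather than learned structure.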
Papers
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee, Jeffrey Ryan Willette, Jonghee Kim, Sung Ju Hwang
Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception
Xiaohao Xu, Ye Li, Tianyi Zhang, Jinrong Yang, Matthew Johnson-Roberson, Xiaonan Huang