Self-Supervised Representation Learning
Self-supervised representation learning aims to learn meaningful data representations from unlabeled data by designing pretext tasks that exploit inherent data structure or invariances. Current research focuses on novel pretext tasks and architectures, including contrastive learning, masked modeling, generative models (such as diffusion models and VAEs), and variants that incorporate semantic information or temporal consistency, often built on transformer-based frameworks. These advances improve performance on downstream tasks such as image classification, speech enhancement, and time series analysis, particularly where labeled data is scarce or expensive to obtain. The resulting robust and generalizable representations are proving valuable across applications in computer vision, natural language processing, and medical image analysis.
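To make the contrastive pretext task mentioned above concrete, below is a minimal PyTorch sketch of an InfoNCE/NT-Xent-style loss over two augmented views of a batch. It is an illustrative example only, not the method of any paper listed here; the function name `info_nce_loss`, the `temperature` value, and the `encoder`/`proj` names in the usage comment are assumptions introduced for this sketch.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """Contrastive (NT-Xent-style) loss over two augmented views of a batch.

    z1, z2: (N, D) projected embeddings of two views of the same N samples.
    Positive pairs are (z1[i], z2[i]); every other sample in the batch
    serves as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2N, D)
    sim = z @ z.t() / temperature             # scaled cosine similarities
    n = z1.size(0)
    # A sample must never be its own negative: mask out self-similarity.
    sim.fill_diagonal_(float("-inf"))
    # For row i, the positive is the other view of the same sample.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(sim.device))

# Hypothetical usage with an encoder and projection head (names assumed):
#   z1 = proj(encoder(augment(x)))
#   z2 = proj(encoder(augment(x)))
#   loss = info_nce_loss(z1, z2)
```

In practice the encoder is kept for downstream tasks while the projection head and loss are discarded after pre-training; masked-modeling pretext tasks follow a similar recipe but replace the contrastive objective with reconstruction of hidden input patches.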
Papers
Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation
Hongsuk Choi, Hyeongjin Nam, Taeryung Lee, Gyeongsik Moon, Kyoung Mu Lee
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han