Self-Supervised Representation Learning
Self-supervised representation learning extracts useful representations from unlabeled data by training on pretext tasks that exploit the data's inherent structure or invariances. Current research focuses on new pretext tasks and architectures, including contrastive learning, masked modeling, generative models (such as diffusion models and VAEs), and variants that incorporate semantic information or temporal consistency, often within transformer-based frameworks. These methods improve performance on downstream tasks such as image classification, speech enhancement, and time series analysis, particularly where labeled data is scarce or expensive to obtain. The resulting representations are robust and generalize well, which makes them valuable across computer vision, natural language processing, and medical image analysis.
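As an illustration of one pretext task named above, the following is a minimal sketch of a contrastive (InfoNCE-style) objective, written in plain Python with no learning framework. It assumes that `view_a[i]` and `view_b[i]` are embeddings of two augmented views of the same sample, and that every other pairing in the batch acts as a negative. The function names, the toy embeddings, and the temperature of 0.1 are illustrative assumptions, not taken from any of the papers listed here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(view_a, view_b, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (illustrative sketch).

    view_a[i] and view_b[i] are assumed to be a positive pair;
    view_b[j] for j != i are treated as negatives for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate, scaled by temperature.
        logits = [
            cosine_similarity(view_a[i], view_b[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive pair (index i) as the target.
        total += log_sum_exp - logits[i]
    return total / n


# Toy embeddings (hypothetical values chosen only for the demonstration).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # positives close to their anchors
mismatched = [[0.1, 0.9], [0.9, 0.1]]   # positives swapped

loss_aligned = info_nce_loss(anchors, aligned)
loss_mismatched = info_nce_loss(anchors, mismatched)
```

In this toy setup the loss is lower when each anchor is closest to its own positive than when the positives are swapped, which is the signal a contrastive encoder would be trained to exploit. Practical systems typically compute the same quantity on batched tensors with learned encoders.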
Papers
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng, David Harwath
Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang