Self-Supervised Learning
Self-supervised learning (SSL) trains machine learning models on unlabeled data by designing pretext tasks whose solutions require the model to learn useful representations. Current research focuses on improving generalization, mitigating overfitting, and developing efficient architectures such as transformers and CNNs for diverse modalities (images, audio, point clouds, fMRI data). SSL matters because it leverages vast amounts of readily available unlabeled data: it improves performance on downstream tasks and reduces reliance on expensive, time-consuming manual labeling, with particular impact on fields such as medical imaging, speech processing, and autonomous driving.
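To make the idea of a pretext task concrete, the sketch below implements a minimal SimCLR-style contrastive objective in PyTorch: two noisy views of the same unlabeled batch are pulled together in embedding space while all other pairs are pushed apart. The toy `Encoder`, the noise-based "augmentations", and all dimensions and hyperparameters are illustrative assumptions, not the method of any paper listed below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy encoder mapping an input vector to an embedding (hypothetical)."""
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim)
        )

    def forward(self, x):
        return self.net(x)

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss: each sample's positive is its other view."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))          # exclude self-similarity
    # The positive for row i is its augmented counterpart at (i + n) mod 2N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

if __name__ == "__main__":
    encoder = Encoder()
    x = torch.randn(16, 128)               # stand-in for an unlabeled batch
    view1 = x + 0.1 * torch.randn_like(x)  # two cheap stochastic "augmentations"
    view2 = x + 0.1 * torch.randn_like(x)
    loss = nt_xent_loss(encoder(view1), encoder(view2))
    loss.backward()                        # no labels used anywhere
    print(f"contrastive loss: {loss.item():.4f}")
```

The additive noise here stands in for the domain-specific augmentations (cropping, masking, pitch shifting, etc.) that real SSL pipelines use; the loss itself is unchanged across modalities.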
Papers
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang
Contrastive Self-Supervised Learning for Skeleton Representations
Nico Lingg, Miguel Sarabia, Luca Zappella, Barry-John Theobald
Speech separation with large-scale self-supervised learning
Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg
3DFill: Reference-guided Image Inpainting by Self-supervised 3D Image Alignment
Liang Zhao, Xinyuan Zhao, Hailong Ma, Xinyu Zhang, Long Zeng
T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
Chan-Jan Hsu, Ho-Lam Chung, Hung-yi Lee, Yu Tsao
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi, Brian Yan, Shinji Watanabe
Self-Supervised Intensity-Event Stereo Matching
Jinjin Gu, Jinan Zhou, Ringo Sai Wo Chu, Yan Chen, Jiawei Zhang, Xuanye Cheng, Song Zhang, Jimmy S. Ren
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur
Pixel-Wise Contrastive Distillation
Junqiang Huang, Zichao Guo
Elastic Weight Consolidation Improves the Robustness of Self-Supervised Learning Methods under Transfer
Andrius Ovsianas, Jason Ramapuram, Dan Busbridge, Eeshan Gunesh Dhekane, Russ Webb
Spectrograms Are Sequences of Patches
Leyi Zhao, Yi Li
A comprehensive study on self-supervised distillation for speaker representation learning
Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng