Self-Supervised Learning
Self-supervised learning (SSL) trains machine learning models on unlabeled data by designing pretext tasks that encourage the model to learn useful representations. Current research focuses on improving generalization, mitigating overfitting, and developing efficient architectures such as transformers and CNNs across modalities including images, audio, point clouds, and fMRI data. SSL's significance lies in its ability to exploit vast amounts of readily available unlabeled data, improving performance on downstream tasks and reducing reliance on expensive, time-consuming manual labeling, with particular impact on fields such as medical imaging, speech processing, and autonomous driving.
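To make the idea of a pretext task concrete, the sketch below shows one common SSL recipe: contrastive learning over two augmented views of the same unlabeled image (SimCLR-style NT-Xent loss). It is a minimal illustrative example, not the method of any paper listed below; the encoder, the noise-based "augmentations", and all names (SmallEncoder, nt_xent_loss) are assumptions chosen for brevity.

```python
# Minimal contrastive SSL pretext-task sketch (SimCLR-style). Illustrative only:
# SmallEncoder, nt_xent_loss, and the noise "augmentations" are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Tiny CNN backbone followed by a projection head."""
    def __init__(self, proj_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.projector = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, proj_dim)
        )

    def forward(self, x):
        return self.projector(self.backbone(x))

def nt_xent_loss(z1, z2, temperature: float = 0.5):
    """Two views of the same image are positives; all other images in the
    batch act as negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                # ignore self-similarity
    # Index of each sample's positive partner in the concatenated batch.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

if __name__ == "__main__":
    encoder = SmallEncoder()
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    images = torch.rand(16, 3, 64, 64)                   # unlabeled batch
    # Stand-in augmentations; in practice use random crops, color jitter, etc.
    view1 = images + 0.1 * torch.randn_like(images)
    view2 = images + 0.1 * torch.randn_like(images)
    loss = nt_xent_loss(encoder(view1), encoder(view2))
    loss.backward()
    opt.step()
    print(f"pretext loss: {loss.item():.4f}")
```

After pretraining on such a pretext task, the backbone (without the projection head) is typically reused for a downstream task, e.g. by training a linear classifier on its frozen features.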
Papers
SSL-CPCD: Self-supervised learning with composite pretext-class discrimination for improved generalisability in endoscopic image analysis
Ziang Xu, Jens Rittscher, Sharib Ali
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghao Xiao, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Zili Wang, Yike Guo, Jie Fu
Augmentation-aware Self-supervised Learning with Conditioned Projector
Marcin Przewięźlikowski, Mateusz Pyla, Bartosz Zieliński, Bartłomiej Twardowski, Jacek Tabor, Marek Śmieja
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe
MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations
Calum Heggan, Tim Hospedales, Sam Budgett, Mehrdad Yaghoobi
Reverse Engineering Self-Supervised Learning
Ido Ben-Shaul, Ravid Shwartz-Ziv, Tomer Galanti, Shai Dekel, Yann LeCun
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Eklavya Sarkar, Mathew Magimai.-Doss
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass
Pruning Pre-trained Language Models with Principled Importance and Self-regularization
Siyu Ren, Kenny Q. Zhu