Self-Supervised Speech

Self-supervised speech (SSS) learning leverages unlabeled audio to learn robust speech representations, aiming to improve downstream tasks such as speech recognition and translation without relying heavily on expensive labeled datasets. Current research focuses on understanding what information these models (e.g., WavLM, wav2vec 2.0) learn, comparing their representations to those of human brains and of other models, and exploring efficient architectures for resource-constrained environments. The approach holds significant promise for advancing speech processing in low-resource settings and for improving applications ranging from speech-to-text translation to mental health screening via novel speech-based biomarkers.
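To make the pretraining idea concrete, the sketch below illustrates a wav2vec 2.0-style contrastive (InfoNCE) objective in plain NumPy: at a masked timestep, the model's context vector must identify the true quantized target among distractors sampled from other timesteps. This is a simplified illustration, not the actual wav2vec 2.0 implementation; all vectors here are synthetic stand-ins for learned features.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(context, positive, distractors, temperature=0.1):
    """InfoNCE-style loss: -log softmax probability that the context
    vector picks its true target over the distractors."""
    sims = np.array(
        [cosine(context, positive)] + [cosine(context, d) for d in distractors]
    ) / temperature
    sims -= sims.max()  # for numerical stability
    log_probs = sims - np.log(np.exp(sims).sum())
    return -log_probs[0]  # index 0 is the positive target

rng = np.random.default_rng(0)
dim = 16
target = rng.normal(size=dim)
context = target + 0.1 * rng.normal(size=dim)  # context aligned with its target
distractors = [rng.normal(size=dim) for _ in range(5)]

loss = contrastive_loss(context, target, distractors)
print(f"contrastive loss: {loss:.4f}")  # small, since context matches its target
```

During pretraining this loss is minimized over many masked timesteps, which forces the encoder to produce context vectors that are predictive of the local speech content; those representations are then reused for downstream tasks.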

Papers