Unlabeled Speech

Unlabeled speech research focuses on leveraging vast amounts of untranscribed audio to advance speech technologies, particularly in low-resource settings where labeled data is scarce. Current efforts concentrate on self-supervised learning methods that employ transformer-based architectures such as HuBERT to learn robust speech representations from unlabeled audio, often combining techniques like contrastive and non-contrastive losses, pseudo-labeling, and data augmentation. These advances are significantly impacting automatic speech recognition, speech synthesis, keyword spotting, and emotion recognition, enabling more accurate and inclusive speech processing systems for a wider range of languages and applications.
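
As a concrete illustration of how such self-supervised representations are used downstream, the minimal sketch below extracts frame-level features from a pretrained HuBERT checkpoint with the Hugging Face transformers library. The checkpoint name and the dummy waveform are illustrative choices, not prescribed by any particular paper in this collection.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, HubertModel

# Illustrative pretrained checkpoint trained on unlabeled LibriSpeech audio.
checkpoint = "facebook/hubert-base-ls960"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = HubertModel.from_pretrained(checkpoint)
model.eval()

# Dummy 1-second waveform at 16 kHz; in practice, load real audio here.
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Frame-level self-supervised representations: (batch, frames, hidden_size).
# These features can be fed to a small supervised head for ASR, keyword
# spotting, or emotion recognition when only limited labels are available.
features = outputs.last_hidden_state
print(features.shape)
```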

Papers