Unlabeled Speech
Unlabeled speech research focuses on leveraging large amounts of untranscribed audio to advance speech technologies, particularly in low-resource settings where labeled data is scarce. Current efforts center on self-supervised learning, using transformer-based architectures such as HuBERT to learn robust speech representations from unlabeled audio, often in combination with contrastive and non-contrastive losses, pseudo-labeling, and data augmentation. These advances are improving automatic speech recognition, speech synthesis, keyword spotting, and emotion recognition, enabling more accurate and inclusive speech processing systems across a wider range of languages and applications.
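To make this concrete, the sketch below shows one common pattern: extracting frame-level representations from untranscribed audio with a pre-trained self-supervised model. It is a minimal illustration using torchaudio's pipeline bundles, not the method of any particular paper listed here; the file path is a placeholder, and the choice of the HUBERT_BASE checkpoint is an assumption for demonstration.

```python
import torch
import torchaudio

# Load a pre-trained HuBERT model from torchaudio's pipeline bundles.
# HUBERT_BASE was itself trained with self-supervision on unlabeled
# audio, so no transcripts are needed at any point here.
bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

# "untranscribed.wav" is a placeholder path for any unlabeled, mono
# recording; torchaudio.load returns a (channels, time) tensor, which
# for mono audio doubles as a batch of size one.
waveform, sample_rate = torchaudio.load("untranscribed.wav")
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(
        waveform, sample_rate, bundle.sample_rate
    )

with torch.inference_mode():
    # extract_features returns one tensor per transformer layer, each of
    # shape (batch, frames, hidden_dim). Such frame-level representations
    # are commonly reused downstream, e.g. for ASR fine-tuning, keyword
    # spotting, or emotion recognition.
    features, _ = model.extract_features(waveform)

print(len(features), features[-1].shape)
```

Representations like these are typically either fine-tuned with a small amount of labeled data or used as fixed features, which is what makes the self-supervised approach attractive in low-resource settings.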
Papers
DDKtor: Automatic Diadochokinetic Speech Analysis
Yael Segal, Kasia Hitczenko, Matthew Goldrick, Adam Buchwald, Angela Roberts, Joseph Keshet
STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello, Akshat Shrivastava, Daniel Lazar, Po-Chun Hsu, Duc Le, Adithya Sagar, Ali Elkahky, Jade Copet, Wei-Ning Hsu, Yossi Adi, Robin Algayres, Tu Anh Nguyen, Emmanuel Dupoux, Luke Zettlemoyer, Abdelrahman Mohamed
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim