Speech Corpus
Speech corpora are collections of recorded speech data, crucial for training and evaluating automatic speech recognition (ASR) and text-to-speech (TTS) systems. Current research emphasizes building diverse corpora that represent varied accents, languages (including low-resource and indigenous languages), speaking styles, and conditions such as disordered speech. This work often employs self-supervised learning and transformer-based models like Wav2Vec 2.0 and Whisper to improve accuracy and efficiency. These advances are vital for extending the accessibility and performance of speech technologies across diverse populations and applications, including healthcare, education, and assistive technologies.
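As a concrete illustration of what a speech corpus looks like in practice, corpora are commonly distributed as audio files plus a per-utterance manifest listing the audio path, transcript, and duration. The sketch below is a minimal, hedged example assuming a JSON-lines manifest; the field names (audio_path, text, duration_s, accent) are illustrative conventions, not a fixed standard.

```python
import io
import json

# Hypothetical two-utterance manifest in JSON-lines form: one record per line.
# Field names are illustrative; real corpora vary in their manifest schemas.
SAMPLE_MANIFEST = """\
{"audio_path": "clips/utt_0001.wav", "text": "hello world", "duration_s": 1.4, "accent": "en-IN"}
{"audio_path": "clips/utt_0002.wav", "text": "good morning", "duration_s": 2.1, "accent": "en-US"}
"""

def load_manifest(fp):
    """Parse one utterance record per non-blank line."""
    return [json.loads(line) for line in fp if line.strip()]

def total_seconds(records):
    """Total audio duration, a figure corpora typically report (usually in hours)."""
    return sum(r["duration_s"] for r in records)

records = load_manifest(io.StringIO(SAMPLE_MANIFEST))
print(len(records), total_seconds(records))  # 2 utterances, 3.5 seconds total
```

A loader like this is typically the first step before feature extraction or fine-tuning a pre-trained model on the corpus; per-record metadata such as accent also supports the kind of coverage analysis that diverse-corpus research relies on.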
Papers
Towards Unsupervised Speech Recognition Without Pronunciation Models
Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo
Semi-Supervised Spoken Language Glossification
Huijie Yao, Wengang Zhou, Hao Zhou, Houqiang Li
Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection
Zihan Pan, Tianchi Liu, Hardik B. Sailor, Qiongqiong Wang