Speech Corpus
Speech corpora are collections of recorded speech data, crucial for training and evaluating automatic speech recognition (ASR) and text-to-speech (TTS) systems. Current research emphasizes creating diverse corpora representing various accents, languages (including low-resource and indigenous languages), speaking styles, and conditions (e.g., disordered speech), often employing self-supervised learning and transformer-based models like Wav2Vec 2.0 and Whisper for improved accuracy and efficiency. These advancements are vital for improving the accessibility and performance of speech technologies across diverse populations and applications, including healthcare, education, and assistive technologies.
Papers
Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur
Speaker verification using attentive multi-scale convolutional recurrent network
Yanxiong Li, Zhongjie Jiang, Wenchang Cao, Qisheng Huang
Investigating model performance in language identification: beyond simple error statistics
Suzy J. Styles, Victoria Y. H. Chua, Fei Ting Woon, Hexin Liu, Leibny Paola Garcia Perera, Sanjeev Khudanpur, Andy W. H. Khong, Justin Dauwels
STT4SG-350: A Speech Corpus for All Swiss German Dialect Regions
Michel Plüss, Jan Deriu, Yanick Schraner, Claudio Paonessa, Julia Hartmann, Larissa Schmidt, Christian Scheller, Manuela Hürlimann, Tanja Samardžić, Manfred Vogel, Mark Cieliebak