Speech Corpus

Speech corpora are collections of recorded speech used to train and evaluate automatic speech recognition (ASR) and text-to-speech (TTS) systems. Current research emphasizes building diverse corpora that cover a range of accents, languages (including low-resource and indigenous languages), speaking styles, and conditions (e.g., disordered speech). These corpora are often used together with self-supervised learning and transformer-based models such as Wav2Vec 2.0 and Whisper to improve accuracy and efficiency. This work is vital for improving the accessibility and performance of speech technologies across diverse populations and applications, including healthcare, education, and assistive tools.

Papers