Speech Datasets
Speech datasets are crucial for training and evaluating automatic speech recognition (ASR) and text-to-speech (TTS) systems, as well as other speech processing applications such as speech emotion recognition. Current research focuses on creating larger, more diverse datasets encompassing various languages, accents, speaking styles (including speech from people with speech impediments), and recording conditions, alongside developing methods to improve data efficiency (e.g., data pruning, self-training) and to address biases. These advancements are vital for improving the accuracy and robustness of speech technologies, broadening their accessibility and applicability across diverse populations and contexts.
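As a minimal illustration of the data-pruning idea mentioned above, the sketch below filters a speech dataset manifest by utterance duration, a simple heuristic sometimes used before ASR training. The JSONL manifest format and the "duration" field are assumptions for the example, not tied to any specific paper or toolkit.

```python
import json

def prune_manifest(in_path: str, out_path: str,
                   min_sec: float = 1.0, max_sec: float = 20.0) -> int:
    """Keep only entries whose 'duration' (seconds) lies in [min_sec, max_sec].

    Assumes a hypothetical JSONL manifest: one JSON object per line,
    each with at least a 'duration' field.
    """
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            entry = json.loads(line)
            if min_sec <= entry["duration"] <= max_sec:
                fout.write(json.dumps(entry) + "\n")
                kept += 1
    return kept

if __name__ == "__main__":
    n = prune_manifest("train_manifest.jsonl", "train_manifest_pruned.jsonl")
    print(f"kept {n} utterances")
```

Real pruning strategies are usually more involved (e.g., model-confidence or diversity criteria); duration filtering is shown only because it is self-contained and easy to verify.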
Papers
The Unreliability of Acoustic Systems in Alzheimer's Speech Datasets with Heterogeneous Recording Conditions
Lara Gauder, Pablo Riera, Andrea Slachevsky, Gonzalo Forno, Adolfo M. Garcia, Luciana Ferrer
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
Mahta Fetrat Qharabagh, Zahra Dehghanian, Hamid R. Rabiee