Speech Datasets

Speech datasets are essential for training and evaluating automatic speech recognition (ASR) and text-to-speech (TTS) systems, as well as other speech processing applications such as speech emotion recognition. Current research focuses on building larger, more diverse datasets that span multiple languages, accents, speaking styles (including atypical speech from speakers with speech impairments), and recording conditions. In parallel, researchers are developing methods to improve data efficiency (e.g., data pruning and self-training) and to address dataset biases. These efforts aim to make speech technologies more accurate and robust, and therefore usable across more diverse populations and contexts.

Papers