Common Voice

Common Voice is a large, crowdsourced speech dataset used to train and evaluate automatic speech recognition (ASR) and text-to-speech (TTS) systems. Current research focuses on mitigating biases within the dataset, adapting it for low-resource languages through techniques like data augmentation and voice conversion, and improving model performance with self-supervised pretrained models such as wav2vec 2.0 and its multilingual variant XLSR-53. This work matters because it addresses the challenge of building fair and effective speech technologies, particularly for under-represented languages and demographics, with applications ranging from accessibility to cross-lingual communication.

Papers