Multilingual Speech Corpus

Multilingual speech corpora are collections of recorded speech in multiple languages and are essential for developing speech technologies that work across languages. Current research focuses on improving data quality, creating new corpora for under-resourced languages (including endangered ones), and applying techniques such as transfer learning and contrastive learning with transformer-based models (e.g., Wav2Vec 2.0) to build robust, generalizable speech recognition and generation systems. These advances help bridge the digital divide, enable cross-lingual communication, and support research in areas such as phonetics, linguistics, and speech pathology.

Papers