Multilingual Speech Corpus
Multilingual speech corpora are collections of recorded speech in multiple languages, and they are crucial for developing speech technologies that work across linguistic boundaries. Current research focuses on improving data quality, creating new corpora for under-resourced languages (including endangered ones), and applying techniques such as transfer learning and contrastive learning with transformer-based models (e.g., Wav2Vec 2.0) to build robust, generalizable speech recognition and generation systems. These advances help bridge the digital divide, enable cross-lingual communication, and support research in areas such as phonetics, linguistics, and speech pathology.