Low-Resource Speech Recognition
Low-resource automatic speech recognition (ASR) focuses on developing accurate speech-to-text systems for languages with limited labeled training data. Current research emphasizes techniques for making the most of scarce data, including data augmentation, cross-lingual transfer learning, self-supervised pretraining (often with Transformer-based architectures such as wav2vec 2.0 and HuBERT), and pseudo-labeling of unlabeled audio. These approaches further leverage multilingual models, phonetic representations, knowledge distillation, and curriculum learning to maximize the utility of limited resources. Successful solutions hold significant potential for broadening access to voice technologies and fostering linguistic diversity in artificial intelligence.
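To make the pseudo-labeling idea concrete, here is a minimal sketch in pure Python of the common confidence-filtering step: a model trained on the small labeled set transcribes unlabeled audio, and only high-confidence hypotheses are kept as additional training pairs. All names and the threshold value are illustrative assumptions, not taken from any specific paper or library.

```python
# Minimal sketch of confidence-based pseudo-label selection for low-resource
# ASR. Names and the 0.9 threshold are hypothetical, for illustration only.

def select_pseudo_labels(hypotheses, threshold=0.9):
    """Keep (utterance_id, transcript) pairs whose decoder confidence
    meets the threshold; discard lower-confidence hypotheses as too noisy
    to use as training targets."""
    return [
        (utt_id, transcript)
        for utt_id, transcript, confidence in hypotheses
        if confidence >= threshold
    ]

# Example: decoder output on unlabeled audio as (id, hypothesis, confidence).
hypotheses = [
    ("utt1", "hello world", 0.95),
    ("utt2", "uncertain output", 0.40),
    ("utt3", "good morning", 0.92),
]

# Only the high-confidence pairs survive and can be added to the labeled set.
print(select_pseudo_labels(hypotheses))
```

In practice the retained pairs are mixed with the original labeled data for another round of training, and the threshold trades off pseudo-label quantity against transcript noise.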
Papers
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang