Low-Resource Speech Recognition

Low-resource automatic speech recognition (ASR) focuses on developing accurate speech-to-text systems for languages with limited labeled training data. Current research emphasizes data augmentation, cross-lingual transfer learning, self-supervised pre-training (often with Transformer-based models such as wav2vec 2.0 and HuBERT), and pseudo-labeling of unlabeled audio to improve model performance. These approaches combine multilingual models, phonetic representations, and techniques such as knowledge distillation and curriculum learning to maximize the value of scarce labeled data. Successful solutions hold significant potential for broadening access to voice technologies and fostering linguistic diversity in the field of artificial intelligence. A minimal sketch of one of these techniques is shown below.
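As an illustration of cross-lingual transfer from a self-supervised model, the sketch below fine-tunes a multilingual wav2vec 2.0 (XLS-R) checkpoint with a freshly initialized CTC head sized to a target language's character set. It assumes the Hugging Face `transformers` library, 16 kHz audio, and a hypothetical `vocab.json` containing the target-language characters; it is a minimal example, not a complete training recipe.

```python
# Minimal sketch: adapt a multilingual wav2vec 2.0 (XLS-R) encoder to a
# low-resource target language via a new CTC head. Assumes `transformers`,
# 16 kHz audio, and a hypothetical `vocab.json` with the target character set.
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

# Character-level tokenizer built from the (hypothetical) target-language vocabulary.
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the self-supervised multilingual encoder; the CTC output layer is
# randomly initialized and sized to the new vocabulary (cross-lingual transfer).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature encoder fixed

# One illustrative training step on a dummy utterance/transcript pair.
audio = torch.randn(16000 * 3).numpy()  # 3 seconds of placeholder 16 kHz audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("dummy transcript", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
optimizer.step()
print(f"CTC loss: {loss.item():.3f}")
```

In a real setup the dummy audio and transcript would come from the target-language corpus, padded label positions would be masked with -100 before computing the CTC loss, and training would run for many steps with a learning-rate schedule.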

Papers