Low Resource Language

Low-resource language (LRL) research focuses on developing natural language processing (NLP) techniques for languages lacking substantial digital resources, aiming to bridge the technological gap between high- and low-resource languages. Current research emphasizes leveraging multilingual pre-trained models like Whisper and adapting them to LRLs through techniques such as weighted cross-entropy, data augmentation (including synthetic data generation), and model optimization methods like pruning and knowledge distillation. This work is crucial for promoting linguistic diversity, enabling access to technology for under-resourced communities, and advancing the broader field of NLP by addressing the challenges posed by data scarcity and linguistic variation.

Papers