Low Resource Language
Low-resource language (LRL) research focuses on developing natural language processing (NLP) techniques for languages lacking substantial digital resources, aiming to bridge the technological gap between high- and low-resource languages. Current research emphasizes leveraging multilingual pre-trained models like Whisper and adapting them to LRLs through techniques such as weighted cross-entropy, data augmentation (including synthetic data generation), and model optimization methods like pruning and knowledge distillation. This work is crucial for promoting linguistic diversity, enabling access to technology for under-resourced communities, and advancing the broader field of NLP by addressing the challenges posed by data scarcity and linguistic variation.
Papers
Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025)
Hansi Hettiarachchi, Tharindu Ranasinghe, Paul Rayson, Ruslan Mitkov, Mohamed Gaber, Damith Premasiri, Fiona Anting Tan, Lasitha Uyangodage
Fine-tuning Whisper on Low-Resource Languages for Real-World Applications
Vincenzo Timmel, Claudio Paonessa, Reza Kakooee, Manfred Vogel, Daniel Perruchoud