Low-Resource Language
Low-resource language (LRL) research focuses on developing natural language processing (NLP) techniques for languages lacking substantial digital resources, aiming to bridge the technological gap between high- and low-resource languages. Current research emphasizes leveraging multilingual pre-trained models like Whisper and adapting them to LRLs through techniques such as weighted cross-entropy, data augmentation (including synthetic data generation), and model optimization methods like pruning and knowledge distillation. This work is crucial for promoting linguistic diversity, enabling access to technology for under-resourced communities, and advancing the broader field of NLP by addressing the challenges posed by data scarcity and linguistic variation.
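Of the techniques listed above, weighted cross-entropy is the simplest to make concrete: during fine-tuning, examples (or classes) from the under-represented language are given a larger loss weight so the scarce data is not drowned out. The function below is a minimal pure-Python sketch of the idea, not taken from any of the listed papers; the argument names and the toy weights are illustrative assumptions.

```python
import math

def weighted_cross_entropy(logits, target, weights):
    """Cross-entropy with a per-class weight, as used to upweight
    low-resource-language examples during fine-tuning.

    logits:  list of raw (unnormalized) scores, one per class
    target:  index of the correct class
    weights: per-class weights; larger values amplify the loss
             contribution of that class (illustrative assumption:
             uniform weights recover standard cross-entropy)
    """
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob = logits[target] - log_z
    # Scale the negative log-likelihood by the target class's weight.
    return -weights[target] * log_prob

# Doubling the weight of the low-resource class doubles its loss term,
# so its gradients carry proportionally more influence per example.
base = weighted_cross_entropy([2.0, 1.0, 0.5], 0, [1.0, 1.0, 1.0])
upweighted = weighted_cross_entropy([2.0, 1.0, 0.5], 0, [2.0, 1.0, 1.0])
```

In practice the same effect is obtained by passing a class-weight vector to the loss function of a deep-learning framework; the weight for the low-resource class is typically set from inverse class frequency.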
Papers
Detection of Offensive and Threatening Online Content in a Low Resource Language
Fatima Muhammad Adam, Abubakar Yakubu Zandam, Isa Inuwa-Dutse
Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language
Kasun Wickramasinghe, Nisansa de Silva
TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
Bibek Upadhayay, Vahid Behzadan
To Translate or Not to Translate: A Systematic Investigation of Translation-Based Cross-Lingual Transfer to Low-Resource Languages
Benedikt Ebing, Goran Glavaš
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen