Low-Resource Language
Low-resource language (LRL) research develops natural language processing (NLP) techniques for languages that lack substantial digital resources, with the aim of narrowing the technological gap between high- and low-resource languages. Current research emphasizes adapting multilingual pre-trained models such as Whisper to LRLs through weighted cross-entropy losses, data augmentation (including synthetic data generation), and model compression methods such as pruning and knowledge distillation. This work matters for three reasons: it promotes linguistic diversity, it gives under-resourced communities access to language technology, and it advances NLP as a whole by confronting data scarcity and linguistic variation.
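As a rough illustration of the weighted cross-entropy idea mentioned above, the sketch below scales each training example's loss by a weight tied to its language, so that scarce LRL examples count for more than abundant high-resource ones. It is a minimal sketch under stated assumptions, not the method of any paper listed here: the function names, the per-language weighting scheme, and the language tags and weight values are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, targets, languages, language_weights):
    """Weighted mean of per-example negative log-likelihoods.

    Each example's loss is multiplied by the weight of its language
    (a hypothetical scheme), and the sum is normalized by the total
    weight so the result stays on the scale of an ordinary mean loss.
    """
    weighted_loss = 0.0
    weight_sum = 0.0
    for logits, target, lang in zip(batch_logits, targets, languages):
        probs = softmax(logits)
        weight = language_weights[lang]
        weighted_loss += -weight * math.log(probs[target])
        weight_sum += weight
    return weighted_loss / weight_sum


# Hypothetical batch: one high-resource ("en") and one low-resource ("si") example.
batch_logits = [[0.0, 0.0], [2.0, 0.0]]
targets = [0, 1]
languages = ["en", "si"]

uniform = weighted_cross_entropy(batch_logits, targets, languages, {"en": 1.0, "si": 1.0})
upweighted = weighted_cross_entropy(batch_logits, targets, languages, {"en": 1.0, "si": 3.0})
print(uniform, upweighted)
```

With uniform weights the function reduces to the ordinary mean cross-entropy. Raising the weight for the low-resource language pulls the batch loss toward that language's errors, which is the intended effect when its examples are rare.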
Papers
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee, Sandipan Dandapat, Monojit Choudhury, Tanuja Ganu, Kalika Bali
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng
Sinhala Sentence Embedding: A Two-Tiered Structure for Low-Resource Languages
Gihan Weeraprameshwara, Vihanga Jayawickrama, Nisansa de Silva, Yudhanjaya Wijeratne
Eeny, meeny, miny, moe. How to choose data for morphological inflection
Saliha Muradoglu, Mans Hulden
Modeling the Graphotactics of Low-Resource Languages Using Sequential GANs
Isaac Wasserman
Automatic Speech Recognition of Low-Resource Languages Based on Chukchi
Anastasia Safonova, Tatiana Yudina, Emil Nadimanov, Cydnie Davenport
Enriching Biomedical Knowledge for Low-resource Language Through Large-Scale Translation
Long Phan, Tai Dang, Hieu Tran, Trieu H. Trinh, Vy Phan, Lam D. Chau, Minh-Thang Luong
XF2T: Cross-lingual Fact-to-Text Generation for Low-Resource Languages
Shivprasad Sagare, Tushar Abhishek, Bhavyajeet Singh, Anubhav Sharma, Manish Gupta, Vasudeva Varma
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Yifan Hu, Pengkai Yin, Rui Liu, Feilong Bao, Guanglai Gao
Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages
Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
Cross-lingual Transfer Learning for Fake News Detector in a Low-Resource Language
Sangdo Han