Low Resource Language
Low-resource language (LRL) research develops natural language processing (NLP) techniques for languages that lack substantial digital resources, with the aim of narrowing the technological gap between high- and low-resource languages. Current research emphasizes adapting multilingual pre-trained models such as Whisper to LRLs through techniques such as weighted cross-entropy losses, data augmentation (including synthetic data generation), and model optimization methods such as pruning and knowledge distillation. This work promotes linguistic diversity, gives under-resourced communities access to language technology, and advances NLP more broadly by addressing the challenges of data scarcity and linguistic variation.
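As a rough illustration of the weighted cross-entropy idea mentioned above, the sketch below computes a per-example weighted loss in plain Python. It is not taken from any of the listed papers: the function names (`log_sum_exp`, `weighted_cross_entropy`) and the weighting scheme, which simply up-weights examples from the low-resource language, are assumptions chosen for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted mean of per-example negative log-likelihoods.

    batch_logits: list of per-example logit lists (one score per class)
    targets:      list of gold class indices
    weights:      list of non-negative per-example weights (hypothetical
                  scheme: larger for low-resource-language examples)
    """
    if not (len(batch_logits) == len(targets) == len(weights)):
        raise ValueError("logits, targets and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    loss = 0.0
    for logits, target, weight in zip(batch_logits, targets, weights):
        # Negative log-likelihood of the gold class under a softmax.
        nll = log_sum_exp(logits) - logits[target]
        loss += weight * nll
    # Normalise by the total weight so the scale is comparable to a plain mean.
    return loss / total_weight


if __name__ == "__main__":
    # Two examples with identical logits; the second is misclassified and
    # is assumed to come from the low-resource language, so it gets weight 3.
    logits = [[2.0, 0.0], [2.0, 0.0]]
    targets = [0, 1]
    print(weighted_cross_entropy(logits, targets, [1.0, 1.0]))
    print(weighted_cross_entropy(logits, targets, [1.0, 3.0]))
```

With equal weights the function reduces to the ordinary mean cross-entropy. Raising the weight on the low-resource example makes its error count for more in the total, which is the effect such weighting is meant to have during fine-tuning.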
Papers
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Santosh Kesiraju, Marek Sarvas, Tomas Pavlicek, Cecile Macaire, Alejandro Ciuba
MetaXLR -- Mixed Language Meta Representation Transformation for Low-resource Cross-lingual Learning based on Multi-Armed Bandit
Liat Bezalel, Eyal Orgad
Findings of the VarDial Evaluation Campaign 2023
Noëmi Aepli, Çağrı Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri
Ethical Considerations for Machine Translation of Indigenous Languages: Giving a Voice to the Speakers
Manuel Mager, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu
LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages
Milind Agarwal, Md Mahfuz Ibn Alam, Antonios Anastasopoulos
An Open Dataset and Model for Language Identification
Laurie Burchell, Alexandra Birch, Nikolay Bogoychev, Kenneth Heafield
AxomiyaBERTa: A Phonologically-aware Transformer Model for Assamese
Abhijnan Nath, Sheikh Mannan, Nikhil Krishnaswamy
Automatic Readability Assessment for Closely Related Languages
Joseph Marvin Imperial, Ekaterina Kochmar
Automated stance detection in complex topics and small languages: the challenging case of immigration in polarizing news media
Mark Mets, Andres Karjus, Indrek Ibrus, Maximilian Schich
Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs
Yihong Liu, Haotian Ye, Leonie Weissweiler, Renhao Pei, Hinrich Schütze