Low-Resource Indian Languages

Research on low-resource Indian languages focuses on building robust natural language processing (NLP) tools despite limited data. Current efforts concentrate on adapting and fine-tuning pre-trained multilingual models, such as multilingual BERT variants and Whisper, for tasks like machine translation, speech recognition, and named entity recognition, often using transfer learning from higher-resource languages. This work helps bridge the digital divide by widening access to information and technology in under-resourced communities, and it also yields datasets and benchmarks for the broader NLP research community. New data augmentation strategies and improved evaluation metrics for low-resource settings are also active areas of investigation.

Papers