Indian Languages
Research on Indian languages focuses on developing and evaluating natural language processing (NLP) models for the diverse linguistic landscape of India, addressing the challenges posed by low-resource languages and significant dialectal variation. Current efforts concentrate on adapting and fine-tuning multilingual transformer models, such as BERT and its variants, for tasks like machine translation, question answering, and sentiment analysis, alongside developing new benchmarks and datasets to facilitate robust evaluation. This work is crucial for bridging the digital divide, enabling wider access to technology and information in India, and advancing the broader field of multilingual NLP.
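One concrete preprocessing step behind this multilingual work is telling native-script text apart from romanized text, since many Indic languages circulate online in both forms. The sketch below (the helper name `dominant_script` is hypothetical, not from any paper listed here) detects the majority script of a snippet using Unicode character names from Python's standard library:

```python
import unicodedata

def dominant_script(text: str) -> str:
    """Return the most frequent Unicode script among the letters in `text`.

    Hypothetical illustration: counts the first word of each character's
    Unicode name (e.g. "DEVANAGARI", "TAMIL", "LATIN") as a script label.
    """
    counts: dict[str, int] = {}
    for ch in text:
        if not ch.isalpha():  # skip digits, punctuation, combining marks
            continue
        try:
            name = unicodedata.name(ch)
        except ValueError:  # unnamed code point
            continue
        script = name.split()[0]
        counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"

print(dominant_script("नमस्ते दुनिया"))   # native Devanagari script
print(dominant_script("namaste duniya"))  # the same phrase, romanized
```

Real language-identification systems go well beyond script detection (many languages share Devanagari, for instance), but a check like this is a cheap first routing step before script-specific models are applied.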
Papers
Impact of Visual Context on Noisy Multimodal NMT: An Empirical Study for English to Indian Languages
Baban Gain, Dibyanayan Bandyopadhyay, Samrat Mukherjee, Chandranath Adak, Asif Ekbal
Cyberbullying Detection for Low-resource Languages and Dialects: Review of the State of the Art
Tanjim Mahmud, Michal Ptaszynski, Juuso Eronen, Fumito Masui
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
Jay Gala, Pranjal A. Chitale, Raghavan AK, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M. Khapra, Raj Dabre, Anoop Kunchukuttan
Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages
Yash Madhani, Mitesh M. Khapra, Anoop Kunchukuttan