Indian Languages
Research on Indian languages focuses on developing and evaluating natural language processing (NLP) models for the diverse linguistic landscape of India, addressing the challenges posed by low-resource languages and significant dialectal variation. Current efforts concentrate on adapting and fine-tuning multilingual transformer models, such as BERT and its variants, for tasks like machine translation, question answering, and sentiment analysis, alongside developing new benchmarks and datasets to facilitate robust evaluation. This work is crucial for bridging the digital divide, enabling wider access to technology and information in India, and advancing the broader field of multilingual NLP.
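Because Indian languages span many scripts (Devanagari, Bengali, Tamil, Telugu, and others), a common first step in multilingual pipelines is routing text by script before choosing a model or tokenizer. The sketch below is a minimal illustration of that idea using only the Python standard library; the function name `detect_script` and the majority-vote heuristic are our own assumptions, not a method from any of the papers listed here.

```python
import unicodedata
from collections import Counter

def detect_script(text: str) -> str:
    """Return the dominant Unicode script label in `text`, e.g. 'DEVANAGARI'.

    Heuristic sketch: the first word of each character's Unicode name
    (e.g. 'DEVANAGARI LETTER KA') serves as a script proxy; ASCII letters
    map to 'LATIN'. Whitespace and unnamed characters are ignored.
    """
    scripts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        try:
            name = unicodedata.name(ch)
        except ValueError:
            # Character has no Unicode name (e.g. some control codes); skip it.
            continue
        scripts[name.split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"

print(detect_script("नमस्ते"))  # Devanagari (Hindi/Marathi) input
print(detect_script("hello"))  # Latin-script input
```

In practice, a script label like this might decide which monolingual model (e.g. a Devanagari-specific BERT variant) or tokenizer to apply, though production systems typically use dedicated language-identification models rather than this Unicode-name heuristic.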
Papers
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages
Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy, Anoop Kunchukuttan
L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages
Raviraj Joshi
L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi
Ananya Joshi, Aditi Kajale, Janhavi Gadre, Samruddhi Deode, Raviraj Joshi