Marathi Corpus

Marathi corpus research focuses on building and expanding linguistic resources for Marathi, a low-resource language with few existing NLP tools. Current efforts center on two things. The first is creating large, diverse datasets for tasks such as text classification, question answering, named entity recognition, and sentiment analysis. The second is training effective Marathi language models, mostly BERT-based, and using techniques such as knowledge distillation and pruning to make them more efficient. This work advances Marathi NLP, makes practical applications possible, and adds to the wider study of low-resource language processing.
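
The two efficiency techniques named above can be illustrated with a small, generic sketch. It is not the procedure from any of the listed papers. It assumes the textbook formulations: distillation as a temperature-softened KL divergence between teacher and student output distributions, scaled by T^2, and pruning as zeroing the smallest-magnitude weights. All function names (`softmax`, `distillation_loss`, `magnitude_prune`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: the common T^2 scaling from standard knowledge
    distillation, not a loss taken from any specific Marathi paper.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|.

    Assumption: simple unstructured magnitude pruning on a flat list.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:k])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    student = [1.5, 1.2, 0.3]
    print(distillation_loss(student, teacher))
    print(magnitude_prune([0.5, -0.05, 0.3, 0.01], sparsity=0.5))  # [0.5, 0.0, 0.3, 0.0]
```

In a real setup the distillation term is usually combined with the ordinary task loss on gold labels, and pruning is applied to transformer weight matrices rather than a flat list. The sketch keeps only the core arithmetic of each idea.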

Papers

December 20, 2024

A Review of the Marathi Natural Language Processing
Asang Dani, Shailesh R Sathe
Narrative Review NLP Task NLP Research Marathi Corpus

October 11, 2024

Long Range Named Entity Recognition for Marathi Documents
Pranita Deshmukh, Nikita Kulkarni, Sanhita Kulkarni, Kareena Manghani, Geetanjali Kale, Raviraj Joshi
Entity Recognition Named Entity Recognition Long Range Marathi Corpus

September 21, 2024

On Importance of Pruning and Distillation for Efficient Low Resource NLP
Aishwarya Mirashi, Purva Lingayat, Srushti Sonavane, Tejas Padhiyar, Raviraj Joshi, Geetanjali Kale
Natural Language Processing Low Resource Language Low Resource Edge Pruning Importance Aware Mutual Distillation Large Transformer Model Marathi Corpus

April 28, 2024

L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi
Saloni Mittal, Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Raviraj Joshi
News Article Multilingual BERT L3Cube MahaSocialNER Marathi Corpus

November 5, 2023

mahaNLP: A Marathi Natural Language Processing Library
Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Saloni Mittal, Raviraj Joshi
Natural Language Processing Framework Marathi Corpus

September 27, 2023

Question answering using deep learning in low resource Indian language Marathi
Dhiraj Amin, Sharvari Govilkar, Sagar Kulkarni
Deep Learning Yes No Question Low Resource Bidirectional Encoder Representation Marathi Corpus Comprehension Datasets

June 24, 2023

My Boli: Code-mixed Marathi-English Corpora, Pretrained Language Models and Evaluation Benchmarks
Tanmay Chavan, Omkar Gokhale, Aditya Kane, Shantanu Patankar, Raviraj Joshi
Pretrained Language Model Hate Speech Detection Evaluation Benchmark Code Mixed Marathi Corpus

November 21, 2022

L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages
Raviraj Joshi
Multilingual Model Indian Language Multilingual BERT L3Cube MahaSocialNER Monolingual BERT Model Marathi Corpus

May 29, 2022

L3Cube-MahaNLP: Marathi Natural Language Processing Datasets, Models, and Library
Raviraj Joshi
Natural Language Processing Full Model Easy to Use Library Natural Language Processing Tool L3Cube MahaSocialNER Marathi Corpus

February 2, 2022

L3Cube-MahaCorpus and MahaBERT: Marathi Monolingual Corpus, Marathi BERT Language Models, and Resources
Raviraj Joshi
New Resource L3Cube MahaSocialNER Marathi Corpus
