Marathi Corpus

Marathi corpus research develops and expands linguistic resources for Marathi, a low-resource language with few existing NLP tools. Current efforts center on building large, diverse datasets for tasks such as text classification, question answering, and sentiment analysis, and on training Marathi language models. These models are primarily BERT-based, with knowledge distillation and pruning used to improve efficiency. The work advances Marathi NLP capabilities, enables practical applications, and contributes to the broader field of low-resource language processing.

Papers