Monolingual BERT Models
Monolingual BERT models are versions of the BERT architecture pre-trained exclusively on data from a single language, with the aim of outperforming multilingual counterparts on downstream tasks in that language. Current research focuses on optimizing these models for low-resource languages through techniques such as parameter reduction, multi-task learning, and fine-tuning on synthetic data, particularly for sentence embedding tasks. This work is significant because multilingual models often fail to capture nuanced linguistic features and language-specific biases; dedicated monolingual models improve performance across a range of natural language processing applications and deepen our understanding of how individual languages are represented in these models. A minimal sketch of the sentence embedding setup is given below.
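To make the sentence embedding task concrete, the sketch below shows one common way to obtain fixed-size sentence vectors from a monolingual BERT checkpoint using mean pooling over token representations. The Hugging Face model identifier and the `embed` helper are illustrative assumptions, not the exact setup used in the papers listed here; substitute the actual Hub ID of the monolingual model you intend to use.

```python
# A minimal sketch: mean-pooled sentence embeddings from a monolingual BERT.
# MODEL_ID is an assumed/illustrative Hub identifier; verify it before use.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "l3cube-pune/hindi-bert-v2"  # assumed ID for a monolingual Hindi BERT

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(sentences):
    """Return one mean-pooled embedding per input sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, hidden)
    # Mask out padding tokens before averaging.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

embeddings = embed(["यह एक उदाहरण वाक्य है।", "यह दूसरा वाक्य है।"])
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```

Sentence-BERT-style models such as those benchmarked in the papers below typically add a supervised fine-tuning stage (e.g., on NLI or STS data) on top of this kind of pooling, so raw mean pooling should be treated only as a baseline.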
Papers
L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages
Raviraj Joshi
L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi
Ananya Joshi, Aditi Kajale, Janhavi Gadre, Samruddhi Deode, Raviraj Joshi