Monolingual Data

Monolingual data, consisting of text or speech in a single language, plays a crucial role in advancing natural language processing (NLP), particularly for low-resource languages lacking extensive parallel corpora. Current research focuses on leveraging monolingual data through techniques like back-translation, denoising autoencoders, and self-supervised learning to improve multilingual machine translation, speech-to-speech translation, and other NLP tasks. These efforts are significant because they address the data scarcity problem hindering progress in many languages, enabling the development of more inclusive and widely applicable NLP technologies.

Papers