Russian Corpus
Russian corpora are collections of Russian-language text used to train and evaluate natural language processing (NLP) models. Current research focuses on building and improving these corpora for tasks such as discourse parsing, grammatical error correction, sentiment analysis, and linguistic acceptability judgments, often using transformer-based language models such as BERT. This work matters because Russian is a linguistically complex language with less readily available annotated data than English; better corpora in turn benefit applications such as machine translation, text summarization, and chatbot development.
Papers
September 23, 2024
June 7, 2024
July 4, 2023
May 28, 2023
April 4, 2023
October 23, 2022
September 28, 2022