Novel Corpus

Novel corpora are increasingly crucial for advancing natural language processing (NLP) research, providing annotated datasets for training and evaluating models on diverse tasks and languages. Current research focuses on developing corpora for specific challenges, including low-resource languages, stylistic analysis, and the detection of nuanced linguistic phenomena like misogyny or social determinants of health, often employing large language models (LLMs) and BERT-based architectures for analysis. These resources are vital for improving the accuracy and applicability of NLP systems across various domains, from healthcare and legal analysis to literary studies and cross-cultural communication.

Papers