English Dataset

English datasets remain central to training and evaluating natural language processing (NLP) models, but they are increasingly complemented by multilingual resources that aim to reduce English-centric bias and improve performance in other languages. Current research focuses on building multilingual benchmarks for tasks such as question answering, named entity recognition, and sentiment analysis. These efforts often use large language models (LLMs) to generate data and cross-lingual transfer learning to narrow the resource gap between English and lower-resource languages. This work extends NLP capabilities beyond English-centric applications and supports more equitable and inclusive language technologies.

Papers