Twitter Corpus

Twitter corpora are collections of tweets used to train and evaluate natural language processing (NLP) models, particularly for analyzing the informal language of online communication. Current research pursues two main goals. The first is mitigating biases that arise because standard English is overrepresented in the data. The second is building robust models for tasks such as personality profiling, cyberbullying detection, and identifying diverse varieties of English, often with transformer-based architectures such as BERT. This work aims to make NLP systems fairer and more accurate, which supports more effective social media analysis and a deeper understanding of online behavior.

Papers