Toxicity Annotation

Toxicity annotation focuses on automatically identifying and classifying toxic online content, with the goal of improving content moderation and mitigating the harmful effects of online hate speech and misinformation. Current research emphasizes building robust and reliable toxicity detection models, typically transformer-based, and addressing two distinct sources of error: biases introduced by annotator demographics, and model limitations such as sensitivity to input order and susceptibility to adversarial attacks. The field is crucial for creating safer online environments and for informing the development of more ethical and responsible AI systems, with implications for social media platforms, online communities, and broader societal well-being.
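
As a concrete illustration of the transformer-based detection setup described above, the minimal sketch below scores comments with a pretrained classifier via the Hugging Face `transformers` pipeline. The checkpoint name (`unitary/toxic-bert`) is one publicly available toxicity model used here purely as an example, not the method of any particular paper listed below; any fine-tuned toxicity classifier from the Hub could be swapped in.

```python
# Minimal sketch: scoring comments for toxicity with a pretrained
# transformer classifier. Assumes `transformers` (and a backend such
# as PyTorch) is installed; the checkpoint name is illustrative.
from transformers import pipeline

# "unitary/toxic-bert" is one publicly available toxicity classifier;
# treat it as a placeholder for whichever model your pipeline uses.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

comments = [
    "Thanks for the thoughtful reply!",
    "You are an absolute idiot.",
]

for comment in comments:
    # Each prediction carries a label and a confidence score in [0, 1].
    result = classifier(comment)[0]
    print(f"{result['label']:>10} ({result['score']:.3f})  {comment}")
```

In practice, detection pipelines like this are only a starting point: the research summarized above also audits such models for annotator-driven label bias and for brittleness under adversarial or reordered inputs.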

Papers