Toxicity Detection

Toxicity detection research aims to automatically identify harmful online content, including hate speech, harassment, and misinformation, across modalities such as text, speech, and video. Current efforts focus on improving model accuracy and efficiency, particularly with transformer-based architectures, and on techniques such as few-shot learning and multi-task learning that address data scarcity, bias, and adversarial attacks. The field is crucial for mitigating online harm and promoting safer digital environments, with applications ranging from content moderation on social media platforms to improving the safety of large language models. Research also increasingly emphasizes fairness and explainable models, aiming to reduce bias and ensure equitable protection across user groups.
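
As a concrete illustration of the transformer-based approach described above, the sketch below classifies short texts with a pretrained toxicity model via the Hugging Face `transformers` pipeline. The checkpoint name (`unitary/toxic-bert`) and the 0.5 flagging threshold are illustrative assumptions for this sketch, not choices drawn from any particular paper surveyed here.

```python
# Minimal sketch of transformer-based toxicity classification.
# Assumes `transformers` and a backend (e.g. PyTorch) are installed;
# the checkpoint below is one public example model, not a canonical choice.
from transformers import pipeline

# Load a sequence-classification head fine-tuned for multi-label toxicity.
classifier = pipeline(
    "text-classification",
    model="unitary/toxic-bert",
    top_k=None,  # return scores for every toxicity category
)

texts = [
    "Have a great day!",
    "You are a worthless idiot.",
]

for text, scores in zip(texts, classifier(texts)):
    # Each result is a list of {label, score} dicts, one per category
    # (e.g. toxic, insult, threat). Flag the text if any score crosses
    # a threshold; 0.5 is an assumed cutoff, tuned in practice per use case.
    flagged = [s["label"] for s in scores if s["score"] > 0.5]
    print(f"{text!r} -> {flagged or 'clean'}")
```

In a production moderation setting this single-model scoring step would typically sit behind thresholds calibrated per category and per user group, which is where the fairness and explainability concerns noted above come into play.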

Papers