Toxic Speech Detection

Toxic speech detection aims to automatically identify harmful or offensive language online; beyond raw accuracy, a central research concern is mitigating the biases inherent in existing models. Current work emphasizes more transparent and adaptable models, often incorporating techniques such as knowledge distillation and attention mechanisms, while also tackling the harder problems of implicit toxicity and adversarial attacks (e.g., intentionally misspelled words that slip past keyword-based filters). The field matters for safer online environments and more equitable content moderation, motivating ongoing efforts toward robust, unbiased, and explainable detection systems.
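
To make the knowledge-distillation idea mentioned above concrete, here is a minimal PyTorch sketch, not taken from any particular paper: a large teacher classifier's softened logits supervise a smaller student alongside the usual cross-entropy on gold toxicity labels. The function name, temperature, and mixing weight are illustrative assumptions, not a specific published recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss (teacher -> student) with hard-label CE.

    student_logits, teacher_logits: (batch, num_classes) raw scores
    labels: (batch,) gold toxic/non-toxic class indices
    temperature, alpha: hypothetical defaults for illustration only
    """
    # Soften both distributions; the T**2 factor rescales gradients
    # so the KD term stays comparable to the cross-entropy term.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage with random logits for a binary toxic/non-toxic task.
if __name__ == "__main__":
    torch.manual_seed(0)
    student = torch.randn(8, 2)         # small student model outputs
    teacher = torch.randn(8, 2)         # frozen large teacher outputs
    labels = torch.randint(0, 2, (8,))  # gold labels
    print(distillation_loss(student, teacher, labels).item())
```

In practice the teacher would be a large fine-tuned transformer and the student a compact model cheap enough for moderation at scale; the soft targets transfer the teacher's calibrated uncertainty on borderline (e.g., implicitly toxic) examples, which hard labels alone do not capture.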

Papers