Toxicity Classification

Toxicity classification aims to automatically identify harmful or offensive language in text and speech, with particular attention to accuracy and fairness across diverse languages and demographic groups. Current research emphasizes robust models, often built on large language models and cross-modal learning that combines text and speech data, while addressing bias and other limitations in existing datasets through new data-creation methods and improved evaluation benchmarks. The field is central to mitigating online harms and fostering safer digital environments, with direct impact on content moderation, social media platforms, and the development of responsible AI systems.
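As a concrete illustration of the fairness concern above, a common diagnostic is to compare a classifier's false-positive rate across demographic groups: a model biased by identity terms tends to over-flag non-toxic text from some groups. The sketch below is a minimal, self-contained example of that check; the helper name and the toy data are illustrative assumptions, not drawn from any specific benchmark or paper.

```python
from collections import defaultdict

def per_group_false_positive_rate(labels, preds, groups):
    """False-positive rate of a toxicity classifier per demographic group.

    labels: 1 = actually toxic, 0 = non-toxic
    preds:  1 = flagged toxic by the model, 0 = not flagged
    groups: group identifier for each example
    """
    fp = defaultdict(int)   # non-toxic examples wrongly flagged, per group
    neg = defaultdict(int)  # total non-toxic examples, per group
    for y, p, g in zip(labels, preds, groups):
        if y == 0:
            neg[g] += 1
            if p == 1:
                fp[g] += 1
    return {g: fp[g] / neg[g] for g in neg if neg[g] > 0}

# Toy data (hypothetical): the classifier over-flags non-toxic text
# associated with group "B", a symptom of identity-term bias.
labels = [0, 0, 1, 0, 0, 1, 0, 0]
preds  = [0, 0, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = per_group_false_positive_rate(labels, preds, groups)
print(rates)  # group "B" shows a higher false-positive rate than "A"
```

A gap like the one this toy data produces (0.0 for group A versus roughly 0.67 for group B) is exactly what bias-aware benchmarks are designed to surface before deployment.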

Papers