Toxic Language Detection

Toxic language detection aims to automatically identify harmful or offensive language in text and other media, with an emphasis on accuracy and fairness. Current research focuses on improving model efficiency (e.g., using compact transformer architectures), mitigating biases through techniques such as counterfactual causal debiasing and conditional multi-task learning, and enhancing robustness against adversarial attacks. This field is crucial for creating safer online environments and fostering more equitable online interactions, shaping both the development of fairer AI systems and the design of effective content moderation strategies.
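At its core, toxic language detection is binary (or multi-label) text classification. The sketch below is a deliberately minimal lexicon baseline to make the task concrete; the lexicon, threshold, and function names are hypothetical, and real systems instead fine-tune transformer classifiers (often compact ones, as noted above).

```python
# Toy sketch of toxic language detection as binary text classification.
# The lexicon and threshold are illustrative placeholders, not a real
# moderation resource; production systems use learned classifiers.

TOXIC_LEXICON = {"idiot", "stupid", "hate"}  # hypothetical toy lexicon


def toxicity_score(text: str) -> float:
    """Return the fraction of tokens found in the toxic lexicon."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,!?") in TOXIC_LEXICON)
    return hits / len(tokens)


def is_toxic(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose toxicity score meets the threshold."""
    return toxicity_score(text) >= threshold


print(is_toxic("you are an idiot"))  # True
print(is_toxic("have a nice day"))   # False
```

A lexicon baseline like this illustrates exactly the weaknesses the research above targets: it is brittle to adversarial rewording and can encode bias through its word list, which is why debiasing and robustness techniques are applied to learned models instead.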

Papers