Toxicity Detection Models
Toxicity detection models aim to automatically identify harmful language in text, focusing on improving accuracy and interpretability across diverse contexts like social media and user-AI interactions. Current research emphasizes developing more robust models, often based on transformer architectures, that are less susceptible to adversarial attacks and better equipped to handle nuanced forms of toxicity, including implicit bias and subtle triggers. This work is crucial for creating safer online environments and mitigating the spread of harmful content, with implications for content moderation, chatbot development, and the broader study of algorithmic bias.
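To make the transformer-based approach mentioned above concrete, the sketch below scores comments with an off-the-shelf toxicity classifier via the Hugging Face transformers text-classification pipeline. The specific model name (unitary/toxic-bert), the 0.5 flagging threshold, and the example comments are illustrative assumptions, not details taken from this page; any fine-tuned toxicity classifier with a text-classification head could be substituted.

```python
# Minimal sketch: flagging toxic comments with a transformer classifier.
# Assumes the `transformers` library and an example model checkpoint
# ("unitary/toxic-bert"); both are illustrative choices, not prescribed here.
from transformers import pipeline

toxicity_classifier = pipeline(
    "text-classification",
    model="unitary/toxic-bert",  # assumed example model; swap in any toxicity classifier
    top_k=None,                  # return scores for every label, not just the top one
)

comments = [
    "Thanks for the helpful explanation!",
    "You are an idiot and nobody wants you here.",
]

for text in comments:
    scores = toxicity_classifier(text)[0]  # list of {"label": ..., "score": ...} dicts
    # Flag the comment if any toxicity-related label crosses the (assumed) 0.5 threshold.
    flagged = any(s["score"] > 0.5 for s in scores)
    print("FLAGGED" if flagged else "ok", "|", text)
```

In a moderation pipeline, the per-label scores would typically feed a policy layer (for example, routing borderline cases to human review) rather than a single hard threshold as shown here.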