Toxic Language

Toxic language, encompassing hate speech, insults, and other harmful expressions, is a significant concern in online communication, and research increasingly focuses on its detection and mitigation. Current efforts use pretrained language models such as BERT and large language models (LLMs) such as GPT, along with techniques like counterfactual generation and attention-weight adjustment, to identify and reduce toxicity across languages and contexts, including social media, gaming platforms, and machine translation. This research is crucial for creating safer online environments and for the ethical development and deployment of AI systems, particularly LLMs, which can inadvertently perpetuate or amplify harmful biases.
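To make the detection task concrete, here is a deliberately minimal sketch of a toxicity scorer. It uses a tiny keyword lexicon as a stand-in for the BERT/GPT-based classifiers the summary describes; real systems fine-tune a pretrained model on labeled data rather than matching words. The lexicon, threshold, and function names are illustrative assumptions, not taken from any specific system.

```python
# Toy toxicity scorer: a lexicon baseline standing in for a fine-tuned
# classifier. Production detectors would replace toxicity_score() with a
# model's predicted probability; the interface stays the same.

TOXIC_LEXICON = {"idiot", "stupid", "trash"}  # hypothetical example terms


def toxicity_score(text: str) -> float:
    """Return the fraction of tokens found in the toxic lexicon."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if not tokens:
        return 0.0
    hits = sum(t in TOXIC_LEXICON for t in tokens)
    return hits / len(tokens)


def is_toxic(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose score meets an (assumed) decision threshold."""
    return toxicity_score(text) >= threshold


print(is_toxic("You are an idiot"))  # True (1 of 4 tokens flagged)
print(is_toxic("Have a nice day"))   # False (no tokens flagged)
```

A learned classifier would also catch toxicity that uses no flagged words at all, which is precisely why the field has moved from lexicons to model-based detection.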

Papers