Text Detoxification

Text detoxification aims to remove offensive or harmful language from text while preserving its original meaning, a task crucial for creating safer online environments. Current research focuses on developing and improving models such as large language models (LLMs), sequence-to-sequence models, and diffusion models, often leveraging techniques like counterfactual generation, in-context learning, and parallel data augmentation to improve detoxification accuracy and fluency across multiple languages. These advances matter for mitigating online toxicity and for the safety and trustworthiness of natural language processing applications. The field is also actively exploring evaluation metrics that better align with human judgments of detoxification quality.
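To make the in-context learning approach mentioned above concrete, here is a minimal sketch of how a few-shot detoxification prompt might be assembled for a completion-style LLM. The example pairs, prompt wording, and function name are invented for illustration and do not come from any specific paper; the actual model call is omitted.

```python
# Hypothetical few-shot (in-context learning) prompt for text detoxification.
# The toxic/neutral example pairs below are invented for illustration only.
FEW_SHOT_PAIRS = [
    ("This idea is complete garbage.", "This idea has significant flaws."),
    ("Shut up, nobody asked you.", "Please let others have a turn to speak."),
]

def build_detox_prompt(toxic_text: str) -> str:
    """Assemble a prompt an LLM could complete with a detoxified rewrite."""
    lines = ["Rewrite each sentence to remove toxicity while keeping its meaning.", ""]
    for toxic, neutral in FEW_SHOT_PAIRS:
        lines.append(f"Toxic: {toxic}")
        lines.append(f"Neutral: {neutral}")
        lines.append("")  # blank line between demonstrations
    # The new input is appended in the same format; the model fills in "Neutral:".
    lines.append(f"Toxic: {toxic_text}")
    lines.append("Neutral:")
    return "\n".join(lines)

prompt = build_detox_prompt("That was a stupid thing to say.")
```

In this setup, the demonstrations steer the model toward meaning-preserving rewrites rather than deletion of the whole sentence, which is one reason in-context learning is attractive for this task.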

Papers