Gated Toxicity Avoidance
Gated Toxicity Avoidance (GTA) focuses on mitigating harmful language generation in large language models (LLMs) while preserving generation quality, such as fluency and coherence. Current research emphasizes methods, including reinforcement learning and retrieval-augmented approaches, that reduce toxicity across multiple languages and diverse prompts without significantly degrading that quality. This work is important for the safe and responsible deployment of LLMs, addressing ethical concerns about harmful outputs and supporting the development of more beneficial AI systems. The gating idea is illustrated in the sketch below.
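The following is a minimal, self-contained sketch of the gating idea only, not the method from any particular paper: a per-step gate invokes a detoxification adjustment on the next-token distribution only when a toxicity signal exceeds a threshold, leaving low-risk steps untouched so fluency is preserved. All names here (base_logits, toxicity_score, detoxify_logits, gate_threshold) and the toy lexicon are hypothetical stand-ins for a real LM, toxicity classifier, and avoidance module.

```python
# Illustrative sketch of gated toxicity avoidance (hypothetical stand-ins, not a real system).
import math
import random

TOXIC_WORDS = {"idiot", "stupid"}                      # toy lexicon for the stand-in scorer
VOCAB = ["you", "are", "kind", "idiot", "stupid", "great"]

def base_logits(context):
    """Stand-in for the base LM's next-token logits."""
    random.seed(len(context))
    return {tok: random.uniform(-1.0, 1.0) for tok in VOCAB}

def toxicity_score(logits):
    """Gate signal: probability mass the base distribution puts on toxic tokens."""
    z = sum(math.exp(v) for v in logits.values())
    return sum(math.exp(logits[t]) for t in TOXIC_WORDS if t in logits) / z

def detoxify_logits(logits, penalty=5.0):
    """Stand-in avoidance module: push down logits of toxic tokens."""
    return {t: v - (penalty if t in TOXIC_WORDS else 0.0) for t, v in logits.items()}

def gated_generate(prompt, steps=8, gate_threshold=0.2):
    context = prompt.split()
    for _ in range(steps):
        logits = base_logits(context)
        # Gate: apply the avoidance module only when toxicity risk is high,
        # so low-risk steps keep the base model's distribution (and fluency) intact.
        if toxicity_score(logits) > gate_threshold:
            logits = detoxify_logits(logits)
        context.append(max(logits, key=logits.get))    # greedy decoding
    return " ".join(context)

if __name__ == "__main__":
    print(gated_generate("you are"))
```

In a real system, the base LM and the avoidance module (e.g. a guided-decoding or RL-tuned component) are far more expensive than the gate, so skipping the adjustment on low-risk steps is what preserves both quality and efficiency.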
Papers
November 10, 2024
June 25, 2024
May 15, 2024
December 11, 2023
October 11, 2023
October 6, 2022
March 6, 2022