Fine-Grained Detoxification
Fine-grained detoxification of large language models (LLMs) aims to reduce the generation of harmful or biased content so that these models can be deployed more safely and responsibly. Current research follows two main directions. Training-based methods adapt model parameters to align with human preferences. Decoding-based methods leave the weights unchanged and instead modify the generation process in real time, for example by projecting internal representations away from a toxic subspace or by controlled sampling. These efforts address ethical concerns surrounding LLMs and make them more suitable for high-stakes applications, such as fraud detection and online moderation, where a nuanced understanding of toxicity is essential.
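The two decoding-time ideas mentioned above can be illustrated with a minimal, framework-free sketch. It assumes that one or more "toxic directions" in hidden-state space and a per-token toxicity score are already available; in practice these would come from a probe or classifier, which is not shown. The function names (`orthonormalize`, `project_out`, `penalized_distribution`, `controlled_sample`) and the linear penalty `alpha * toxicity` are hypothetical, illustrative choices, not the method of any particular paper in this list.

```python
import math
import random
from typing import Sequence

Vector = list[float]


def orthonormalize(directions: Sequence[Vector]) -> list[Vector]:
    """Gram-Schmidt: turn (assumed, precomputed) toxic directions into an orthonormal basis."""
    basis: list[Vector] = []
    for d in directions:
        v = list(d)
        for b in basis:
            coef = sum(x * y for x, y in zip(v, b))
            v = [x - coef * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:  # skip directions already spanned by the basis
            basis.append([x / norm for x in v])
    return basis


def project_out(hidden: Vector, basis: Sequence[Vector]) -> Vector:
    """Subspace projection: h' = h - sum_i (h . u_i) u_i for an orthonormal basis {u_i}."""
    out = list(hidden)
    for u in basis:
        coef = sum(x * y for x, y in zip(hidden, u))
        out = [x - coef * y for x, y in zip(out, u)]
    return out


def penalized_distribution(
    logits: dict[str, float], toxicity: dict[str, float], alpha: float
) -> dict[str, float]:
    """Softmax over logits after subtracting alpha * toxicity (hypothetical scorer output)."""
    adjusted = {tok: l - alpha * toxicity.get(tok, 0.0) for tok, l in logits.items()}
    m = max(adjusted.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in adjusted.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def controlled_sample(
    logits: dict[str, float],
    toxicity: dict[str, float],
    alpha: float,
    rng: random.Random,
) -> str:
    """Controlled sampling: draw the next token from the toxicity-penalized distribution."""
    probs = penalized_distribution(logits, toxicity, alpha)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    basis = orthonormalize([[1.0, 0.0, 0.0]])
    print(project_out([0.5, 2.0, -1.0], basis))  # [0.0, 2.0, -1.0]

    logits = {"kind": 1.0, "rude": 1.0}
    scores = {"rude": 0.9}
    print(penalized_distribution(logits, scores, alpha=2.0))
    print(controlled_sample(logits, scores, alpha=2.0, rng=random.Random(0)))
```

The projection removes only the component of the hidden state that lies in the assumed toxic subspace and leaves the orthogonal part untouched, while the sampling step keeps every token possible but shifts probability mass away from high-scoring ones as `alpha` grows. Setting `alpha = 0` recovers ordinary softmax sampling.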
Papers
December 18, 2024
October 27, 2024
October 7, 2024
October 4, 2024
September 9, 2024
June 27, 2024
May 22, 2024
April 16, 2024
February 25, 2024
February 23, 2024
October 14, 2023
October 10, 2022