Harmful Data
Harmful data, encompassing toxic text, inappropriate images, and misinformation, poses a significant threat to the development and deployment of machine learning models, particularly large language models (LLMs). Current research focuses on detecting such data and mitigating its impact through techniques such as dataset watermarking, improved annotation frameworks, and adversarial training methods designed to enhance model robustness against harmful fine-tuning. These efforts are crucial for ensuring the safe and ethical use of AI systems, affecting both the reliability of model outputs and the broader societal implications of AI technology.
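One common first line of defense mentioned above is filtering harmful examples out of a corpus before training or fine-tuning. The sketch below is a minimal, illustrative version of that idea: the `HARMFUL_TERMS` lexicon, the scoring heuristic, and the threshold are all placeholder assumptions, not any specific system's method; real pipelines typically use trained toxicity classifiers rather than keyword matching.

```python
# Illustrative sketch: screen a training corpus for harmful text before
# fine-tuning. The lexicon and threshold are hypothetical placeholders.

HARMFUL_TERMS = {"slur1", "slur2"}  # placeholder terms, not a real lexicon


def toxicity_score(text: str) -> float:
    """Crude proxy: fraction of tokens found in the harmful lexicon."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in HARMFUL_TERMS for tok in tokens) / len(tokens)


def filter_corpus(corpus: list[str], threshold: float = 0.1) -> list[str]:
    """Keep only examples whose toxicity score is below the threshold."""
    return [text for text in corpus if toxicity_score(text) < threshold]


corpus = ["a benign training sentence", "slur1 slur1 bad text"]
print(filter_corpus(corpus))  # -> ['a benign training sentence']
```

In practice the scoring function would be swapped for a classifier's probability output, but the filter-by-threshold structure stays the same.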