Harmful Unlearning
Harmful unlearning, also known as machine unlearning, aims to remove specific data or knowledge from trained machine learning models, particularly large language models (LLMs), without retraining them from scratch. Current research focuses on developing effective unlearning algorithms, often using gradient-based methods, knowledge distillation, and adversarial training, and applies them to architectures such as LLMs and diffusion models. The field matters for addressing privacy concerns, mitigating bias, and improving the safety and robustness of AI systems, with implications for compliance with data protection regulations and for the trustworthiness of AI applications.
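To make the idea of a gradient-based method concrete, here is a minimal sketch of one commonly described baseline: take gradient ascent steps on the loss of the examples to be forgotten while taking descent steps on the data to be retained. It is a toy illustration only, assuming a one-dimensional logistic regression in place of an LLM; the function names, the datasets, and the hyperparameters (`steps`, `lr`, `retain_weight`) are all hypothetical and not taken from any particular paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(w, b, data):
    """Mean binary cross-entropy of a 1-D logistic model on (x, y) pairs."""
    eps = 1e-12  # keeps log() finite when a prediction saturates
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def mean_grad(w, b, data):
    """Gradient of the mean cross-entropy with respect to (w, b)."""
    gw = gb = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    return gw / len(data), gb / len(data)

def train(data, steps=500, lr=0.5):
    """Ordinary gradient descent, standing in for the original training run."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = mean_grad(w, b, data)
        w -= lr * gw
        b -= lr * gb
    return w, b

def unlearn(w, b, forget, retain, steps=20, lr=0.1, retain_weight=1.0):
    """Ascend the forget-set loss while descending the retain-set loss."""
    for _ in range(steps):
        fgw, fgb = mean_grad(w, b, forget)
        rgw, rgb = mean_grad(w, b, retain)
        w += lr * (fgw - retain_weight * rgw)
        b += lr * (fgb - retain_weight * rgb)
    return w, b

# Hypothetical toy data: the forget example contradicts the retained pattern.
retain_set = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
forget_set = [(3.0, 0)]

w0, b0 = train(retain_set + forget_set)
w1, b1 = unlearn(w0, b0, forget_set, retain_set)

print("forget loss:", mean_loss(w0, b0, forget_set), "->", mean_loss(w1, b1, forget_set))
print("retain loss:", mean_loss(w0, b0, retain_set), "->", mean_loss(w1, b1, retain_set))
```

In this sketch the retain-set term is what keeps the update from simply degrading the model: ascent on the forget loss alone is unbounded, so the number of steps and the weighting between the two terms have to be chosen with care.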