Harmful Unlearning
Harmful unlearning is a branch of machine unlearning that aims to remove specific harmful or unwanted data and knowledge from trained models, particularly large language models (LLMs), without retraining from scratch. Current research focuses on developing effective unlearning algorithms, often employing techniques such as gradient-based methods, knowledge distillation, and adversarial training, across model architectures including LLMs and diffusion models. This line of work is crucial for addressing privacy concerns, mitigating biases, and improving the safety and robustness of AI systems, with implications for both data-protection regulation and the trustworthiness of deployed AI applications.
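The gradient-based methods mentioned above typically ascend the loss on a "forget" set while continuing to descend the loss on a "retain" set, so that unwanted knowledge is erased without destroying retained capability. The sketch below illustrates this idea on a small logistic-regression stand-in for an LLM; all data, names, and hyperparameters are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

# Minimal sketch of gradient-ascent unlearning. The logistic-regression
# "model" is a toy stand-in for an LLM; the data is synthetic.

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Mean logistic loss and its gradient with respect to weights w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# Synthetic "retain" and "forget" sets with different underlying patterns.
X_retain = rng.normal(size=(200, 5))
y_retain = (X_retain[:, 0] > 0).astype(float)
X_forget = rng.normal(size=(40, 5))
y_forget = (X_forget[:, 1] > 0).astype(float)  # the unwanted pattern

# 1) Train on all data (retain + forget).
w = np.zeros(5)
X_all = np.vstack([X_retain, X_forget])
y_all = np.concatenate([y_retain, y_forget])
for _ in range(500):
    _, g = loss_and_grad(w, X_all, y_all)
    w -= 0.5 * g

loss_forget_before, _ = loss_and_grad(w, X_forget, y_forget)

# 2) Unlearn: ASCEND the forget-set loss while still descending the
#    retain-set loss, preserving retained capability.
for _ in range(100):
    _, g_f = loss_and_grad(w, X_forget, y_forget)
    _, g_r = loss_and_grad(w, X_retain, y_retain)
    w += 0.1 * g_f   # gradient ascent on the forget set
    w -= 0.1 * g_r   # continue fitting the retain set

loss_forget_after, _ = loss_and_grad(w, X_forget, y_forget)
```

After unlearning, the model's loss on the forget set rises (the unwanted pattern is no longer fit) while the retain-set objective is still being optimized; real LLM unlearning methods apply the same two-objective structure at much larger scale.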