Harmful Unlearning
Harmful unlearning, a branch of machine unlearning, aims to remove specific harmful data or knowledge from trained machine learning models, particularly large language models (LLMs), without retraining from scratch. Current research focuses on developing effective unlearning algorithms, often built on gradient-based methods, knowledge distillation, or adversarial training, and spans model architectures from LLMs to diffusion models; a minimal sketch of the gradient-based approach appears below. The field is crucial for addressing privacy concerns, mitigating bias, and enhancing the safety and robustness of AI systems, with implications for data-protection regulation and the trustworthiness of deployed AI applications.
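As a rough illustration of the gradient-based family of methods, one common recipe performs gradient ascent on a "forget" set (data to be removed) while performing ordinary gradient descent on a "retain" set to preserve general capability. The sketch below assumes PyTorch; `unlearn_step`, the toy linear model, and the random batches are hypothetical placeholders for illustration only, not the method of any paper listed here.

```python
# Minimal sketch of gradient-based unlearning: ascend the loss on a
# forget set while descending on a retain set (hypothetical example).
import torch
import torch.nn.functional as F


def unlearn_step(model, forget_batch, retain_batch, optimizer, alpha=1.0):
    """One unlearning update combining ascent (forget) and descent (retain)."""
    optimizer.zero_grad()

    # Gradient ascent on the forget set: negate the loss so that
    # minimizing it maximizes the model's error on the data to remove.
    fx, fy = forget_batch
    forget_loss = -F.cross_entropy(model(fx), fy)

    # Standard descent on the retain set to preserve overall performance;
    # alpha trades off forgetting strength against retention.
    rx, ry = retain_batch
    retain_loss = F.cross_entropy(model(rx), ry)

    (forget_loss + alpha * retain_loss).backward()
    optimizer.step()


# Toy usage: a linear classifier on random data (placeholder inputs).
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
forget = (torch.randn(8, 10), torch.randint(0, 2, (8,)))
retain = (torch.randn(8, 10), torch.randint(0, 2, (8,)))
unlearn_step(model, forget, retain, opt)
```

In practice, methods differ mainly in how they regularize this update so that forgetting the target data does not degrade the model elsewhere, which is where distillation- and adversarial-based variants come in.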
Papers
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
Tianyu Yang, Lisen Dai, Zheyuan Liu, Xiangqi Wang, Meng Jiang, Yapeng Tian, Xiangliang Zhang
Attribute-to-Delete: Machine Unlearning via Datamodel Matching
Kristian Georgiev, Roy Rinberg, Sung Min Park, Shivam Garg, Andrew Ilyas, Aleksander Madry, Seth Neel