Verbatim Memorization
Verbatim memorization, in which large language models (LLMs) and diffusion models reproduce training data word-for-word, is a significant concern because of its privacy and copyright implications. Current research focuses on detecting and mitigating this memorization, exploring methods such as unlearning, fine-tuning, and regularization across architectures including GPT-Neo and Stable Diffusion. The effectiveness of these mitigation techniques remains an open question, with studies highlighting the difficulty of eliminating memorization without substantially degrading model performance. This research is crucial for the responsible development and deployment of these models and for ensuring ethical and legal compliance.
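A common way to detect verbatim memorization in an LLM is the prefix-continuation test: prompt the model with the opening of a training document and check whether greedy decoding reproduces the document's actual continuation. Below is a minimal sketch of that test, assuming a Hugging Face causal LM (GPT-Neo-125M here, matching the GPT-Neo family mentioned above); the is_memorized helper and the example prefix/suffix pair are illustrative placeholders, not drawn from any of the papers listed.

```python
# Minimal sketch of a prefix-continuation memorization test.
# Assumptions: the transformers library is installed, and the
# (prefix, suffix) pair below stands in for a real training snippet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"  # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def is_memorized(prefix: str, true_suffix: str, max_new_tokens: int = 50) -> bool:
    """Return True if greedy decoding from `prefix` reproduces `true_suffix`."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding: the standard memorization test
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prompt tokens, keeping only the generated continuation.
    continuation = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return continuation.strip().startswith(true_suffix.strip())

# Hypothetical training snippet: a memorized model completes it verbatim.
prefix = "Call me Ishmael. Some years ago, never mind how long precisely,"
suffix = "having little or no money in my purse"
print(is_memorized(prefix, suffix))
```

Greedy decoding is used because verbatim memorization is typically defined in terms of the model's most likely continuation; sampling-based variants instead report extraction rates over many generated samples.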
Papers
[15 paper entries, dated October 31, 2022 through October 29, 2024]