Memorized Sample
Research on memorized samples focuses on identifying and mitigating the tendency of large language models (LLMs) and diffusion models to memorize and reproduce verbatim portions of their training data, which raises privacy and copyright concerns. Current work explores methods to detect and remove memorized samples, employing techniques such as entropy maximization, neuron localization, and progressive staged training across a range of model architectures. These efforts are crucial for responsible AI development, supporting the ethical and legal deployment of powerful models by preventing the unintended leakage of sensitive information.
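A common way to detect verbatim memorization is a prefix-continuation probe: feed the model a prefix taken from a training document and check whether its greedy continuation reproduces the original suffix. The sketch below is a minimal, hypothetical illustration of that idea; `toy_model` is a stand-in for a real model's generation call, and the threshold `min_match` is an illustrative choice, not a value from the literature.

```python
def is_memorized(generate, prefix, true_suffix, min_match=20):
    """Flag a sample as memorized if the model's greedy continuation
    reproduces at least `min_match` leading characters of the true suffix."""
    continuation = generate(prefix, max_chars=len(true_suffix))
    # Length of the longest common leading span of the two strings.
    match = 0
    for a, b in zip(continuation, true_suffix):
        if a != b:
            break
        match += 1
    return match >= min_match

# Toy stand-in for a real model: it "memorized" one training document
# and echoes it verbatim when prompted with a prefix of it.
TRAINING_DOC = "The quick brown fox jumps over the lazy dog near the river bank."

def toy_model(prefix, max_chars):
    if TRAINING_DOC.startswith(prefix):
        return TRAINING_DOC[len(prefix):len(prefix) + max_chars]
    return "x" * max_chars  # unrelated output for unseen prefixes

prefix, suffix = TRAINING_DOC[:25], TRAINING_DOC[25:]
print(is_memorized(toy_model, prefix, suffix))         # True: verbatim reproduction
print(is_memorized(toy_model, "Unseen text", suffix))  # False: no reproduction
```

In practice the same probe is run over many training samples with a real model's greedy decoding, and flagged samples become candidates for the removal or unlearning techniques described above.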
Papers