Memorized Sample

Research on memorized samples focuses on identifying and mitigating the tendency of large language models (LLMs) and diffusion models to memorize and reproduce verbatim portions of their training data, which raises privacy and copyright concerns. Current work explores methods to detect and remove memorized samples, employing techniques such as entropy maximization, neuron localization, and progressive staged training across a range of model architectures. This line of research is crucial for responsible AI development: by preventing the unintended leakage of sensitive information, it supports the ethical and legal deployment of powerful models.
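A common detection heuristic in this literature is the extraction-style check: prompt the model with a prefix of a training sample and flag the sample as memorized if the model greedily reproduces the true continuation verbatim. The sketch below illustrates this idea; `is_memorized`, `toy_generate`, and the length thresholds are illustrative assumptions, not any specific paper's method, and the toy generator stands in for a real model's greedy decoder.

```python
def is_memorized(generate, sample, prefix_len=50, min_match=50):
    """Flag a training sample as memorized if the model, when prompted
    with the sample's prefix, reproduces the true continuation verbatim.

    `generate` is any prompt -> text function (e.g. greedy decoding).
    Thresholds are illustrative; real studies tune prefix and match
    lengths in tokens rather than characters.
    """
    prefix = sample[:prefix_len]
    true_continuation = sample[prefix_len:prefix_len + min_match]
    model_continuation = generate(prefix)[:min_match]
    return model_continuation == true_continuation

# Toy stand-in for a model that has memorized exactly one training string.
MEMORIZED_SAMPLE = "The quick brown fox jumps over the lazy dog. " * 5

def toy_generate(prompt):
    # Regurgitate the memorized string when prompted with its prefix;
    # otherwise produce unrelated text, like a non-memorizing model.
    if MEMORIZED_SAMPLE.startswith(prompt):
        return MEMORIZED_SAMPLE[len(prompt):]
    return "unrelated filler text " * 20

print(is_memorized(toy_generate, MEMORIZED_SAMPLE))                        # True
print(is_memorized(toy_generate, "Completely novel training text " * 5))   # False
```

Mitigation techniques named above (entropy maximization, neuron localization) then target the samples this kind of check flags, e.g. by raising output entropy on the memorized continuation or editing the specific neurons implicated in regurgitation.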

Papers