Memorization Effect

Memorization, the tendency of large language models (LLMs) and other deep learning models to reproduce training data verbatim, is an active subject of research aimed at quantifying its extent, understanding its mechanisms (including the roles of attention and cross-attention), and developing mitigation strategies. Current work investigates memorization across architectures such as transformers and recurrent neural networks, using techniques like nucleus sampling and soft prompting either to measure the effect or to reduce it. Addressing memorization is crucial for protecting data privacy, mitigating copyright infringement, and improving the reliability and generalizability of AI models across diverse applications.
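
The sketch below illustrates one common way such studies quantify memorization: feed the model a prefix drawn from (suspected) training data, sample a continuation with nucleus sampling, and count how many leading tokens match the true suffix verbatim. The model name, example strings, and parameter values are illustrative assumptions, not taken from any particular paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; memorization studies typically target larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def verbatim_overlap(prefix: str, true_suffix: str,
                     top_p: float = 0.95, max_new_tokens: int = 50) -> int:
    """Generate a continuation of `prefix` with nucleus sampling and return
    the number of leading tokens that reproduce `true_suffix` exactly."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            do_sample=True,   # nucleus sampling, used here to elicit memorized text
            top_p=top_p,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    generated = output[0, inputs["input_ids"].shape[1]:]
    reference = tokenizer(true_suffix, return_tensors="pt")["input_ids"][0]
    # Count leading tokens of the generation that match the true suffix.
    matches = 0
    for g, r in zip(generated.tolist(), reference.tolist()):
        if g != r:
            break
        matches += 1
    return matches

# Hypothetical usage: a (prefix, suffix) pair split from one training document.
n = verbatim_overlap("The quick brown fox", " jumps over the lazy dog")
print(f"{n} leading tokens reproduced verbatim")
```

In practice, studies run this test over many prefix/suffix pairs and report the fraction whose overlap exceeds some length threshold; sampling parameters such as `top_p` strongly affect how much memorized text is surfaced.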

Papers