Extraction Attack

Extraction attacks exploit the tendency of large language models (LLMs) and other deep learning models to memorize training data, which lets adversaries retrieve sensitive information from a trained model. Current research develops and evaluates these attacks against a range of model architectures, including LLMs such as GPT and specialized models for tasks such as image generation. It compares the effectiveness of different attack strategies and explores mitigation techniques such as model editing and data deduplication. Understanding and mitigating extraction attacks is important for the privacy and security of AI systems and their applications, particularly in sensitive domains such as healthcare and finance.
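
As a rough illustration of the data deduplication mitigation mentioned above, the following Python sketch drops exact duplicates from a small text corpus after light normalization. The function names `normalize` and `deduplicate` and the sample records are hypothetical, chosen only for this example; a real pipeline may also need to handle near-duplicates and repeated substrings, which this sketch does not attempt.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(records):
    """Keep the first occurrence of each normalized record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        # Hash the normalized text so the set stores fixed-size digests.
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Hypothetical sample records; the second differs from the first only in
# letter case and spacing.
corpus = [
    "Patient ID 0000 was admitted on Monday.",
    "patient id 0000   was admitted on monday.",
    "The quarterly report is attached.",
]

print(deduplicate(corpus))
```

The idea behind the sketch is that a record repeated many times in the training set is more likely to be memorized, so removing repeats before training is one way to reduce what an extraction attack can recover.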

Papers

Careful What You Wish For: on the Extraction of Adversarially Trained Models

Combing for Credentials: Active Pattern Extraction from Smart Reply

Memorization in NLP Fine-tuning Methods