Extraction Attack
Extraction attacks exploit the tendency of large language models (LLMs) and other deep learning models to memorize training data, allowing adversaries to recover sensitive information, sometimes verbatim. Current research develops and evaluates these attacks against various model architectures, including LLMs such as GPT and specialized models for tasks such as image generation; it also analyzes the effectiveness of different attack strategies and explores mitigations such as model editing and training-data deduplication. Understanding and mitigating extraction attacks is crucial for the privacy and security of AI systems and their applications, particularly in sensitive domains like healthcare and finance.
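A minimal sketch of the underlying memorization phenomenon: real extraction attacks typically prompt a deployed LLM with a likely prefix and rank completions, but the same effect can be shown with a toy character-level n-gram model that has memorized a (hypothetical) secret from its training corpus. All names here (`train_char_model`, `greedy_extract`, the sample record) are illustrative, not from any specific paper or library.

```python
from collections import defaultdict

def train_char_model(corpus):
    """Count next-character frequencies for every 3-character context."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(corpus) - 3):
        ctx, nxt = corpus[i:i + 3], corpus[i + 3]
        counts[ctx][nxt] += 1
    return counts

def greedy_extract(model, prefix, max_len=40):
    """Greedy decoding: repeatedly append the most likely next character."""
    out = prefix
    for _ in range(max_len):
        ctx = out[-3:]
        if ctx not in model:
            break
        out += max(model[ctx], key=model[ctx].get)
    return out

# Hypothetical training data containing a secret; repetition strengthens memorization.
corpus = "user record: ssn=123-45-6789; end. " * 5
model = train_char_model(corpus)

# An adversary who guesses only the prefix "ssn=" recovers the full secret.
print(greedy_extract(model, "ssn="))
```

The attack needs no access to the training set, only the ability to query the model with a plausible prefix; this is also why data deduplication helps, since repeated sequences are memorized far more readily.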