Prompt Extraction

Prompt extraction research focuses on how attackers can recover the secret instructions (prompts) used to customize large language models (LLMs), thereby compromising intellectual property and potentially enabling malicious attacks. Current research investigates the vulnerability of various LLMs, including models from OpenAI and others, to different attack methods, and develops and evaluates defensive strategies against prompt leakage. This area is crucial because the widespread use of prompt-based LLM services necessitates robust security measures to protect proprietary information and maintain the integrity of these systems.

Papers

December 18, 2024

Safeguarding System Prompts for LLMs
Zhifeng Jiang, Zhihua Jin, Guoliang He
Large Language Model Prompt Programming Prompt Extraction

August 5, 2024

Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Haoyang Li
Large Language Model Complex Prompt Extraction Attack Prompt Extraction

June 10, 2024

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra
Prompt Attack LLM Integrated Application LLM Robustness Extraction Attack Prompt Extraction Benchmark Prompt Extraction

July 13, 2023

Effective Prompt Extraction from Language Models
Yiming Zhang, Nicholas Carlini, Daphne Ippolito
Large Language Model Language Model Adversarial Agent Text Attack Prompt Extraction

Prompt Extraction

Papers

Safeguarding System Prompts for LLMs

Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

Effective Prompt Extraction from Language Models