Prompt Injection Attack

Prompt injection attacks exploit the vulnerability of large language models (LLMs) to malicious instructions embedded within user prompts, causing the models to deviate from their intended function. Current research focuses on developing and benchmarking these attacks across various LLM architectures, including those used in machine translation, robotic systems, and conversational search engines, and exploring both black-box and white-box defense mechanisms such as prompt engineering, fine-tuning, and input/output filtering. The widespread adoption of LLMs necessitates a thorough understanding of these attacks and the development of robust defenses to mitigate significant security risks in numerous applications.

Papers

July 23, 2024

Prompt Injection Attacks on Large Language Models in Oncology
Jan Clusmann, Dyke Ferber, Isabella C. Wiest, Carolin V. Schneider, Titus J. Brinker, Sebastian Foersch, Daniel Truhn, Jakob N. Kather
Vision Language Artificial Intelligence Model Medical Image Data Prompt Injection Attack Cancer Patient

July 12, 2024

Soft Prompts Go Hard: Steering Visual Language Models with Hidden Meta-Instructions
Tingwei Zhang, Collin Zhang, John X. Morris, Eugene Bagdasarian, Vitaly Shmatikov
Adversarial Example Human Instruction Soft Prompt Prompt Injection Attack Adversarial Objective

June 20, 2024

Prompt Injection Attacks in Defended Systems
Daniil Khomsky, Narek Maloyan, Bulat Nutfullin
Large Language Model Language Model State of the Art Defense Prompt Injection Attack

June 19, 2024

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr
Adversarial Robustness Dynamic Environment AI Agent New Attack LLM Agent Prompt Injection Attack Adaptive Attack Persona Agent

June 11, 2024

Knowledge Return Oriented Prompting (KROP)
Jason Martin, Kenneth Yeung
Large Language Model Prompt Injection Attack Prompt Injection LLM App

June 5, 2024

Ranking Manipulation for Conversational Search Engines
Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi
Conversational Search Prompt Injection Attack Adversarial Pattern

May 31, 2024

Exfiltration of personal information from ChatGPT via prompt injection
Gregory Schwartzman
ChatGPT Generated Conversation User Base Latent Vulnerability Prompt Injection Attack Personal Information Prompt Injection Data Exfiltration

April 6, 2024

Goal-guided Generative Prompt Injection Attack on Large Language Models
Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin
Adversarial Text Adversarial Prompt Prompt Injection Attack QuEry Based Attack

April 5, 2024

Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes
Divyanshu Kumar, Anurakt Kumar, Sahil Agarwal, Prashanth Harshangi
Large Language Model Fine Tuning Quantization Operator LLM Safety Prompt Injection Attack

March 26, 2024

Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong
Prompt Injection Attack Adversarial Pattern LLM a a Judge

March 20, 2024

Defending Against Indirect Prompt Injection Attacks With Spotlighting
Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman
NLP Task Prompt Injection Attack Robust Defense Adversarial Visual Instruction

March 14, 2024

Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks
Zhifan Sun, Antonio Valerio Miceli-Barone
Machine Translation Language Pair Task Specific Model Prompt Injection Attack Scaling Behavior

March 7, 2024

Automatic and Universal Prompt Injection Attacks against Large Language Models
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao
Gradient Based Prompt Attack Prompt Injection Attack Prompt Injection

March 6, 2024

Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks
Dario Pasquini, Martin Strohmeier, Carmela Troncoso
LeArning Abstract Prompt Injection Attack Execution Time

March 5, 2024

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang
LLM Based Prompt Injection Attack Prompt Injection

February 15, 2024

AbuseGPT: Abuse of Generative AI ChatBots to Create Smishing Campaigns
Ashfak Md Shibli, Mir Mehedi A. Pritom, Maanak Gupta
Generative AI Prompt Injection Attack AI Chatbots

January 31, 2024

An Early Categorization of Prompt Injection Attacks on Large Language Models
Sippo Rossi, Alisia Marianne Michel, Raghava Rao Mukkamala, Jason Bennett Thatcher
Large Language Model Security Vulnerability Prompt Injection Attack AI Chatbots Prompt Injection Early Classification

January 15, 2024

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
Novel Approach Prompt Injection Attack Prompt Based Method LLM Integrated Application Artificial Intelligence Security

December 29, 2023

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner
Instruction Tuning Task Specific Instruction Following Task Specific Model Prompt Injection Attack Prompt Injection

December 21, 2023

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
Black Box Benchmark Platform White Box Prompt Injection Attack Black Box Defense