Prompt Injection Attack
Prompt injection attacks exploit the vulnerability of large language models (LLMs) to malicious instructions embedded within user prompts, causing the models to deviate from their intended function. Current research focuses on developing and benchmarking these attacks across various LLM architectures, including those used in machine translation, robotic systems, and conversational search engines, and exploring both black-box and white-box defense mechanisms such as prompt engineering, fine-tuning, and input/output filtering. The widespread adoption of LLMs necessitates a thorough understanding of these attacks and the development of robust defenses to mitigate significant security risks in numerous applications.
Papers
FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks
Jiongxiao Wang, Fangzhou Wu, Wendi Li, Jinsheng Pan, Edward Suh, Z. Morley Mao, Muhao Chen, Chaowei Xiao
Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
Victoria Benjamin, Emily Braca, Israel Carter, Hafsa Kanchwala, Nava Khojasteh, Charly Landow, Yi Luo, Caroline Ma, Anna Magarelli, Rachel Mirin, Avery Moyer, Kayla Simpson, Amelia Skawinski, Thomas Heverin
Palisade -- Prompt Injection Detection Framework
Sahasra Kokkula, Somanathan R, Nandavardhan R, Aashishkumar, G Divya
Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection
Md Abdur Rahman, Fan Wu, Alfredo Cuzzocrea, Sheikh Iqbal Ahamed