Prompt Injection
Prompt injection attacks exploit the instruction-following capabilities of large language models (LLMs) by subtly embedding malicious instructions within user prompts, causing the model to deviate from its intended function and potentially reveal sensitive information or perform harmful actions. Current research focuses on developing robust testing frameworks (like fuzzing techniques) and defensive strategies, including prompt tuning and task-specific fine-tuning, to mitigate these vulnerabilities. The significance of this research lies in ensuring the secure and reliable deployment of LLMs in real-world applications, particularly those involving sensitive data or critical tasks, by identifying and addressing these emerging security threats.
Papers
Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
Victoria Benjamin, Emily Braca, Israel Carter, Hafsa Kanchwala, Nava Khojasteh, Charly Landow, Yi Luo, Caroline Ma, Anna Magarelli, Rachel Mirin, Avery Moyer, Kayla Simpson, Amelia Skawinski, Thomas Heverin
Palisade -- Prompt Injection Detection Framework
Sahasra Kokkula, Somanathan R, Nandavardhan R, Aashishkumar, G Divya
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks
Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese
Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection
Md Abdur Rahman, Fan Wu, Alfredo Cuzzocrea, Sheikh Iqbal Ahamed