Prompt Injection Attack

Prompt injection attacks exploit the vulnerability of large language models (LLMs) to malicious instructions embedded within user prompts, causing the models to deviate from their intended function. Current research focuses on developing and benchmarking these attacks across various LLM architectures, including those used in machine translation, robotic systems, and conversational search engines, and exploring both black-box and white-box defense mechanisms such as prompt engineering, fine-tuning, and input/output filtering. The widespread adoption of LLMs necessitates a thorough understanding of these attacks and the development of robust defenses to mitigate significant security risks in numerous applications.

Papers