Prompt Injection Attack
Prompt injection attacks exploit the vulnerability of large language models (LLMs) to malicious instructions embedded within user prompts, causing the models to deviate from their intended function. Current research focuses on developing and benchmarking these attacks across various LLM architectures, including those used in machine translation, robotic systems, and conversational search engines, and exploring both black-box and white-box defense mechanisms such as prompt engineering, fine-tuning, and input/output filtering. The widespread adoption of LLMs necessitates a thorough understanding of these attacks and the development of robust defenses to mitigate significant security risks in numerous applications.
Papers
July 23, 2024
July 12, 2024
June 20, 2024
June 19, 2024
June 11, 2024
June 5, 2024
May 31, 2024
April 6, 2024
April 5, 2024
March 26, 2024
March 20, 2024
March 14, 2024
March 7, 2024
March 6, 2024
March 5, 2024
February 15, 2024
January 31, 2024
January 15, 2024
December 29, 2023