Prompt Attack
Prompt attacks exploit vulnerabilities in large language models (LLMs) and other AI systems by manipulating input prompts to elicit undesired or harmful outputs, compromising data integrity and user safety. Current research focuses on developing both novel attack methods, often leveraging gradient-based optimization or generative models to create effective adversarial prompts, and robust defense mechanisms, including improved prompt filtering and model architectures designed to resist manipulation. Understanding and mitigating these attacks is crucial for ensuring the safe and reliable deployment of LLMs across various applications, impacting both the security of AI systems and the trustworthiness of AI-driven services.
Papers
February 23, 2024
February 20, 2024
February 19, 2024
February 8, 2024
November 26, 2023
November 19, 2023
November 2, 2023
October 24, 2023
October 22, 2023
October 20, 2023
October 19, 2023
October 11, 2023
October 4, 2023
September 29, 2023
September 25, 2023
September 21, 2023
August 7, 2023
July 31, 2023
June 7, 2023