Prompt Attack
Prompt attacks exploit vulnerabilities in large language models (LLMs) and other AI systems by manipulating input prompts to elicit undesired or harmful outputs, compromising data integrity and user safety. Current research focuses on developing both novel attack methods, often leveraging gradient-based optimization or generative models to create effective adversarial prompts, and robust defense mechanisms, including improved prompt filtering and model architectures designed to resist manipulation. Understanding and mitigating these attacks is crucial for ensuring the safe and reliable deployment of LLMs across various applications, impacting both the security of AI systems and the trustworthiness of AI-driven services.
Papers
November 14, 2024
October 30, 2024
October 16, 2024
October 15, 2024
September 26, 2024
August 20, 2024
July 12, 2024
July 8, 2024
June 25, 2024
June 13, 2024
June 10, 2024
June 1, 2024
May 30, 2024
May 28, 2024
May 18, 2024
April 22, 2024
April 3, 2024
March 7, 2024