Adversarial Prompting
Adversarial prompting explores how carefully crafted inputs, or prompts, can manipulate the behavior of large language and vision-language models (LLMs and VLMs), revealing vulnerabilities and biases. Current research focuses on developing both attack methods (e.g., generating prompts to elicit harmful outputs or bypass safety mechanisms) and defense strategies (e.g., creating robust models or detection algorithms). This field is crucial for ensuring the safe and reliable deployment of these powerful models, impacting areas such as AI safety, cybersecurity, and the development of more robust and trustworthy AI systems.
Papers
October 20, 2024
October 5, 2024
August 19, 2024
June 6, 2024
April 21, 2024
April 3, 2024
January 9, 2024
November 20, 2023
November 16, 2023
September 6, 2023
June 20, 2023
May 25, 2023
February 23, 2023
February 8, 2023