New Attack
Research on attacks against large language models (LLMs) and related AI systems is expanding rapidly, focusing on vulnerabilities that can be exploited to elicit harmful outputs or extract sensitive information. Current efforts concentrate on developing and evaluating attack methods such as jailbreaking, data poisoning, prompt injection, and membership inference, often targeting specific architectures such as transformer-based LLMs and diffusion models. This research is crucial for understanding and mitigating the risks posed by increasingly powerful AI systems, and it informs the development of more robust and trustworthy AI applications across diverse sectors.
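As a concrete illustration of one of the attack classes named above, the sketch below shows a minimal loss-threshold membership inference attack: an adversary who can observe a model's per-example loss predicts that low-loss examples were part of the training set. No specific model or dataset is described in the text, so the loss distributions, threshold, and function names here are hypothetical stand-ins used purely to make the idea concrete.

```python
import numpy as np

# Minimal loss-threshold membership inference sketch.
# Assumption: the attacker can query the target model for a per-example loss.
# Here the "model" is replaced by synthetic loss distributions, since no real
# model is specified in the text above.

rng = np.random.default_rng(0)

# Training members tend to have lower loss than held-out non-members.
member_losses = rng.exponential(scale=0.5, size=1000)      # hypothetical
non_member_losses = rng.exponential(scale=1.5, size=1000)  # hypothetical

# Attack rule: predict "member" if the example's loss falls below a threshold,
# calibrated here from the mean member loss (an assumption for illustration).
threshold = member_losses.mean()

def infer_membership(loss: float, tau: float = threshold) -> bool:
    """Return True if the example is predicted to be a training member."""
    return loss < tau

# Measure the attack's advantage over random guessing.
tpr = np.mean([infer_membership(l) for l in member_losses])
fpr = np.mean([infer_membership(l) for l in non_member_losses])
print(f"TPR: {tpr:.2f}, FPR: {fpr:.2f}, advantage: {tpr - fpr:.2f}")
```

The same thresholding idea underlies many stronger membership inference attacks; in practice the threshold is usually calibrated with shadow models rather than with access to true member losses, which is assumed here only to keep the sketch self-contained.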