New Attack
Research on attacks against large language models (LLMs) and related AI systems is expanding rapidly, focusing on vulnerabilities that can be exploited to elicit harmful outputs or extract sensitive information. Current work centers on developing and evaluating attack methods including jailbreaking, data poisoning, prompt injection, and membership inference, often targeting specific architectures such as transformer-based LLMs and diffusion models. This research is crucial for understanding and mitigating the risks posed by increasingly capable AI systems, and it informs the development of more robust and trustworthy AI applications across diverse sectors.
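As a concrete illustration of one attack class named above, the sketch below mounts a simple loss-threshold membership inference attack (in the style of the Yeom et al. baseline) against a deliberately overfit toy classifier. The dataset, model, and threshold choice are all illustrative assumptions, not the method of any particular paper in this collection.

```python
# Hypothetical loss-threshold membership inference attack on a toy,
# deliberately overfit classifier. Dataset, model, and threshold are
# illustrative assumptions, not drawn from any paper listed here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic "private" data: the first half trains the model (members),
# the second half is held out (non-members).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_mem, y_mem, X_non, y_non = X[:1000], y[:1000], X[1000:], y[1000:]

# An unpruned decision tree memorizes its training set, which is
# exactly the overfitting this attack exploits.
model = DecisionTreeClassifier(random_state=0).fit(X_mem, y_mem)

def per_example_loss(model, X, y):
    """Cross-entropy loss of the model on each individual example."""
    probs = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(probs, 1e-12, 1.0))

loss_mem = per_example_loss(model, X_mem, y_mem)
loss_non = per_example_loss(model, X_non, y_non)

# Attack rule: call an example a "member" when its loss falls below a
# threshold set midway between the average member and non-member loss.
threshold = (loss_mem.mean() + loss_non.mean()) / 2
preds = np.concatenate([loss_mem, loss_non]) <= threshold
truth = np.concatenate([np.ones(1000, bool), np.zeros(1000, bool)])
print(f"Membership inference accuracy: {(preds == truth).mean():.3f}")
# Chance level is 0.5; anything clearly above it indicates leakage.
```

The attack works because overfit models assign lower loss to examples they were trained on; defenses such as regularization or differentially private training aim to shrink exactly this member/non-member gap.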
Papers
[Paper list: 18 entries, dated June 5, 2023 to September 19, 2023]