New Attack
Research on attacks against large language models (LLMs) and related AI systems is expanding rapidly, focusing on vulnerabilities that can be exploited to elicit harmful outputs or extract sensitive information. Current work concentrates on developing and evaluating attack methods such as jailbreaking, data poisoning, prompt injection, and membership inference, often targeting specific architectures such as transformer-based LLMs and diffusion models. This research is essential for understanding and mitigating the risks posed by increasingly capable AI systems, and it informs the development of more robust and trustworthy AI applications across diverse sectors. A minimal illustrative sketch of one of these attack classes, membership inference, is shown below.
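As a concrete illustration of one attack class mentioned above, the following is a minimal sketch of a loss-based membership inference attack against a causal language model: texts seen during training tend to receive lower per-token loss, so a simple loss threshold can serve as a membership score. This is not drawn from any specific paper listed on this page; the model name, candidate texts, and threshold are illustrative assumptions.

```python
# Minimal loss-threshold membership inference sketch (illustrative assumptions:
# target model "gpt2", hypothetical threshold, toy candidate texts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumed stand-in for the target model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def per_token_loss(text: str) -> float:
    """Average cross-entropy the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()


def is_likely_member(text: str, threshold: float = 3.5) -> bool:
    """Flag `text` as a likely training member if its loss is unusually low.

    The threshold here is hypothetical; in practice it is calibrated on
    reference texts known to be inside or outside the training set.
    """
    return per_token_loss(text) < threshold


candidates = [
    "The quick brown fox jumps over the lazy dog.",   # common text, likely low loss
    "xqv zzrt plomb fjorded quantic mireslap",         # gibberish, likely high loss
]
for text in candidates:
    print(f"{per_token_loss(text):.2f}  member? {is_likely_member(text)}  | {text}")
```

In practice, published attacks refine this idea with calibrated reference models, per-example difficulty scores, or shadow models rather than a single global threshold.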