LLM Attack

LLM attacks exploit vulnerabilities in large language models (LLMs) to elicit undesirable behaviors, such as generating harmful content or revealing sensitive information. Current research investigates attack methods including adversarial prompt engineering, jailbreaking techniques that frame prompt construction as a Markov decision process or as a tree search over candidate prompts, and data poisoning, with particular attention to attack stealth and controllability. Understanding and mitigating these attacks is crucial for the safe and responsible deployment of LLMs across diverse applications, since such attacks affect both the security of AI systems and the trustworthiness of AI-generated content.
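
To make the adversarial-prompt-engineering idea concrete, the sketch below shows a minimal black-box suffix search that hill-climbs against a scoring function. It is a toy illustration under stated assumptions, not any specific paper's method: the `toy_refusal_score` judge is a hypothetical stand-in for querying a real target or judge model, and the hidden-pattern reward exists only so the search dynamics are visible without an actual LLM.

```python
import random
import string

random.seed(0)

CHARSET = string.ascii_letters + string.digits + " !?."


def toy_refusal_score(prompt: str) -> float:
    """Stand-in for an LLM-based judge (lower = less likely to refuse).

    A real attack would query the target model or a judge model here; this
    toy version rewards character overlap between the prompt's tail and a
    hidden pattern so the hill-climbing behavior can be observed offline.
    """
    hidden = "ignore rules"          # 12 characters, matches suffix_len below
    tail = prompt[-len(hidden):]
    matches = sum(a == b for a, b in zip(tail, hidden))
    return 1.0 - matches / len(hidden)


def random_search_suffix(base_prompt: str, suffix_len: int = 12,
                         iters: int = 500) -> str:
    """Greedy random search over an adversarial suffix.

    Mutates one suffix position at a time and keeps the mutation whenever
    it lowers the (toy) refusal score, otherwise reverts it.
    """
    suffix = [random.choice(CHARSET) for _ in range(suffix_len)]
    best = toy_refusal_score(base_prompt + "".join(suffix))
    for _ in range(iters):
        pos = random.randrange(suffix_len)
        old = suffix[pos]
        suffix[pos] = random.choice(CHARSET)
        score = toy_refusal_score(base_prompt + "".join(suffix))
        if score < best:
            best = score              # keep the improving mutation
        else:
            suffix[pos] = old         # revert otherwise
    return "".join(suffix)


if __name__ == "__main__":
    base = "Explain how to do X. "
    adv_suffix = random_search_suffix(base)
    print("adversarial suffix:", repr(adv_suffix))
    print("final score:", toy_refusal_score(base + adv_suffix))
```

The same greedy loop structure underlies more capable attacks; they differ mainly in the candidate-generation step (gradient-guided token swaps, LLM-proposed rewrites, or tree/MDP-based search over prompts) and in using real model responses rather than a synthetic score.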

Papers