Attack Framework

Attack frameworks encompass the design and implementation of methods that probe vulnerabilities in machine learning models and systems in order to assess their robustness and identify weaknesses. Current research focuses on sophisticated attacks against large language models (LLMs), recommender systems, and image generation models, often employing techniques such as adversarial training, generative adversarial networks (GANs), reinforcement learning, and evolutionary strategies to craft effective, evasive attacks. These frameworks are crucial for evaluating the security and safety of increasingly prevalent AI systems: they inform the development of more robust and reliable models and ultimately contribute to the responsible deployment of AI technologies.
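
To make the idea of probing a model's robustness concrete, below is a minimal sketch of one classic building block found in many attack frameworks: a gradient-based adversarial perturbation in the style of the Fast Gradient Sign Method (FGSM). The toy model, input shapes, and `epsilon` value are illustrative assumptions for this sketch, not taken from any specific paper listed here.

```python
# Minimal FGSM-style adversarial attack sketch (illustrative; toy model and
# hyperparameters are assumptions, not drawn from the papers below).
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Perturb inputs x in the direction that maximizes the loss on labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        # Step by epsilon along the sign of the input gradient, then clamp so
        # the perturbed input stays in the valid [0, 1] pixel range.
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

if __name__ == "__main__":
    # Hypothetical setup: a toy linear classifier on random "images".
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.rand(8, 1, 28, 28)       # batch of inputs in [0, 1]
    y = torch.randint(0, 10, (8,))     # ground-truth labels
    x_adv = fgsm_attack(model, x, y)
    clean_acc = (model(x).argmax(1) == y).float().mean()
    adv_acc = (model(x_adv).argmax(1) == y).float().mean()
    print(f"accuracy clean={clean_acc:.2f} adversarial={adv_acc:.2f}")
```

The attack frameworks surveyed below build far more elaborate machinery on top of this pattern (e.g., generative or reinforcement-learning-driven search for evasive inputs), but the core loop, perturbing inputs to degrade a target model's behavior and measuring the drop, is the same.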

Papers