Safety Attack

Safety attacks exploit vulnerabilities in artificial intelligence systems to compromise their intended safe operation or to extract sensitive information. Current research focuses on attacks against large language models (LLMs) in federated learning settings, diffusion models used for text-to-image generation, and embedded neural networks in cyber-physical systems. These attacks rely on adversarial inputs, fault injection, or manipulation of training data (poisoning) to achieve their objectives, underscoring the need for robust safety mechanisms and mitigation strategies. This line of work matters for the reliable and trustworthy deployment of AI across applications, particularly in safety-critical domains.
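
As an illustration of the training-data manipulation mentioned above, the following is a minimal sketch of a label-flipping poisoning attack on a toy classifier. The dataset, the flip_labels helper, and the poisoning fractions are hypothetical and not taken from any of the papers listed below; the sketch only shows how flipping a fraction of training labels degrades clean test accuracy.

```python
# Minimal sketch of a label-flipping data-poisoning attack on a toy classifier.
# All names, data, and parameters are illustrative assumptions, not drawn from
# any specific paper in this collection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian clusters.
X = np.vstack([rng.normal(-1, 1, size=(500, 2)), rng.normal(1, 1, size=(500, 2))])
y = np.array([0] * 500 + [1] * 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(labels: np.ndarray, fraction: float) -> np.ndarray:
    """Return a copy of `labels` with a randomly chosen fraction of entries flipped."""
    poisoned = labels.copy()
    n_flip = int(fraction * len(poisoned))
    idx = rng.choice(len(poisoned), size=n_flip, replace=False)
    poisoned[idx] = 1 - poisoned[idx]
    return poisoned

# Train on increasingly poisoned labels and evaluate on clean test data.
for fraction in (0.0, 0.2, 0.4):
    clf = LogisticRegression().fit(X_train, flip_labels(y_train, fraction))
    acc = clf.score(X_test, y_test)
    print(f"poisoned fraction={fraction:.1f}  clean test accuracy={acc:.3f}")
```

Real attacks studied in the papers below are typically more targeted (e.g., poisoning specific clients in federated learning or specific prompts for diffusion models), but the same principle applies: corrupted training signals shift the model away from its intended safe behavior.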

Papers