Safety Attack

Safety attacks exploit vulnerabilities in artificial intelligence systems to compromise their intended safe operation or to extract sensitive information. Current research focuses on attacks against large language models (LLMs) in federated learning settings, diffusion models used for text-to-image generation, and embedded neural networks in cyber-physical systems. These attacks rely on adversarial inputs, fault injection, or manipulation of training data (poisoning) to achieve their objectives, underscoring the need for robust safety mechanisms and mitigation strategies. This line of work matters for the reliable and trustworthy deployment of AI across applications, particularly in safety-critical domains.
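
As an illustration of the training-data manipulation mentioned above, the following is a minimal sketch of a label-flipping poisoning attack on a toy classifier. The dataset, the flip_labels helper, and the poisoning fractions are hypothetical and not taken from any of the papers listed below; the sketch only shows how flipping a fraction of training labels degrades clean test accuracy.

```python
# Minimal sketch of a label-flipping data-poisoning attack on a toy classifier.
# All names, data, and parameters are illustrative assumptions, not drawn from
# any specific paper in this collection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian clusters.
X = np.vstack([rng.normal(-1, 1, size=(500, 2)), rng.normal(1, 1, size=(500, 2))])
y = np.array([0] * 500 + [1] * 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(labels: np.ndarray, fraction: float) -> np.ndarray:
    """Return a copy of `labels` with a randomly chosen fraction of entries flipped."""
    poisoned = labels.copy()
    n_flip = int(fraction * len(poisoned))
    idx = rng.choice(len(poisoned), size=n_flip, replace=False)
    poisoned[idx] = 1 - poisoned[idx]
    return poisoned

# Train on increasingly poisoned labels and evaluate on clean test data.
for fraction in (0.0, 0.2, 0.4):
    clf = LogisticRegression().fit(X_train, flip_labels(y_train, fraction))
    acc = clf.score(X_test, y_test)
    print(f"poisoned fraction={fraction:.1f}  clean test accuracy={acc:.3f}")
```

Real attacks studied in the papers below are typically more targeted (e.g., poisoning specific clients in federated learning or specific prompts for diffusion models), but the same principle applies: corrupted training signals shift the model away from its intended safe behavior.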

Papers