Attack Success Rate

Attack success rate (ASR) quantifies the effectiveness of adversarial attacks against machine learning models, focusing on compromising their security and reliability. Current research investigates ASR across various model types, including large language models (LLMs), federated learning systems, and text-to-image generators, employing diverse attack methods like gradient-based optimization, backdoor insertion, and prompt engineering. Understanding and improving ASR is crucial for developing robust and secure AI systems, impacting both the theoretical foundations of machine learning and the practical deployment of AI in sensitive applications. The field is actively exploring both improved attack strategies and more effective defenses.

Papers

December 19, 2023

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Jason Vega, Isha Chaudhary, Changming Xu, Gagandeep Singh
Attack Success Rate Open Source LLM Text Attack

November 23, 2023

November 15, 2023

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, Minlie Huang
Jailbreak Attack New Attack Attack Success Rate Priority Based Inference Stage

November 14, 2023

$DA^3$: A Distribution-Aware Adversarial Attack against Language Models
Yibo Wang, Xiangjue Dong, James Caverlee, Philip S. Yu
Language Model Adversarial Attack Adversarial Example Attack Success Rate LoRA Based

October 28, 2023

Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
Wencong You, Zayd Hammoudeh, Daniel Lowd
Adversarial Example Backdoor Attack Attack Success Rate Adversary Agent Text Classifier Textual Backdoor Attack Clean Label Backdoor Attack Clean Label Attack

October 26, 2023

SoK: Pitfalls in Evaluating Black-Box Attacks
Fnu Suya, Anshuman Suri, Tingwei Zhang, Jingtao Hong, Yuan Tian, David Evans
Common Pitfall Attack Success Rate Black Box Attack Extraction Attack

October 23, 2023

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
Adversarial Attack Jailbreak Attack Attack Success Rate Gradient Based Adversarial

July 21, 2023

Unveiling Vulnerabilities in Interpretable Deep Learning Systems with Query-Efficient Black-box Attacks
Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed
Adversarial Attack Adversarial Example New Attack Attack Success Rate Interpretable Deep Learning Unveiling Vulnerability

May 25, 2023

IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks
Xuanli He, Jun Wang, Benjamin Rubinstein, Trevor Cohn
Backdoor Attack Attack Success Rate Insertion Based

April 21, 2023

INK: Inheritable Natural Backdoor Attack Against Model Distillation
Xiaolei Liu, Ming Yi, Kangyi Ding, Bangzhou Xin, Yixiao Xu, Li Yan, Chao Shen
Backdoor Attack Attack Success Rate Robust Backdoor Attack

April 10, 2023

Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence
Hanbin Hong, Xinyu Zhang, Binghui Wang, Zhongjie Ba, Yuan Hong
Adversarial Attack Adversarial Example Attack Success Rate

February 20, 2023

Efficient Algorithms for Boundary Defense with Heterogeneous Defenders
Si Wei Feng, Jingjin Yu
Efficient Algorithm Cyber Attack Attack Success Rate Multiple Defender Multiple Attack Heterogeneous Zero Day Attack Stochastic Defense

December 20, 2022

Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation
Tianrui Qin, Xianghuan He, Xitong Gao, Yiren Zhao, Kejiang Ye, Cheng-Zhong Xu
Backdoor Attack Supply Chain Attack Success Rate Backdoor Injection

October 28, 2022

October 23, 2022

FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning
Kaiyuan Zhang, Guanhong Tao, Qiuling Xu, Siyuan Cheng, Shengwei An, Yingqi Liu, Shiwei Feng, Guangyu Shen, Pin-Yu Chen, Shiqing Ma, Xiangyu Zhang
Federated Learning Backdoor Attack Attack Success Rate Robust Federated Learning Negative Flip Provable Defense

October 12, 2022

Few-shot Backdoor Attacks via Neural Tangent Kernels
Jonathan Hayase, Sewoong Oh
Deep Neural Network Backdoor Attack Neural Tangent Kernel Poisoning Attack Attack Success Rate Shot Backdoor

September 28, 2022

Securing Federated Learning against Overwhelming Collusive Attackers
Priyesh Ranjan, Ashish Gupta, Federico Corò, Sajal K. Das
Federated Learning Privacy Preserving Attack Success Rate Malicious Code Densest Subgraph Collaborative Model

August 26, 2022

ATTRITION: Attacking Static Hardware Trojan Detection Techniques Using Reinforcement Learning
Vasudev Gohil, Hao Guo, Satwik Patnaik, Jeyavijayan, Rajendran
Reinforcement Learning Attack Success Rate Hardware Trojan Attack Framework Hardware Trojan Detection Employee Attrition