Attack Success Rate

Attack success rate (ASR) measures how often an adversarial attack achieves its goal against a machine learning model, which makes it a standard indicator of a model's security and reliability under attack. Current research reports ASR across model types including large language models (LLMs), federated learning systems, and text-to-image generators, using attack methods such as gradient-based optimization, backdoor insertion, and prompt engineering. Understanding what drives ASR, and lowering it through defenses, is crucial for developing robust and secure AI systems, with consequences for both the theoretical foundations of machine learning and the practical deployment of AI in sensitive applications. The field is actively exploring both stronger attack strategies and more effective defenses.
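
As a rough illustration of the metric described above, ASR is commonly computed as the fraction of attack attempts that succeed. The sketch below assumes that definition; the function name `attack_success_rate` and the sample outcomes are hypothetical, and individual papers differ in how they judge whether a single attempt counts as a success (for example, a misclassification, a triggered backdoor, or a jailbroken response).

```python
from typing import Iterable


def attack_success_rate(outcomes: Iterable[bool]) -> float:
    """Return the fraction of attack attempts that succeeded.

    Each element of `outcomes` is True if that attempt succeeded.
    What counts as "success" is an assumption left to the caller.
    """
    results = list(outcomes)
    if not results:
        raise ValueError("ASR is undefined for zero attack attempts")
    return sum(1 for r in results if r) / len(results)


# Hypothetical outcomes for four attack attempts (illustrative only).
trial_outcomes = [True, False, True, True]
asr = attack_success_rate(trial_outcomes)
print(f"ASR = {asr:.2%}")  # prints "ASR = 75.00%"
```

Reporting the number of attempts alongside the rate helps readers compare results, since an ASR measured on a handful of trials is far less stable than one measured on thousands.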

Papers

August 18, 2024

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen, Yi Liu, Dongxia Wang, Jiaying Chen, Wenhai Wang
Large Language Model Jailbreak Attack New Characterization Attack Success Rate Security Evaluation

July 18, 2024

Krait: A Backdoor Attack Against Graph Prompt Tuning
Ying Song, Rita Singh, Balaji Palanisamy
Backdoor Attack Prompt Tuning Attack Success Rate Trigger Attack Graph Prompt

July 15, 2024

Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Quang H. Nguyen, Nguyen Ngoc-Hieu, The-Anh Ta, Thanh Nguyen-Tang, Kok-Seng Wong, Hoang Thanh-Tung, Khoa D. Doan
Adversarial Attack Backdoor Attack Attack Success Rate Clean Label Backdoor Attack Clean Label Attack

July 2, 2024

Looking From the Future: Multi-order Iterations Can Enhance Adversarial Attack Transferability
Zijian Ying, Qianmu Li, Tao Wang, Zhichao Lian, Shunmei Meng, Xuyun Zhang
Future Reasoning Attack Success Rate Transferable Attack Adversarial Transferability Adversarial Perspective

June 18, 2024

Attack and Defense of Deep Learning Models in the Field of Web Attack Detection
Lijia Shi, Shihao Dong
Deep Learning Model Backdoor Attack Backdoor Defense Limited Field Attack Success Rate Unknown Attack Web Attack

June 17, 2024

Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming
Vernon Toh Yan Han, Rishabh Bhardwaj, Soujanya Poria
Memory Trace Red Teaming Attack Success Rate Memory Stability Quality Diversity Search Rainbow Teaming

June 4, 2024

QROA: A Black-Box Query-Response Optimization Attack on LLMs
Hussein Jawad, Nicolas J. -B. BRUNEL
Large Language Model Black Box Optimization Deep Q Learning Attack Success Rate Query Based Attack

May 31, 2024

Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Jiahao Yu, Haozheng Luo, Jerry Yao-Chieh Hu, Wenbo Guo, Han Liu, Xinyu Xing
Language Model Jailbreak Attack Attack Success Rate

May 29, 2024

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints
Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-jin Liu, Zilong Zheng, Gao Huang
Red Teaming Attack Success Rate DIVeR Identification Semantic Diversity Semantic Reward Constraint Relaxation

May 24, 2024

TrojanForge: Generating Adversarial Hardware Trojan Examples Using Reinforcement Learning
Amin Sarihi, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy
Reinforcement Learning Generative Adversarial Network Adversarial Example Attack Success Rate Hardware Trojan Adversarial Malware

May 19, 2024

An Invisible Backdoor Attack Based On Semantic Feature
Yangming Chen
Deep Neural Network Backdoor Attack Backdoor Defense Semantic Feature Attack Success Rate Invisible Backdoor Attack

May 10, 2024

Improving Transferable Targeted Adversarial Attack via Normalized Logit Calibration and Truncated Feature Mixing
Juanjuan Weng, Zhiming Luo, Shaozi Li
Task Transferability Adversarial Sample Attack Success Rate Transferable Adversarial Attack Feature Mixing Logit Calibration Source Training

April 26, 2024

Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning
Tao Liu, Yuhang Zhang, Zhu Feng, Zhiqin Yang, Chen Xu, Dapeng Man, Wu Yang
Backdoor Attack Attack Success Rate Backdoor Effect

April 22, 2024

Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations
Sukmin Cho, Soyeong Jeong, Jeongyeon Seo, Taeho Hwang, Jong C. Park
Native Robustness Retrieval Augmented Generation Attack Success Rate Recent Large Language Model Kg Rag RAG Based Document Generation Typo Squatting

April 18, 2024

Uncovering Safety Risks of Large Language Models through Concept Activation Vector
Zhihao Xu, Ruixuan Huang, Changyu Chen, Xiting Wang
Attack Success Rate Attack Method Concept Activation Vector Invisible Attack

April 15, 2024

On the Efficiency of Privacy Attacks in Federated Learning
Nawrin Tabassum, Ka-Ho Chow, Xuyu Wang, Wenbin Zhang, Yanzhao Wu
High Efficiency Privacy Attack Attack Success Rate Gradient Leakage Attack

April 3, 2024

Exploring Backdoor Vulnerabilities of Chat Models
Yunzhuo Hao, Wenkai Yang, Yankai Lin
Backdoor Attack Backdoor Trigger Attack Success Rate Chat Model

March 18, 2024

Impart: An Imperceptible and Effective Label-Specific Backdoor Attack
Jingke Zhao, Zan Wang, Yongwei Wang, Lanjun Wang
Backdoor Attack Attack Success Rate Clean Label Backdoor Attack Imperceptible Pattern Label Attack

February 23, 2024

Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Chegini, Soheil Feizi
Language Model Membership Inference Attack Adversarial Prompt Attack Success Rate Adversarial Search

February 12, 2024

OrderBkd: Textual backdoor attack through repositioning
Irina Alekseevskaia, Konstantin Arkhipenko
Attack Success Rate Textual Backdoor Attack