Black-Box Attacks
Black-box attacks aim to compromise machine learning models without any knowledge of their internal workings, manipulating only the inputs to elicit incorrect outputs. Current research emphasizes query-efficient methods, often based on zeroth-order optimization, Bayesian optimization, or generative models such as diffusion models, for crafting adversarial examples against a range of architectures, including vision transformers, large language models, and generative adversarial networks. These attacks expose critical vulnerabilities in deployed systems across diverse applications such as image recognition, natural language processing, and even physical security, underscoring the urgent need for more robust model designs and defense mechanisms.
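To make the query-only setting concrete, the sketch below shows a score-based black-box attack that estimates gradients purely from model queries using NES-style zeroth-order sampling, then takes projected signed-gradient steps. It is illustrative only and not drawn from any of the papers listed here: the oracle `query_model` (implemented as a toy random linear classifier), the margin loss `attack_loss`, and the hyperparameters are all assumptions standing in for a real deployed model's API.

```python
# A minimal sketch of a score-based black-box attack using zeroth-order
# (NES-style) gradient estimation. The victim model is treated purely as a
# query oracle: we observe only its output scores, never its gradients.
# NOTE: query_model is a hypothetical stand-in (a fixed random linear
# classifier); in practice it would wrap a deployed model's prediction API.

import numpy as np

rng = np.random.default_rng(0)

D, C = 32, 10                      # input dimension, number of classes
W = rng.normal(size=(C, D))        # toy "black-box" model parameters

def query_model(x):
    """Oracle: returns class scores for input x (no gradient access)."""
    return W @ x

def attack_loss(x, true_label):
    """Margin loss: positive once the true class is no longer the argmax."""
    scores = query_model(x)
    best_other = np.max(np.delete(scores, true_label))
    return best_other - scores[true_label]

def nes_gradient(x, true_label, sigma=0.1, n_samples=50):
    """Estimate the loss gradient from queries alone via NES sampling."""
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        # Antithetic sampling: two queries per direction reduce variance.
        grad += u * (attack_loss(x + sigma * u, true_label)
                     - attack_loss(x - sigma * u, true_label))
    return grad / (2 * sigma * n_samples)

def black_box_attack(x0, true_label, eps=0.5, step=0.05, iters=100):
    """Signed-gradient ascent on the margin loss, projected to an L-inf ball."""
    x = x0.copy()
    for _ in range(iters):
        g = nes_gradient(x, true_label)
        x = np.clip(x + step * np.sign(g), x0 - eps, x0 + eps)
        if np.argmax(query_model(x)) != true_label:
            break                  # misclassification achieved
    return x

x0 = rng.normal(size=D)
label = int(np.argmax(query_model(x0)))
x_adv = black_box_attack(x0, label)
print("original:", label, "adversarial:", int(np.argmax(query_model(x_adv))))
```

The query budget is the central cost here: each gradient estimate consumes `2 * n_samples` model queries, which is why much of the literature summarized above focuses on reducing the number of queries needed per successful adversarial example.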
Papers
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini
BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks
Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang