Adversarial Attack
Adversarial attacks aim to deceive machine learning models by subtly altering input data, causing misclassifications or other erroneous outputs. Current research focuses on developing more robust models and detection methods, and on exploring attack strategies across model architectures (including vision transformers, recurrent neural networks, and graph neural networks) and data types (images, text, signals, and tabular data). Understanding and mitigating these attacks is crucial for ensuring the reliability and security of AI systems in applications ranging from autonomous vehicles to medical diagnosis and cybersecurity.
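As a concrete illustration of the core idea (a generic sketch, not the method of any paper listed below), the fast gradient sign method perturbs an input along the sign of the loss gradient so that a small, bounded change can flip the model's prediction. The PyTorch snippet below assumes a hypothetical classifier `model` that returns logits, an input batch `x` with values in [0, 1], integer `label`s, and a perturbation budget `epsilon`.

```python
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Fast Gradient Sign Method: add an epsilon-bounded perturbation in the
    direction of the loss gradient's sign to provoke a misclassification."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step along the gradient sign, then clamp back to the valid input range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

This single-step, white-box form is the simplest variant; many of the papers below study stronger multi-step, black-box, or physical-world attacks built on the same principle.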
Papers
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
NSA: Naturalistic Support Artifact to Boost Network Confidence
Abhijith Sharma, Phil Munz, Apurva Narayan
When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning
Yuchen Sun, Qianqian Xu, Zitai Wang, Qingming Huang
Unified Adversarial Patch for Visible-Infrared Cross-modal Attacks in the Physical World
Xingxing Wei, Yao Huang, Yitong Sun, Jie Yu
On the unreasonable vulnerability of transformers for image restoration -- and an easy fix
Shashank Agnihotri, Kanchana Vaishnavi Gandikota, Julia Grabinski, Paramanand Chandramouli, Margret Keuper
Imperceptible Physical Attack against Face Recognition Systems via LED Illumination Modulation
Junbin Fang, Canjian Jiang, You Jiang, Puxi Lin, Zhaojie Chen, Yujing Sun, Siu-Ming Yiu, Zoe L. Jiang
Why Don't You Clean Your Glasses? Perception Attacks with Dynamic Optical Perturbations
Yi Han, Matthew Chan, Eric Wengrowski, Zhuohuan Li, Nils Ole Tippenhauer, Mani Srivastava, Saman Zonouz, Luis Garcia
An Estimator for the Sensitivity to Perturbations of Deep Neural Networks
Naman Maheshwari, Nicholas Malaya, Scott Moe, Jaydeep P. Kulkarni, Sudhanva Gurumurthi
Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation
Neel Bhandari, Pin-Yu Chen
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
Xuelong Dai, Kaisheng Liang, Bin Xiao
Unveiling Vulnerabilities in Interpretable Deep Learning Systems with Query-Efficient Black-box Attacks
Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed
Robust Feature Inference: A Test-time Defense Strategy using Spectral Projections
Anurag Singh, Mahalakshmi Sabanayagam, Krikamol Muandet, Debarghya Ghoshdastidar
Adversarial attacks for mixtures of classifiers
Lucas Gnecco Heredia, Benjamin Negrevergne, Yann Chevaleyre
A Holistic Assessment of the Reliability of Machine Learning Systems
Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer
FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation
Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo