Adversarial Example
Adversarial examples are subtly perturbed inputs crafted to fool machine learning models, primarily deep neural networks (DNNs), into making incorrect predictions. Current research focuses on improving model robustness against these attacks through techniques such as ensemble methods, multi-objective representation learning, and adversarial training, often applied to architectures like ResNets and Vision Transformers. Understanding and mitigating adversarial examples is crucial for the reliability and security of AI systems across diverse applications, from image classification and natural language processing to malware detection and autonomous driving. Developing robust defenses and effective attack-detection methods remains an active area of investigation.
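To make the notion concrete, the sketch below shows the classic Fast Gradient Sign Method (FGSM), one of the simplest ways to craft an adversarial example: it perturbs an input in the direction of the sign of the loss gradient under a small L-infinity budget. This is an illustrative PyTorch sketch, not any single paper's method; the names fgsm_attack, model, and epsilon are placeholders, and model is assumed to be any differentiable classifier taking image tensors with pixel values in [0, 1].

    import torch
    import torch.nn as nn

    def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                    epsilon: float = 8 / 255) -> torch.Tensor:
        """Return an adversarial copy of x within an L-infinity ball of radius epsilon."""
        x_adv = x.clone().detach().requires_grad_(True)
        # Compute the classification loss and backpropagate to get d(loss)/d(input).
        loss = nn.functional.cross_entropy(model(x_adv), y)
        loss.backward()
        # Step in the direction that increases the loss, then clamp to the valid pixel range.
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()

Adversarial training, one of the defenses mentioned above, folds such a step into optimization: each minibatch is perturbed on the fly and the model is updated on the perturbed examples rather than the clean ones.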
Papers
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Matías P. Pizarro B., Dorothea Kolossa, Asja Fischer
On Evaluating Adversarial Robustness of Large Vision-Language Models
Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin
How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
Salijona Dyrmishi, Salah Ghamizi, Maxime Cordy
Fantastic DNN Classifiers and How to Identify them without Data
Nathaniel Dean, Dilip Sarkar
Introducing Competition to Boost the Transferability of Targeted Adversarial Examples through Clean Feature Mixup
Junyoung Byun, Myung-Joon Kwon, Seungju Cho, Yoonji Kim, Changick Kim