Adversarial Example
Adversarial examples are subtly altered inputs designed to fool machine learning models, primarily deep neural networks (DNNs), into making incorrect predictions. Current research focuses on improving model robustness against these attacks, exploring techniques like ensemble methods, multi-objective representation learning, and adversarial training, often applied to architectures such as ResNets and Vision Transformers. Understanding and mitigating the threat of adversarial examples is crucial for ensuring the reliability and security of AI systems across diverse applications, from image classification and natural language processing to malware detection and autonomous driving. The development of robust defenses and effective attack detection methods remains a significant area of ongoing investigation.
Papers
Protecting Feed-Forward Networks from Adversarial Attacks Using Predictive Coding
Ehsan Ganjidoost, Jeff Orchard
Noise as a Double-Edged Sword: Reinforcement Learning Exploits Randomized Defenses in Neural Networks
Steve Bakos, Pooria Madani, Heidar Davoudi
Wide Two-Layer Networks can Learn from Adversarial Perturbations
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set
Chris Achard
CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng
One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks
Ji Guo, Wenbo Jiang, Rui Zhang, Guoming Lu, Hongwei Li, Weiren Wu