Adversarial Transferability

Adversarial transferability is the phenomenon whereby adversarial examples, inputs crafted to fool one machine learning model, can also successfully attack other models, even ones with different architectures or training data. Current research focuses on making such transfer attacks more effective across diverse model families, including convolutional neural networks (CNNs), vision transformers (ViTs), graph neural networks (GNNs), and large language models (LLMs), often using techniques such as input transformations and gradient manipulation so that perturbations generalize beyond the surrogate model on which they were generated. Understanding and mitigating adversarial transferability is crucial for ensuring the robustness and security of AI systems in applications ranging from image recognition to cybersecurity.
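
To make the gradient-manipulation and input-transformation ideas above concrete, below is a minimal sketch of a momentum iterative attack with random input diversity (in the spirit of MI-FGSM and DI-FGSM), assuming a differentiable PyTorch image classifier as the surrogate. The step sizes, transform probability, and the `random_resize_pad` helper are illustrative choices, not a canonical implementation.

```python
import torch
import torch.nn.functional as F

def random_resize_pad(x, low=0.9):
    """Input-diversity transform (illustrative helper): randomly shrink
    the batch and pad it back to its original spatial size."""
    _, _, h, w = x.shape
    s = int(h * (low + (1.0 - low) * torch.rand(1).item()))
    x_small = F.interpolate(x, size=(s, s), mode="nearest")
    pad = h - s
    top = int(torch.randint(0, pad + 1, (1,)))
    left = int(torch.randint(0, pad + 1, (1,)))
    # F.pad takes (left, right, top, bottom) for the last two dims
    return F.pad(x_small, (left, pad - left, top, pad - top))

def transfer_attack(model, x, y, eps=8/255, steps=10, mu=1.0, p=0.5):
    """Momentum iterative attack with input diversity, run against a
    surrogate `model`; hyperparameters here are illustrative."""
    alpha = eps / steps                  # per-step L-inf step size
    g = torch.zeros_like(x)              # accumulated momentum
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Apply a random transform with probability p so the gradient
        # does not overfit to the surrogate's exact input geometry
        x_in = random_resize_pad(x_adv) if torch.rand(1).item() < p else x_adv
        loss = F.cross_entropy(model(x_in), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Normalize the gradient (L1) and fold it into the momentum
        # term, which stabilizes update directions across iterations
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        # Signed step, then project back into the eps-ball around x
        x_adv = (x_adv.detach() + alpha * g.sign() - x).clamp(-eps, eps) + x
        x_adv = x_adv.clamp(0.0, 1.0).detach()
    return x_adv
```

In a transfer evaluation, the adversarial examples produced against the surrogate are fed to held-out target models and success is measured by their misclassification rate; momentum and input diversity are two of several complementary tricks in this literature, alongside translation-invariant and ensemble-based attacks.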

Papers