Sparse Adversarial Attack

Sparse adversarial attacks deceive deep learning models by perturbing only a small subset of input features (e.g., pixels in images, words in text) rather than making dense changes across the whole input. Current research explores efficient algorithms for generating such attacks, such as Frank-Wolfe and gradient-based methods that incorporate sparsity-inducing regularizers (e.g., the $\ell_0$ norm, group norms), with the goal of improving the attacks' effectiveness and interpretability. This research is significant because it reveals vulnerabilities in deep learning models and provides insights into their robustness, impacting both the development of more resilient models and the understanding of their limitations in safety-critical applications.
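
To illustrate the core idea, the sketch below shows a simplified one-step, gradient-based sparse attack that enforces an $\ell_0$ budget by perturbing only the top-$k$ input coordinates ranked by gradient magnitude. It is a minimal illustration rather than any specific published method; the function name `sparse_topk_attack`, the pretrained classifier `model`, and the values of `k` and `eps` are assumptions for the example.

```python
# Minimal sketch of a one-step, gradient-based sparse attack.
# Assumptions: a pretrained PyTorch classifier `model`, an input batch `x`
# with values in [0, 1], integer labels `y`; `k` and `eps` are illustrative
# hyperparameters, not values taken from any particular paper.
import torch
import torch.nn.functional as F


def sparse_topk_attack(model, x, y, k=50, eps=0.2):
    """Perturb only the k input coordinates with the largest loss gradient.

    The l0 budget of k changed coordinates per example is enforced by
    zeroing the gradient everywhere except its top-k entries, then taking
    a single signed step of size eps on those coordinates.
    """
    x_adv = x.clone().detach().requires_grad_(True)

    # Gradient of the classification loss with respect to the input.
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]

    # Keep only the k largest-magnitude gradient coordinates per example.
    flat = grad.abs().flatten(start_dim=1)
    topk_idx = flat.topk(k, dim=1).indices
    mask = torch.zeros_like(flat).scatter_(1, topk_idx, 1.0).view_as(grad)

    # Single signed-gradient step restricted to the selected coordinates,
    # followed by clipping back to the valid input range.
    x_adv = x_adv.detach() + eps * grad.sign() * mask
    return x_adv.clamp_(0.0, 1.0)
```

Iterative methods in the literature refine this idea, for example by re-selecting the sparse support at each step or by replacing the hard top-$k$ selection with an optimized sparsity-inducing regularizer.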

Papers