Hard Label Attack

Hard-label attacks target machine learning models by manipulating inputs to elicit incorrect predictions while having access only to the model's final classification (the "hard label"), not its confidence scores or gradients. Current research focuses on developing query-efficient algorithms, often employing surrogate models or local explainability techniques, to minimize the number of queries needed to generate adversarial examples in this restricted setting. These attacks matter because they represent a realistic threat to deployed machine learning systems, particularly where feedback is limited or privacy constraints apply, and their study informs the development of more robust models.
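
The following is a minimal sketch of one common decision-based approach, in the spirit of boundary-walk attacks: starting from any misclassified point, repeatedly take small random steps and binary-search back toward the original input using only top-1 label feedback. The `predict` function, step sizes, and iteration counts are illustrative assumptions, not a specific method from the literature listed below.

```python
"""Minimal sketch of an untargeted hard-label (decision-based) attack.

Assumption: `predict` is a hypothetical black box mapping one input
array to its top-1 class label; no probabilities or gradients are used.
"""
import numpy as np


def find_initial_adversarial(predict, x, true_label, rng, max_tries=1000):
    """Sample random inputs until one receives a different label."""
    for _ in range(max_tries):
        candidate = rng.uniform(low=x.min(), high=x.max(), size=x.shape)
        if predict(candidate) != true_label:
            return candidate
    raise RuntimeError("no misclassified starting point found")


def binary_search_to_boundary(predict, x, x_adv, true_label, steps=25):
    """Move the adversarial point toward x along the connecting line,
    keeping it on the misclassified side of the decision boundary."""
    low, high = 0.0, 1.0  # interpolation weight toward x_adv
    for _ in range(steps):
        mid = (low + high) / 2.0
        candidate = (1 - mid) * x + mid * x_adv
        if predict(candidate) != true_label:
            high = mid   # still adversarial: move closer to x
        else:
            low = mid    # crossed back: back off toward x_adv
    return (1 - high) * x + high * x_adv


def hard_label_attack(predict, x, true_label, iterations=200, step=0.1, seed=0):
    """Alternate small random perturbations with binary-search
    projections back toward the original input, using labels only."""
    rng = np.random.default_rng(seed)
    x_adv = find_initial_adversarial(predict, x, true_label, rng)
    x_adv = binary_search_to_boundary(predict, x, x_adv, true_label)
    for _ in range(iterations):
        # Propose a small random step around the current adversarial point.
        proposal = x_adv + step * rng.standard_normal(x.shape)
        if predict(proposal) != true_label:
            # Accept only if still adversarial, then tighten toward x again.
            x_adv = binary_search_to_boundary(predict, x, proposal, true_label)
    return x_adv


# Example usage with a hypothetical scikit-learn-style classifier `clf`:
#   predict = lambda v: clf.predict(v.reshape(1, -1))[0]
#   x_adv = hard_label_attack(predict, x, predict(x))
```

Each accepted step preserves misclassification while the binary search shrinks the perturbation, so the query budget is spent only on label lookups, which is the defining constraint of the hard-label setting.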

Papers