Paper ID: 2205.09518

Gradient Aligned Attacks via a Few Queries

Xiangyuan Yang, Jie Lin, Hanlin Zhang, Xinyu Yang, Peng Zhao

Black-box query attacks, which rely only on the output of the victim model, have proven effective at attacking deep learning models. However, existing black-box query attacks perform poorly in a novel scenario where only a few queries are allowed. To address this issue, we propose gradient aligned attacks (GAA), which use the gradient aligned losses (GAL) we design on the surrogate model to estimate an accurate gradient for attacking the victim model. Specifically, we propose a gradient aligned mechanism that ensures the derivatives of the loss function with respect to the logit vector have the same weight coefficients on the surrogate and victim models. Using this mechanism, we transform the cross-entropy (CE) loss and the margin loss into their gradient aligned forms, i.e., the gradient aligned CE and margin losses. These losses not only improve the attack performance of our gradient aligned attacks in the novel scenario but also increase the query efficiency of existing black-box query attacks. Through theoretical and empirical analysis on the ImageNet database, we demonstrate that our gradient aligned mechanism is effective, and that our gradient aligned attacks improve the attack performance in the novel scenario by 16.1\% and 31.3\% under the $l_2$ and $l_{\infty}$ norms of the box constraint, respectively, compared to four recent transferable prior-based query attacks. In addition, the gradient aligned losses reduce the number of queries required by these transferable prior-based query attacks by a factor of up to 2.9. Overall, our proposed gradient aligned attacks and losses significantly improve the attack performance and query efficiency of black-box query attacks, particularly in scenarios where only a few queries are allowed.
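
As an illustrative sketch of the gradient aligned mechanism (the notation and the exact form below are our assumptions for exposition, not necessarily the paper's precise formulation): for the standard CE loss on the surrogate logits $z^s$ with one-hot label $y$, the gradient with respect to the logits is $\nabla_{z^s}\,\mathrm{CE}(z^s, y) = \mathrm{softmax}(z^s) - y$, so its weight coefficients come from the surrogate's own output probabilities. A gradient aligned variant could instead fix these coefficients to the victim's queried probabilities $p^v$, e.g. $\mathrm{GA\text{-}CE}(z^s) = (p^v - y)^{\top} z^s$, whose gradient $\nabla_{z^s}\,\mathrm{GA\text{-}CE}(z^s) = p^v - y$ carries the victim model's coefficients while being computed entirely on the surrogate.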

Submitted: May 19, 2022