Attribution Attack
Attribution attacks target the explainability methods used to understand deep learning models, perturbing inputs so that the explanations are manipulated while the model's predictions remain essentially unchanged. Current research focuses on improving the accuracy of neuron importance estimations within these attacks, developing attacks that transfer across different model architectures, and exploring vulnerabilities in the secure aggregation protocols used in federated learning. These attacks highlight the fragility of model interpretability and pose significant security risks, particularly in sensitive applications where trust in model explanations is crucial. This has driven efforts to develop more robust explanation methods and defensive strategies.
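To make the core idea concrete, the sketch below shows one common flavor of attribution attack under assumed, simplified conditions: an input perturbation is optimized so that a plain gradient-based saliency map drifts toward an attacker-chosen target map, while a penalty term keeps the model's output distribution (and thus its prediction) close to the original. The names `model`, `x`, and `target_map`, the loss weights, and the perturbation budget are illustrative placeholders and do not correspond to any specific paper's method or API.

```python
# Minimal attribution-attack sketch in PyTorch (assumed setup: any differentiable
# classifier; ReLU layers must not be in-place so double backprop works).
import torch
import torch.nn.functional as F

def saliency(model, x):
    """Plain-gradient saliency of the top logit w.r.t. the input, kept on the
    autograd graph (create_graph=True) so the attack can differentiate through it."""
    logits = model(x)
    score = logits.max(dim=1).values.sum()
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs(), logits

def attribution_attack(model, x, target_map, steps=100, lr=1e-2, eps=8 / 255,
                       pred_weight=10.0):
    """Perturb x so its saliency map approaches target_map while the output stays
    close to the clean output. All hyperparameters here are illustrative."""
    model.eval()
    with torch.no_grad():
        clean_logits = model(x)
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = x + delta
        sal, logits = saliency(model, x_adv)
        # Pull the explanation toward the attacker's target map ...
        expl_loss = F.mse_loss(sal, target_map)
        # ... while keeping the prediction (output distribution) nearly unchanged.
        pred_loss = F.kl_div(F.log_softmax(logits, dim=1),
                             F.softmax(clean_logits, dim=1),
                             reduction="batchmean")
        loss = expl_loss + pred_weight * pred_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation visually imperceptible
    return (x + delta).detach()
```

The same template applies to other explanation methods (e.g., integrated gradients or layer-wise relevance scores) by swapping the `saliency` function, which is where much of the literature differs.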
Papers