Attribution Robustness
Attribution robustness in machine learning concerns whether explanations of model predictions remain stable when the input is slightly perturbed. Current research investigates the vulnerability of attribution methods, including gradient-based and removal-based approaches, across model architectures such as transformers and convolutional neural networks, and explores defenses such as adversarial training and smoothing. Robust attributions matter for building trust in AI systems, particularly in high-stakes applications: an explanation that changes drastically under an imperceptible input change cannot be considered faithful or reliable.
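As a concrete illustration, the sketch below computes a plain gradient-based attribution (a saliency map) and a SmoothGrad-style smoothed variant that averages gradients over noisy copies of the input, then compares how much each attribution shifts under a small input perturbation. This is a minimal sketch under stated assumptions, not any specific paper's method: the toy PyTorch classifier, the noise level `sigma`, the sample count `n_samples`, and the cosine-similarity stability measure are all illustrative choices.

```python
# Minimal sketch: gradient attribution, SmoothGrad-style smoothing, and a
# simple stability check under a small input perturbation. The model, input
# shape, and hyperparameters are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gradient_attribution(model, x, target):
    """Saliency map: gradient of the target logit w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    logits[0, target].backward()
    return x.grad.detach()


def smoothgrad_attribution(model, x, target, n_samples=32, sigma=0.1):
    """Average gradient attributions over noisy copies of the input."""
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        grads += gradient_attribution(model, noisy, target)
    return grads / n_samples


def attribution_similarity(a, b):
    """Cosine similarity between two flattened attribution maps."""
    return F.cosine_similarity(a.flatten(), b.flatten(), dim=0).item()


if __name__ == "__main__":
    # Hypothetical toy classifier and input, just to make the sketch runnable.
    model = nn.Sequential(
        nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, 10)
    )
    x = torch.randn(1, 3, 32, 32)
    target = 0

    # For a robust attribution method, a small perturbation of the input
    # should leave the attribution map nearly unchanged (similarity near 1).
    delta = 0.01 * torch.randn_like(x)
    plain = gradient_attribution(model, x, target)
    plain_pert = gradient_attribution(model, x + delta, target)
    smooth = smoothgrad_attribution(model, x, target)
    smooth_pert = smoothgrad_attribution(model, x + delta, target)

    print("plain  similarity:", attribution_similarity(plain, plain_pert))
    print("smooth similarity:", attribution_similarity(smooth, smooth_pert))
```

In this setup, smoothing typically raises the similarity score because averaging over noisy inputs dampens the local gradient fluctuations that perturbations exploit; the exact numbers depend on the model and the assumed hyperparameters.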