Attribution Methods
Attribution methods in explainable AI aim to decipher how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse model architectures, including convolutional neural networks, transformers, and large language models, often employing techniques like gradient-based approaches, perturbation tests, and counterfactual generation. This work is crucial for enhancing the trustworthiness and interpretability of complex models, particularly in high-stakes applications where understanding model decisions is paramount, and for identifying and mitigating biases or vulnerabilities.
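The gradient-based approaches mentioned above can be illustrated with a minimal sketch of integrated gradients on a toy differentiable model. The logistic model, its weights, the input, and the all-zeros baseline below are all made-up assumptions for illustration, not from any cited paper; the gradient is written analytically so the example needs only NumPy.

```python
import numpy as np

# Hypothetical logistic model for illustration; weights, bias, input,
# and baseline are made-up values.
w = np.array([0.8, -1.2, 0.3])
b = 0.1

def model(x):
    """Sigmoid of a linear score: f(x) = sigma(w.x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def integrated_gradients(x, baseline, steps=200):
    """Approximate IG_i = (x_i - x0_i) * integral over alpha in [0, 1]
    of df/dx_i at baseline + alpha * (x - baseline), via a midpoint
    Riemann sum along the straight-line path."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        p = model(baseline + a * (x - baseline))
        total += p * (1.0 - p) * w  # analytic gradient of sigma(w.x + b)
    return (x - baseline) * total / steps

x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
# Completeness: attributions sum to f(x) - f(baseline), a common
# faithfulness check for this family of methods.
assert abs(attr.sum() - (model(x) - model(baseline))) < 1e-3
```

The completeness property checked at the end is one concrete notion of faithfulness: the per-feature importance scores decompose the change in the model's output relative to the baseline.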