Attribution Method

Attribution methods in explainable AI aim to decipher how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse model architectures, including convolutional neural networks, transformers, and large language models, often employing techniques like gradient-based approaches, perturbation tests, and counterfactual generation. This work is crucial for enhancing the trustworthiness and interpretability of complex models, particularly in high-stakes applications where understanding model decisions is paramount, and for identifying and mitigating biases or vulnerabilities.

Papers

May 2, 2024

Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods
Peiyu Yang, Naveed Akhtar, Jiantong Jiang, Ajmal Mian
Comprehensive Evaluation Attribution Method Attribution Model

April 29, 2024

Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods
Haeun Yu, Pepa Atanasova, Isabelle Augenstein
Language Model Unified Framework Attribution Method Human Instance Parametric Knowledge Neuron Attribution

April 22, 2024

Integrated Gradient Correlation: a Dataset-wise Attribution Method
Pierre Lelièvre, Chien-Chung Chen
Receptive Field Attribution Method Attribution Score Integrated Gradient

March 27, 2024

Improving Attributed Text Generation of Large Language Models via Preference Learning
Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang
Large Language Model Text Generation Preference Learning Attribution Method Attribution Score Software Citation Attribution Analysis

March 21, 2024

Multi-Level Explanations for Generative Language Models
Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh
Text Classification Generative Language Model Attribution Method Attribution Robustness Level Explanation Perturbation Based

March 11, 2024

Explainable Learning with Gaussian Processes
Kurt Butler, Guanchao Feng, Petar M. Djuric
Gaussian Process Gaussian Process Regression Feature Attribution Attribution Method

February 14, 2024

Less is More: Fewer Interpretable Region via Submodular Subset Selection
Ruoyu Chen, Hua Zhang, Siyuan Liang, Jingzhi Li, Xiaochun Cao
Submodular Maximization Attribution Method Attribution Score Image Attribution Region Identification

February 13, 2024

Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation
Xuexin Chen, Ruichu Cai, Zhengting Huang, Yuxuan Zhu, Julien Horwood, Zhifeng Hao, Zijian Li, Jose Miguel Hernandez-Lobato
High Explainability Community Need Feature Attribution Attribution Method Causal Direction Counterfactual Reasoning Feature Attribution Method

February 5, 2024

Approximate Attributions for Off-the-Shelf Siamese Transformers
Lucas Möller, Dmitry Nikolaev, Sebastian Padó
Attribution Method Attribution Score Siamese Transformer Siamese Encoders Attribution Model

February 1, 2024

ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models
Zhixue Zhao, Boxuan Shan
Generative Language Model Attribution Method Token Prediction Generative Learning Token Importance Model Agnostic Feature Attribution

January 28, 2024

Provably Stable Feature Rankings with SHAP and LIME
Jeremy Goldwasser, Giles Hooker
Shapley Value Feature Attribution Attribution Method Fusion SHAP Feature Ranking Dseg Lime

January 1, 2024

On Discprecncies between Perturbation Evaluations of Graph Neural Network Attributions
Razieh Rezaei, Alireza Dizaji, Ashkan Khakzar, Anees Kazi, Nassir Navab, Daniel Rueckert
Attribution Method Graph Domain Graph Classification Datasets Graph Neural Network Attribution

December 29, 2023

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training
Dongfang Li, Baotian Hu, Qingcai Chen, Shan He
Text Classification Feature Attribution Attribution Method Better Robustness Faithful Explanation Explanation Based

December 21, 2023

MFABA: A More Faithful and Accelerated Boundary-based Attribution Method for Deep Neural Networks
Zhiyu Zhu, Huaming Chen, Jiayu Zhang, Xinyi Wang, Zhibo Jin, Minhui Xue, Dongxiao Zhu, Kim-Kwang Raymond Choo
Deep Neural Network Gradient Based Attribution Method Attribution Analysis

December 16, 2023

Rethinking Robustness of Model Attributions
Sandesh Kamath, Sankalp Mittal, Amit Deshpande, Vineeth N Balasubramanian
Feature Attribution Attribution Method Model Attribution Attribution Robustness Natural Robustness Research

November 7, 2023

A Survey of Large Language Models Attribution
Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang
Timely Survey Attribution Method Generative System Generative Search Engine Open Domain Generative Language Model Attribution

October 14, 2023

Scene Text Recognition Models Explainability Using Local Features
Mark Vincent Ty, Rowel Atienza
Explainable AI XAI Method Attribution Method Local Feature charactER Representation Scene Text Recognition Model

October 10, 2023

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments
Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi
Feature Attribution Attribution Method Source Attribution Continuous Environment Neural Network Output

October 9, 2023

An Attribution Method for Siamese Encoders
Lucas Möller, Dmitry Nikolaev, Sebastian Padó
Feature Attribution Attribution Method Siamese Encoders Local Attribution

August 18, 2023

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Yi Cai, Gerhard Wunder
Black Box Gradient Based Feature Attribution Attribution Method White Box Gradient Based Explanation

Attribution Method

Papers

Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods

Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

Integrated Gradient Correlation: a Dataset-wise Attribution Method

Improving Attributed Text Generation of Large Language Models via Preference Learning

Multi-Level Explanations for Generative Language Models

Explainable Learning with Gaussian Processes

Less is More: Fewer Interpretable Region via Submodular Subset Selection

Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

Approximate Attributions for Off-the-Shelf Siamese Transformers

ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models

Provably Stable Feature Rankings with SHAP and LIME

On Discprecncies between Perturbation Evaluations of Graph Neural Network Attributions

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

MFABA: A More Faithful and Accelerated Boundary-based Attribution Method for Deep Neural Networks

Rethinking Robustness of Model Attributions

A Survey of Large Language Models Attribution

Scene Text Recognition Models Explainability Using Local Features

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

An Attribution Method for Siamese Encoders

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box