Feature Attribution
Feature attribution explains the predictions of complex machine learning models by identifying which input features most strongly influence the output. Current research focuses on developing and evaluating attribution methods, including gradient-based approaches like Integrated Gradients and game-theoretic methods like SHAP, applied to deep neural networks (including transformers) and other architectures such as Siamese encoders. These efforts address challenges including faithfulness (whether attributions accurately reflect the model's behavior), robustness (consistency of attributions under input perturbations), and computational efficiency, ultimately seeking to improve model transparency and trustworthiness in applications ranging from medical diagnosis to scientific discovery.
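To make the two method families concrete, here is a minimal, self-contained Python sketch that computes Integrated Gradients and exact Shapley value attributions for a toy logistic-regression model. The model, weights, inputs, and the baseline-replacement value function are illustrative assumptions, not code from any of the papers listed below; in particular, exact Shapley computation enumerates all 2^n feature coalitions, which is precisely the cost that practical estimation algorithms are designed to avoid.

```python
# Minimal sketch contrasting two attribution methods on a toy model.
# All names, weights, and inputs are illustrative assumptions.
import numpy as np
from itertools import combinations
from math import factorial

def model(x, w, b):
    """Toy differentiable model: sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient of the toy model w.r.t. the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=50):
    """Approximate Integrated Gradients: average the input gradient
    along the straight path from baseline to x, then scale the mean
    gradient elementwise by (x - baseline)."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.array([model_grad(baseline + a * (x - baseline), w, b)
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def shapley_values(x, baseline, f):
    """Exact Shapley values with a baseline-replacement value function:
    features outside the coalition are set to their baseline values.
    Enumerating all 2^n coalitions is exponential in n."""
    n = len(x)
    def value(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]
        return f(z)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

w, b = np.array([0.8, -1.2, 0.3]), 0.1
x, baseline = np.array([1.0, 2.0, -0.5]), np.zeros(3)

ig = integrated_gradients(x, baseline, w, b)
sv = shapley_values(x, baseline, lambda z: model(z, w, b))
print("Integrated Gradients:", ig)
print("Shapley values:      ", sv)
# Completeness/efficiency check: attributions should sum to
# f(x) - f(baseline).
print("f(x) - f(baseline):  ", model(x, w, b) - model(baseline, w, b))
```

Both methods share the completeness/efficiency property that attributions sum to f(x) - f(baseline), which the final print statements verify: exactly for Shapley values, and approximately for the discretized Integrated Gradients path integral.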
Papers
Algorithms to estimate Shapley value feature attributions
Hugh Chen, Ian C. Covert, Scott M. Lundberg, Su-In Lee
CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN
Matan Atad, Vitalii Dmytrenko, Yitong Li, Xinyue Zhang, Matthias Keicher, Jan Kirschke, Bene Wiestler, Ashkan Khakzar, Nassir Navab