Model Interpretability
Model interpretability aims to make the decision-making processes of complex machine learning models transparent and understandable. Current research follows two main directions: inherently interpretable models, such as generalized additive models and rule-based systems, and post-hoc methods that explain the predictions of black-box models, often using techniques like SHAP values, Grad-CAM, and attention analysis applied to deep architectures such as transformers. This field is crucial for building trust in AI systems, particularly in high-stakes domains like healthcare and finance, and for facilitating the responsible development and deployment of machine learning technologies.
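To make the post-hoc idea concrete, here is a minimal sketch of one widely used model-agnostic technique, permutation importance: shuffle one feature at a time and measure how much the model's error grows. The `predict` function and toy data below are hypothetical stand-ins for any black-box model, not a method from the papers listed on this page.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Model-agnostic importance: increase in error when a feature is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the feature's relationship to y
            deltas.append(metric(y, predict(Xp)) - baseline)
        importances[j] = float(np.mean(deltas))
    return importances

# Toy "black-box": the target depends strongly on feature 0, weakly on feature 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.1 * X[:, 1]
predict = lambda X: 3.0 * X[:, 0] + 0.1 * X[:, 1]  # stand-in for model.predict
mse = lambda y, p: float(np.mean((y - p) ** 2))

imp = permutation_importance(predict, X, y, mse)
# Feature 0 should receive a much larger importance score than feature 1.
```

Because it only needs a `predict` function, the same routine applies unchanged to trees, neural networks, or any other model, which is exactly what makes post-hoc explanation methods attractive.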