Model Interpretability
Model interpretability aims to make the decision-making processes of complex machine learning models transparent and understandable. Current research focuses on developing both inherently interpretable models, such as generalized additive models and rule-based systems, and post-hoc methods that explain the predictions of black-box models, often using techniques such as SHAP values, Grad-CAM, and attention-based analyses of architectures such as transformers and deep neural networks. This field is crucial for building trust in AI systems, particularly in high-stakes domains like healthcare and finance, and for facilitating the responsible development and deployment of machine learning technologies.
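As a minimal sketch of the post-hoc approach mentioned above, the snippet below computes SHAP values for a tree-ensemble model. It assumes the `shap` and `scikit-learn` packages are installed; the diabetes dataset and random-forest model are illustrative choices only and are not drawn from the papers listed below.

```python
# Illustrative post-hoc explanation with SHAP values (not tied to any paper below).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Fit a black-box model on a standard tabular regression task.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])  # per-feature attributions

# Attribution of the first prediction to each input feature.
print(dict(zip(X.columns, shap_values[0].round(3))))
```

Each SHAP value estimates how much a feature pushed an individual prediction away from the model's average output, which is what makes the method useful for explaining single decisions rather than only global behavior.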
Papers
Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace
Jinyung Hong, Keun Hee Park, Theodore P. Pavlic
On the Impact of Knowledge Distillation for Model Interpretability
Hyeongrok Han, Siwon Kim, Hyun-Soo Choi, Sungroh Yoon
DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps
Angelos Chatzimparmpas, Rafael M. Martins, Alexandru C. Telea, Andreas Kerren
Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability
Mischa Dombrowski, Hadrien Reynaud, Johanna P. Müller, Matthew Baugh, Bernhard Kainz