Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" character of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature-attribution and visualization techniques that clarify model behavior. This pursuit is central to building trust in AI, particularly in high-stakes domains such as healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
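As a concrete illustration of what "intrinsically interpretable" means in practice, the sketch below fits a shallow decision tree and prints its learned rules. It is a minimal example only: scikit-learn and the iris dataset are placeholder assumptions, not the method of any paper listed here.

```python
# Minimal sketch of an intrinsically interpretable model: a shallow decision
# tree whose entire decision logic can be printed as human-readable rules.
# Assumes scikit-learn; the iris dataset stands in for any tabular task.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Limiting depth keeps the learned rule set small enough to read in full.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders every split as an if/else rule, so the model can be
# inspected directly, without any post-hoc attribution method.
print(export_text(tree, feature_names=list(data.feature_names)))
```

Reading the printed rules top to bottom recovers exactly the computation the model performs at prediction time, which is the defining property of an inherently interpretable model.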
Papers
Calibration through the Lens of Interpretability
Alireza Torabian, Ruth Urner
A Comprehensive Guide to Explainable AI: From Classical Models to LLMs
Weiche Hsieh, Ziqian Bi, Chuanqi Jiang, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Yizhu Wen, Xinyuan Song, Tianyang Wang, Junjie Yang, Ming Li, Bowen Jing, Jintao Ren, Junhao Song, Han Xu, Hong-Ming Tseng, Yichao Zhang, Lawrence K.Q. Yan, Qian Niu, Silin Chen, Yunze Wang, Chia Xin Liang, Ming Liu
Functional relevance based on the continuous Shapley value
Pedro Delicado, Cristian Pachón-García
FreqX: What neural networks learn is what network designers say
Zechen Liu
Large Scale Evaluation of Deep Learning-based Explainable Solar Flare Forecasting Models with Attribution-based Proximity Analysis
Temitope Adeyeha, Chetraj Pandey, Berkay Aydin
New Faithfulness-Centric Interpretability Paradigms for Natural Language Processing
Andreas Madsen
Bi-ICE: An Inner Interpretable Framework for Image Classification via Bi-directional Interactions between Concept and Input Embeddings
Jinyung Hong, Yearim Kim, Keun Hee Park, Sangyu Han, Nojun Kwak, Theodore P. Pavlic
Learning Explainable Treatment Policies with Clinician-Informed Representations: A Practical Approach
Johannes O. Ferstad, Emily B. Fox, David Scheinker, Ramesh Johari
Network Inversion and Its Applications
Pirzada Suhail, Hao Tang, Amit Sethi
Disentangled Interpretable Representation for Efficient Long-term Time Series Forecasting
Yuang Zhao, Tianyu Li, Jiadong Chen, Shenrong Ye, Fuxin Jiang, Tieying Zhang, Xiaofeng Gao
NormXLogit: The Head-on-Top Never Lies
Sina Abbasi, Mohammad Reza Modarres, Mohammad Taher Pilehvar
Learning Predictive Checklists with Probabilistic Logic Programming
Yukti Makhija, Edward De Brouwer, Rahul G. Krishnan
Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability
Jatin Nainani, Sankaran Vaidyanathan, AJ Yeung, Kartik Gupta, David Jensen
Free Energy Projective Simulation (FEPS): Active inference with interpretability
Joséphine Pazem, Marius Krumm, Alexander Q. Vining, Lukas J. Fiderer, Hans J. Briegel
Exploring Kolmogorov-Arnold Networks for Interpretable Time Series Classification
Irina Barašin, Blaž Bertalanič, Miha Mohorčič, Carolina Fortuna