Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" character of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as those based on decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that further clarify model behavior. This pursuit is crucial for building trust in AI, particularly in high-stakes domains such as healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
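As a minimal sketch of what "interpretable by construction" means in practice, the example below fits a shallow decision tree and prints its learned rules and feature importances directly from the fitted structure, with no separate post-hoc explainer. It assumes scikit-learn is available; the dataset and depth limit are illustrative choices, not taken from any of the papers listed below.

```python
# Sketch: an inherently interpretable model whose decision logic can be read directly.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# Restricting depth keeps the rule set small enough for a human to audit.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X, y)

# The model itself is the explanation: print its decision rules as if/else text.
print(export_text(model, feature_names=feature_names))

# Global feature importances come straight from the fitted tree structure.
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

The key design point is that the depth constraint trades some accuracy for a rule set compact enough to inspect, which is the usual compromise made by inherently interpretable models.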
Papers
Combining Stochastic Explainers and Subgraph Neural Networks can Increase Expressivity and Interpretability
Indro Spinelli, Michele Guerra, Filippo Maria Bianchi, Simone Scardapane
Interpretability is a Kind of Safety: An Interpreter-based Ensemble for Adversary Defense
Jingyuan Wang, Yufan Wu, Mingxuan Li, Xin Lin, Junjie Wu, Chao Li
Local Interpretability of Random Forests for Multi-Target Regression
Avraam Bardos, Nikolaos Mylonas, Ioannis Mollas, Grigorios Tsoumakas
LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
Zhengqing Miao, Xin Zhang, Meirong Zhao, Dong Ming