Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" nature of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that expose model behavior. This line of work is crucial for building trust in AI, particularly in high-stakes domains such as healthcare and finance, where understanding model decisions is a prerequisite for responsible deployment and effective human-AI collaboration.
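To make the idea concrete, the sketch below (assuming scikit-learn is available; it does not reproduce any of the listed papers' methods) trains a shallow decision tree, one of the inherently interpretable model classes mentioned above, and prints its learned decision rules directly rather than explaining a black box after the fact.

```python
# Minimal sketch of an inherently interpretable model: a shallow decision tree
# whose fitted decision rules can be read and audited as plain if/else logic.
# Assumes scikit-learn; uses the Iris dataset purely for illustration.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Restricting depth keeps the rule set small enough for a human to inspect.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# export_text renders the fitted tree as human-readable threshold rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```

The capped depth is the key design choice: the model's transparency comes from the structure itself, so the entire decision process is visible without any post-hoc attribution step.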
Papers
Decoding Interpretable Logic Rules from Neural Networks
Chuqin Geng, Xiaojie Xu, Zhaoyue Wang, Ziyu Zhao, Xujie Si
Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition
Mazen Balat, Rewaa Awaad, Ahmed B. Zaky, Salah A. Aly
Refusal Behavior in Large Language Models: A Nonlinear Perspective
Fabian Hildebrandt, Andreas Maier, Patrick Krauss, Achim Schilling
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas
Neural Probabilistic Circuits: Enabling Compositional and Interpretable Predictions through Logical Reasoning
Weixin Chen, Simon Yu, Huajie Shao, Lui Sha, Han Zhao
Tensorization of neural networks for improved privacy and interpretability
José Ramón Pareja Monturiol, Alejandro Pozas-Kerstjens, David Pérez-García
Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations
Pablo Barceló, Alexander Kozachinskiy, Miguel Romero Orth, Bernardo Subercaseaux, José Verschae
COMIX: Compositional Explanations using Prototypes
Sarath Sivaprasad, Dmitry Kangin, Plamen Angelov, Mario Fritz