Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" character of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specialized neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that clarify model behavior. This work is essential for building trust in AI, particularly in high-stakes domains like healthcare and finance, where understanding a model's decisions is a prerequisite for responsible deployment and effective human-AI collaboration.
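To make the contrast with post-hoc explanation concrete, the following minimal sketch fits a shallow decision tree, one of the intrinsically interpretable model families mentioned above, and prints its learned rules directly. It assumes scikit-learn is available; the dataset, depth, and feature names are illustrative choices and are not drawn from any of the papers listed below.

# Minimal sketch of an inherently interpretable model: the fitted model
# *is* its explanation, since every prediction follows one readable
# root-to-leaf rule path. Assumes scikit-learn; dataset and max_depth
# are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# A shallow tree keeps the rule set small enough to audit by hand,
# trading some accuracy for transparency.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the full decision logic as human-readable if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))

A deeper or ensembled model would typically score higher but would lose this property; in that case, post-hoc feature attribution or visualization methods are needed to approximate what the shallow tree exposes for free.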
Papers
Cross Feature Selection to Eliminate Spurious Interactions and Single Feature Dominance Explainable Boosting Machines
Shree Charran R, Sandipan Das Mahapatra
Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models
Lennart Schneider, Bernd Bischl, Janek Thomas
It's All Relative: Interpretable Models for Scoring Bias in Documents
Aswin Suresh, Chi-Hsuan Wu, Matthias Grossglauser
SHAMSUL: Systematic Holistic Analysis to investigate Medical Significance Utilizing Local interpretability methods in deep learning for chest radiography pathology prediction
Mahbub Ul Alam, Jaakko Hollmén, Jón Rúnar Baldvinsson, Rahim Rahmani