Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" character of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that aid understanding of model behavior. This pursuit is crucial for building trust in AI, particularly in high-stakes domains like healthcare and finance, where understanding a model's decisions is a prerequisite for responsible deployment and effective human-AI collaboration.
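To make the contrast with post-hoc explanation concrete, here is a minimal sketch of an inherently interpretable model: a shallow decision tree whose complete decision logic can be printed as human-readable rules. It uses scikit-learn; the dataset, depth limit, and all other choices are illustrative assumptions, not drawn from any of the papers listed below.

```python
# A minimal sketch of inherent interpretability (assumptions noted above):
# a shallow decision tree whose full decision logic is readable as rules.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# A small depth cap keeps the model simple enough to read end to end,
# trading some accuracy for transparency.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Unlike post-hoc attribution applied to a black box, the model *is* its
# explanation: every prediction follows one of the printed root-to-leaf paths.
print(export_text(tree, feature_names=list(data.feature_names)))
```

Because each prediction corresponds to a single root-to-leaf rule path, a reviewer can audit the entire model directly, which is the property the intrinsically interpretable architectures above aim to preserve at larger scale.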
Papers
Semantic Prototypes: Enhancing Transparency Without Black Boxes
Orfeas Menis-Mastromichalakis, Giorgos Filandrianos, Jason Liartis, Edmund Dervakos, Giorgos Stamou
Why do you cite? An investigation on citation intents and decision-making classification processes
Lorenzo Paolini, Sahar Vahdati, Angelo Di Iorio, Robert Wardenga, Ivan Heibi, Silvio Peroni
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang
Are Linear Regression Models White Box and Interpretable?
Ahmed M Salih, Yuhe Wang
Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake Richards, Irina Rish, Özgür Şimşek
Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions
Harrie Oosterhuis, Lijun Lyu, Avishek Anand
Generally-Occurring Model Change for Robust Counterfactual Explanations
Ao Xu, Tieru Wu
Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations
David N. Palacio, Daniel Rodriguez-Cardenas, Alejandro Velasco, Dipin Khati, Kevin Moran, Denys Poshyvanyk
Integrating White and Black Box Techniques for Interpretable Machine Learning
Eric M. Vernon, Naoki Masuyama, Yusuke Nojima
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim, Ze Wang, Qiang Qiu