Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" nature of many AI systems. Current research emphasizes intrinsically interpretable model architectures, such as those based on decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside techniques like feature attribution and visualization that further clarify model behavior. This work is crucial for building trust in AI, particularly in high-stakes applications like healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
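As a minimal sketch of what "inherently interpretable" means in practice, the snippet below fits a shallow decision tree and prints its full decision logic and feature importances. The dataset, depth, and scikit-learn tooling are illustrative assumptions, not drawn from the listed papers; the point is only that the fitted model itself serves as the explanation, with no separate post-hoc explainer.

```python
# Minimal sketch: an inherently interpretable model whose full decision
# logic can be read directly, without post-hoc explanation tools.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative dataset and hyperparameters; any small tabular task works.
data = load_iris()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# A shallow tree keeps the rule set small enough for a human to audit.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The learned model *is* the explanation: print it as nested if/else rules.
print(export_text(tree, feature_names=feature_names))

# Global feature attribution falls out of the same structure for free.
for name, importance in zip(feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```

Rule-based systems and architectures such as Kolmogorov-Arnold Networks follow the same principle: model parameters map onto human-readable structure, so understanding a prediction does not require approximating the model with a second, separate explainer.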
Papers
IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models
Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh
NxPlain: A Web-based Tool for Discovery of Latent Concepts
Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Tamim Jaban, Musab Husaini, Ummar Abbas
ToxVis: Enabling Interpretability of Implicit vs. Explicit Toxicity Detection Models with Interactive Visualization
Uma Gunturi, Xiaohan Ding, Eugenia H. Rho
Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals
Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner