Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models whose decision processes are transparent and understandable by construction, rather than treating trained systems as "black boxes" to be explained after the fact. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks and part-prototype networks), complemented by feature attribution and visualization methods that further expose model behavior. This pursuit is central to building trust in AI, particularly in high-stakes applications like healthcare and finance, where understanding model decisions is a prerequisite for responsible deployment and effective human-AI collaboration.
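As a minimal, generic illustration of the "intrinsically interpretable architecture" idea (not the method of any paper listed below), the sketch assumes scikit-learn and trains a shallow decision tree on a standard toy dataset: the fitted model can be printed in full as if/else rules, and global feature attributions fall out of the same structure.

    # Minimal sketch, assuming scikit-learn is available; dataset and
    # hyperparameters are illustrative choices, not from the papers below.
    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_breast_cancer()
    X, y = data.data, data.target

    # A shallow tree trades some accuracy for a decision process that is
    # small enough to read end to end.
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X, y)

    # The entire model is the explanation: print it as nested if/else rules.
    print(export_text(model, feature_names=list(data.feature_names)))

    # Global feature attribution comes for free from the same structure.
    top_features = sorted(
        zip(data.feature_names, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )[:5]
    for name, importance in top_features:
        print(f"{name}: {importance:.3f}")

The same contrast motivates the papers below: instead of attaching post-hoc explainers to an opaque model, the model class itself (trees, rules, prototypes, constrained or expressive architectures) is chosen so that its internal reasoning is inspectable.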
Papers
Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks
Qihan Huang, Mengqi Xue, Wenqi Huang, Haofei Zhang, Jie Song, Yongcheng Jing, Mingli Song
Utilizing Mutations to Evaluate Interpretability of Neural Networks on Genomic Data
Utku Ozbulak, Solha Kang, Jasper Zuallaert, Stephen Depuydt, Joris Vankerschaver
Expressive architectures enhance interpretability of dynamics-based neural population models
Andrew R. Sedler, Christopher Versteeg, Chethan Pandarinath
Truthful Meta-Explanations for Local Interpretability of Machine Learning Models
Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas
A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification
Alan Q. Wang, Mert R. Sabuncu
Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling
Yifei Zhang, Neng Gao, Cunqing Ma
Interpretability with full complexity by constraining feature information
Kieran A. Murphy, Dani S. Bassett
Interpretability and accessibility of machine learning in selected food processing, agriculture and health applications
N. Ranasinghe, A. Ramanan, S. Fernando, P. N. Hameed, D. Herath, T. Malepathirana, P. Suganthan, M. Niranjan, S. Halgamuge