Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" nature of many AI systems. Current research emphasizes developing intrinsically interpretable model architectures, such as those based on decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside techniques like feature attribution and visualization that clarify model behavior. This pursuit is crucial for building trust in AI, particularly in high-stakes applications like healthcare and finance, where understanding model decisions is paramount for responsible deployment and effective human-AI collaboration.
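As a concrete illustration of the intrinsically interpretable architectures and feature-attribution techniques mentioned above, the minimal sketch below (not drawn from any of the listed papers; dataset and hyperparameters are illustrative assumptions) fits a shallow decision tree with scikit-learn and reads its decision rules and impurity-based feature importances directly from the fitted model.

```python
# Minimal sketch: an intrinsically interpretable model whose explanation is the
# model itself. Dataset and hyperparameters are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Small tabular dataset standing in for a high-stakes domain such as healthcare.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree keeps the learned decision process human-readable.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

# Print the learned decision rules verbatim...
print(export_text(model, feature_names=list(X.columns)))

# ...and a simple built-in feature attribution (impurity-based importances).
for name, score in sorted(zip(X.columns, model.feature_importances_),
                          key=lambda t: t[1], reverse=True)[:5]:
    print(f"{name}: {score:.3f}")
```

The same pattern extends to rule-based systems or sparse linear models: the key design choice is that the explanation is read off the model's own structure rather than approximated post hoc.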
Papers
Kolmogorov-Arnold Networks for Time Series: Bridging Predictive Power and Interpretability
Kunpeng Xu, Lifei Chen, Shengrui Wang
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li
Representations as Language: An Information-Theoretic Framework for Interpretability
Henry Conklin, Kenny Smith
I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering
Valeriya Goloviznina, Evgeny Kotelnikov
Leveraging Knowledge Graphs for Interpretable Feature Generation
Mohamed Bouadi, Arta Alavi, Salima Benbernou, Mourad Ouziri
CONFINE: Conformal Prediction for Interpretable Neural Networks
Linhui Huang, Sayeri Lala, Niraj K. Jha
InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation
Jacob Si, Wendy Yusi Cheng, Michael Cooper, Rahul G. Krishnan
Exploring Commonalities in Explanation Frameworks: A Multi-Domain Survey Analysis
Eduard Barbu, Marharytha Domnich, Raul Vicente, Nikos Sakkas, André Morim
Interpretability of Statistical, Machine Learning, and Deep Learning Models for Landslide Susceptibility Mapping in Three Gorges Reservoir Area
Cheng Chen, Lei Fan