Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" nature of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as those based on decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that expose model behavior. This pursuit is crucial for building trust in AI, particularly in high-stakes applications such as healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
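As a concrete illustration of one intrinsically interpretable architecture mentioned above, the minimal sketch below fits a shallow decision tree and prints its learned rules as human-readable if/else conditions. It assumes scikit-learn is available; the dataset, depth limit, and variable names are illustrative only and are not drawn from any of the listed papers.

    # Sketch: a shallow decision tree whose learned rules can be read directly.
    # Assumes scikit-learn; dataset and hyperparameters are illustrative.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    feature_names = load_iris().feature_names

    # Restricting depth keeps the rule set small enough for a human to inspect.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X, y)

    # Print the learned decision rules as nested if/else conditions on features.
    print(export_text(tree, feature_names=feature_names))

Because the model itself is the explanation, no post-hoc attribution method is needed: the printed rule list is exactly what the classifier computes at prediction time.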
Papers
Analyzing a Caching Model
Leon Sixt, Evan Zheran Liu, Marie Pellat, James Wexler, Milad Hashemi, Been Kim, Martin Maas
Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomaly Detection
Nirmal Sobha Kartha, Clément Gautrais, Vincent Vercruyssen
On the Compression of Natural Language Models
Saeed Damadi
A Novel Tropical Geometry-based Interpretable Machine Learning Method: Application in Prognosis of Advanced Heart Failure
Heming Yao, Harm Derksen, Jessica R. Golbus, Justin Zhang, Keith D. Aaronson, Jonathan Gryak, Kayvan Najarian
Latent Space Explanation by Intervention
Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan