Model Interpretability
Model interpretability aims to make the decision-making processes of complex machine learning models transparent and understandable. Current research focuses on developing both inherently interpretable models, such as generalized additive models and rule-based systems, and post-hoc methods that explain the predictions of black-box models, often using techniques such as SHAP values, Grad-CAM, and attention-based analyses of architectures such as transformers and deep neural networks. This field is crucial for building trust in AI systems, particularly in high-stakes domains like healthcare and finance, and for facilitating the responsible development and deployment of machine learning technologies.
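As a minimal sketch of the post-hoc approach mentioned above, the snippet below computes SHAP values for a tree-ensemble model. It assumes the `shap` and `scikit-learn` packages are installed; the diabetes dataset and random-forest model are illustrative choices only and are not drawn from the papers listed below.

```python
# Illustrative post-hoc explanation with SHAP values (not tied to any paper below).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Fit a black-box model on a standard tabular regression task.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])  # per-feature attributions

# Attribution of the first prediction to each input feature.
print(dict(zip(X.columns, shap_values[0].round(3))))
```

Each SHAP value estimates how much a feature pushed an individual prediction away from the model's average output, which is what makes the method useful for explaining single decisions rather than only global behavior.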
Papers
Concept-Centric Transformers: Enhancing Model Interpretability through Object-Centric Concept Learning within a Shared Global Workspace
Jinyung Hong, Keun Hee Park, Theodore P. Pavlic
On the Impact of Knowledge Distillation for Model Interpretability
Hyeongrok Han, Siwon Kim, Hyun-Soo Choi, Sungroh Yoon
DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps
Angelos Chatzimparmpas, Rafael M. Martins, Alexandru C. Telea, Andreas Kerren
Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability
Mischa Dombrowski, Hadrien Reynaud, Johanna P. Müller, Matthew Baugh, Bernhard Kainz