Post Hoc Explainability

Post-hoc explainability aims to explain the decisions of already-trained "black box" machine learning models, particularly deep neural networks, without altering their structure or performance. Current research develops several families of methods: model-agnostic attribution based on Shapley values, gradient-based techniques (which require access to a differentiable model rather than being fully model-agnostic), and distillation into simpler surrogate models. These methods generate explanations across data modalities (images, audio, text), and much of the work targets the faithfulness and stability of those explanations. The field is crucial for building trust in AI systems used in high-stakes applications such as healthcare and finance, where understanding model decisions is needed for responsible deployment and effective auditing.
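To make the Shapley-value approach mentioned above concrete, here is a minimal sketch of exact Shapley attribution for a single input. It treats the model purely as a prediction function, which is what makes the method model-agnostic. The names `shapley_values` and `toy_model`, the all-zeros baseline, and the choice to "remove" a feature by substituting its baseline value are illustrative assumptions, not the method of any particular paper or library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one input by enumerating feature subsets.

    Features outside a coalition are replaced by their baseline value
    (an assumption; other removal schemes exist). Cost grows as 2^n,
    so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Build an input where only the coalition's features keep their real value.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical "black box": the explainer only ever calls it, never inspects it.
def toy_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model's output on `x` and on `baseline`, and the interaction term between features 0 and 2 is split equally between them. Practical tools approximate this sum by sampling, since exact enumeration is infeasible beyond a few features.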

Papers