Post Hoc Explainability
Post-hoc explainability aims to understand the decision-making processes of already-trained "black box" machine learning models, particularly deep neural networks, without altering their structure or performance. Current research focuses on developing model-agnostic methods, such as those based on Shapley values, gradients, and distillation, to generate explanations across data modalities (images, audio, text) and to improve the faithfulness and stability of those explanations. The field is crucial for building trust in AI systems deployed in high-stakes domains like healthcare and finance, where understanding model decisions is a prerequisite for responsible deployment and effective auditing.
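To make the gradient-based family concrete, below is a minimal sketch of vanilla gradient saliency in PyTorch. The small MLP and random input are hypothetical placeholders standing in for any trained, differentiable model; the idea is simply that the gradient of the predicted-class score with respect to the input gives a cheap, local per-feature attribution.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained "black box": a small MLP. Any model
# with differentiable outputs can be explained the same way.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

# A single input whose prediction we want to explain.
x = torch.randn(1, 10, requires_grad=True)

# Forward pass; take the logit of the predicted class.
logits = model(x)
pred = logits.argmax(dim=1).item()
score = logits[0, pred]

# Gradient of the class score w.r.t. the input features: the magnitude of
# each entry serves as a simple, local attribution for that feature.
score.backward()
saliency = x.grad.abs().squeeze(0)
print(saliency)  # larger values = features with more local influence
```

Gradient saliency requires white-box access to the model's internals; by contrast, Shapley-value estimators such as KernelSHAP are fully model-agnostic, needing only the ability to query the model on perturbed inputs.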