Attribution Map
Attribution maps are visual tools for explaining the predictions of machine learning models, typically by assigning an importance score to each input feature. Current research focuses on improving the accuracy and faithfulness of these maps through gradient-based approaches, perturbation techniques, and inherently interpretable model architectures such as ProtoPNet and Attri-Net, and on developing robust evaluation metrics that assess map quality across model types and datasets. Reliable attribution maps are crucial for building trust in AI systems, particularly in high-stakes applications such as medicine and autonomous systems, because they offer insight into a model's decision-making process.
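As a rough illustration of the two classic method families mentioned above, the sketch below implements vanilla gradient saliency and patch occlusion for an image classifier. This is a minimal PyTorch sketch, not taken from any of the listed papers; the function names and hyperparameters (patch size, stride, baseline value) are illustrative assumptions.

```python
import torch

def saliency_map(model, image, target_class):
    """Gradient-based attribution: the importance of each pixel is the
    magnitude of the gradient of the target-class score w.r.t. that pixel."""
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. input
    score = model(image)[0, target_class]        # scalar logit for the class
    score.backward()                             # populates image.grad
    # Collapse channels: maximum absolute gradient per spatial location.
    return image.grad.abs().max(dim=1).values.squeeze(0)

def occlusion_map(model, image, target_class, patch=16, stride=16, baseline=0.0):
    """Perturbation-based attribution: slide a baseline-valued patch over the
    image and record how much the target-class score drops at each position."""
    model.eval()
    _, _, h, w = image.shape
    heat = torch.zeros(h, w)
    with torch.no_grad():
        base_score = model(image)[0, target_class].item()
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                occluded = image.clone()
                occluded[..., y:y + patch, x:x + patch] = baseline
                drop = base_score - model(occluded)[0, target_class].item()
                heat[y:y + patch, x:x + patch] = drop
    return heat

# Usage (hypothetical model and preprocessed (1, 3, H, W) input tensor):
# heatmap = saliency_map(model, image, target_class=243)
```

The gradient map is cheap (one backward pass) but can be noisy, while the occlusion map directly measures the effect of removing input regions at the cost of one forward pass per patch position; much of the evaluation work in this area compares such trade-offs in faithfulness.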
Papers
From Attribution Maps to Human-Understandable Explanations through Concept Relevance Propagation
Reduan Achtibat, Maximilian Dreyer, Ilona Eisenbraun, Sebastian Bosse, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Fooling Explanations in Text Classifiers
Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard