Attribution Method
Attribution methods in explainable AI aim to explain how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse model architectures, including convolutional neural networks, transformers, and large language models. Common techniques include gradient-based attribution, perturbation tests, and counterfactual generation. This work matters for the trustworthiness and interpretability of complex models, particularly in high-stakes applications where understanding model decisions is essential, and for identifying and mitigating biases or vulnerabilities.
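To make the idea of per-feature importance scores concrete, here is a minimal sketch of two of the technique families named above: a gradient-based score (gradient times input) and a perturbation-based score (occlusion). The toy logistic model, its weights, the zero baseline, and all function names are assumptions chosen for illustration; they are not taken from any of the papers listed here.

```python
import numpy as np

def model(x, w, b):
    """Toy logistic model standing in for a trained network (hypothetical)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def gradient_x_input(x, w, b):
    """Gradient-based attribution: d(output)/d(x_i) multiplied by x_i."""
    p = model(x, w, b)
    grad = p * (1.0 - p) * w  # analytic gradient of the sigmoid output w.r.t. x
    return grad * x

def occlusion(x, w, b, baseline=0.0):
    """Perturbation-based attribution: change in output when feature i
    is replaced by a baseline value (assumed to be 0 here)."""
    full = model(x, w, b)
    scores = np.empty(x.size, dtype=float)
    for i in range(x.size):
        x_pert = x.copy()
        x_pert[i] = baseline
        scores[i] = full - model(x_pert, w, b)
    return scores

# Illustrative weights and input; feature 2 has zero weight, so the model ignores it.
w = np.array([2.0, -1.0, 0.0, 0.5])
b = 0.1
x = np.array([1.0, 1.0, 3.0, -2.0])

gxi = gradient_x_input(x, w, b)
occ = occlusion(x, w, b)

for i, (g, o) in enumerate(zip(gxi, occ)):
    print(f"feature {i}: gradient*input = {g:+.4f}, occlusion = {o:+.4f}")
```

In this toy setting both methods assign a score of zero to the ignored feature and agree on the sign of every other score, although the magnitudes differ because the model is nonlinear. On real networks the two families can disagree more substantially, which is one reason evaluating attribution methods is itself a research topic.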
Papers
A Consistent and Efficient Evaluation Strategy for Attribution Methods
Yao Rong, Tobias Leemann, Vadim Borisov, Gjergji Kasneci, Enkelejda Kasneci
Visualizing Automatic Speech Recognition -- Means for a Better Understanding?
Karla Markert, Romain Parracone, Mykhailo Kulakov, Philip Sperl, Ching-Yu Kao, Konstantin Böttinger