Attribution Methods
Attribution methods in explainable AI aim to explain how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse architectures, including convolutional neural networks, transformers, and large language models, often via gradient-based approaches, perturbation tests, and counterfactual generation. This work is crucial for making complex models more trustworthy and interpretable, particularly in high-stakes applications where understanding a model's decisions is paramount, and for identifying and mitigating biases or vulnerabilities.
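As a minimal sketch of one family mentioned above, perturbation-based attribution scores each feature by how much the model's output drops when that feature is replaced with a baseline value. The toy model `f` and the helper `occlusion_attribution` below are illustrative assumptions, not taken from any specific paper:

```python
# Minimal sketch of perturbation-based (occlusion) attribution.
# `f` is a hypothetical toy model: a weighted sum of three input
# features plus one interaction term.

def f(x):
    """Toy model: weighted sum with an interaction term."""
    w = [0.5, 2.0, 0.0]
    return sum(wi * xi for wi, xi in zip(w, x)) + x[0] * x[1]

def occlusion_attribution(model, x, baseline=0.0):
    """Importance of feature i = drop in the model's output when
    x[i] is replaced by the baseline value."""
    full = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # occlude one feature at a time
        scores.append(full - model(perturbed))
    return scores

scores = occlusion_attribution(f, [1.0, 1.0, 1.0])
# Feature 1 (weight 2.0 plus the interaction) scores highest;
# feature 2 (weight 0.0, no interaction) scores zero:
# scores == [1.5, 3.0, 0.0]
```

Gradient-based methods replace the explicit perturbation loop with a single backward pass, which is why they scale better to large models, at the cost of faithfulness questions that this research area studies.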