Attribution Methods
Attribution methods in explainable AI aim to decipher how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse model architectures, including convolutional neural networks, transformers, and large language models, often employing techniques like gradient-based approaches, perturbation tests, and counterfactual generation. This work is crucial for enhancing the trustworthiness and interpretability of complex models, particularly in high-stakes applications where understanding model decisions is paramount, and for identifying and mitigating biases or vulnerabilities.
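The gradient-based approaches mentioned above can be illustrated with a minimal sketch of integrated gradients on a toy differentiable model. The logistic model, its weights, the input, and the all-zeros baseline below are all made-up assumptions for illustration, not from any cited paper; the gradient is written analytically so the example needs only NumPy.

```python
import numpy as np

# Hypothetical logistic model for illustration; weights, bias, input,
# and baseline are made-up values.
w = np.array([0.8, -1.2, 0.3])
b = 0.1

def model(x):
    """Sigmoid of a linear score: f(x) = sigma(w.x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def integrated_gradients(x, baseline, steps=200):
    """Approximate IG_i = (x_i - x0_i) * integral over alpha in [0, 1]
    of df/dx_i at baseline + alpha * (x - baseline), via a midpoint
    Riemann sum along the straight-line path."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        p = model(baseline + a * (x - baseline))
        total += p * (1.0 - p) * w  # analytic gradient of sigma(w.x + b)
    return (x - baseline) * total / steps

x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros_like(x)
attr = integrated_gradients(x, baseline)
# Completeness: attributions sum to f(x) - f(baseline), a common
# faithfulness check for this family of methods.
assert abs(attr.sum() - (model(x) - model(baseline))) < 1e-3
```

The completeness property checked at the end is one concrete notion of faithfulness: the per-feature importance scores decompose the change in the model's output relative to the baseline.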