Attribution Method
Attribution methods in explainable AI aim to explain how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse model architectures, including convolutional neural networks, transformers, and large language models. Common techniques include gradient-based approaches, perturbation tests, and counterfactual generation. This work matters for the trustworthiness and interpretability of complex models, particularly in high-stakes applications where model decisions must be understood, and for identifying and mitigating biases or vulnerabilities.
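To make "assigning importance scores to input features" concrete, here is a minimal sketch of two of the technique families named above: a gradient-based score (gradient times input) and a perturbation score (occlusion). The toy logistic model, its weights, and the function names `gradient_x_input` and `occlusion` are illustrative assumptions, not taken from any particular paper or library.

```python
import numpy as np

def model(x, w, b):
    """Toy logistic model (an assumption for illustration): sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def gradient_x_input(x, w, b):
    """Gradient-based attribution: gradient of the output w.r.t. x, times x."""
    p = model(x, w, b)
    grad = p * (1.0 - p) * w  # analytic derivative of sigmoid(w . x + b) w.r.t. x
    return grad * x

def occlusion(x, w, b, baseline=0.0):
    """Perturbation attribution: output drop when one feature is set to a baseline."""
    full = model(x, w, b)
    scores = np.empty_like(x, dtype=float)
    for i in range(x.size):
        perturbed = x.copy()
        perturbed[i] = baseline  # "remove" feature i
        scores[i] = full - model(perturbed, w, b)
    return scores

# Hypothetical weights: feature 0 pushes the output up, feature 1 pushes it
# down, and feature 2 is ignored by the model.
w = np.array([2.0, -1.0, 0.0])
b = 0.1
x = np.array([1.0, 1.0, 1.0])

gi = gradient_x_input(x, w, b)
oc = occlusion(x, w, b)
print("gradient x input:", gi)
print("occlusion:       ", oc)
```

On this toy model both methods agree on the sign of each feature's contribution and give the ignored feature a score of zero. On deep nonlinear models the two families can disagree, which is one reason faithfulness is an active research question.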