Attribution Methods
Attribution methods in explainable AI aim to decipher how machine learning models arrive at their predictions by assigning importance scores to input features. Current research focuses on improving the faithfulness and efficiency of these methods across diverse model architectures, including convolutional neural networks, transformers, and large language models, often employing techniques like gradient-based approaches, perturbation tests, and counterfactual generation. This work is crucial for enhancing the trustworthiness and interpretability of complex models, particularly in high-stakes applications where understanding model decisions is paramount, and for identifying and mitigating biases or vulnerabilities.
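The perturbation tests mentioned above can be illustrated with a minimal occlusion-style sketch: each input feature is scored by how much the model's output changes when that feature is replaced by a baseline value. The linear `model` below is purely illustrative (not any specific method from the literature), chosen because its attributions are easy to verify by hand.

```python
import numpy as np

def occlusion_attribution(model, x, baseline=0.0):
    """Perturbation-based attribution: score each feature by the drop
    in model output when that feature is replaced by a baseline."""
    base_pred = model(x)
    scores = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = baseline          # occlude one feature at a time
        scores[i] = base_pred - model(x_pert)
    return scores

# Hypothetical linear "model" for illustration only
weights = np.array([2.0, -1.0, 0.5])
model = lambda x: float(weights @ x)

x = np.array([1.0, 3.0, 2.0])
scores = occlusion_attribution(model, x)
print(scores)  # for a linear model this recovers weights * x elementwise
```

For this linear model the scores equal `weights * x` elementwise, which is a useful sanity check; real attribution methods are evaluated on nonlinear networks, where gradient-based and perturbation-based scores can disagree.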