Feature Attribution
Feature attribution explains the predictions of complex machine learning models by identifying which input features most strongly influence the output. Current research focuses on developing and evaluating attribution methods, including gradient-based approaches like Integrated Gradients and game-theoretic methods like SHAP, applied to deep neural networks (including transformers) and other architectures such as Siamese encoders. These efforts address challenges including faithfulness (whether attributions accurately reflect the model's behavior), robustness (consistency of attributions under input perturbations), and computational efficiency, ultimately seeking to improve model transparency and trustworthiness in applications ranging from medical diagnosis to scientific discovery.
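To make the two method families concrete, here is a minimal, self-contained Python sketch that computes Integrated Gradients and exact Shapley value attributions for a toy logistic-regression model. The model, weights, inputs, and the baseline-replacement value function are illustrative assumptions, not code from any of the papers listed below; in particular, exact Shapley computation enumerates all 2^n feature coalitions, which is precisely the cost that practical estimation algorithms are designed to avoid.

```python
# Minimal sketch contrasting two attribution methods on a toy model.
# All names, weights, and inputs are illustrative assumptions.
import numpy as np
from itertools import combinations
from math import factorial

def model(x, w, b):
    """Toy differentiable model: sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient of the toy model w.r.t. the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=50):
    """Approximate Integrated Gradients: average the input gradient
    along the straight path from baseline to x, then scale the mean
    gradient elementwise by (x - baseline)."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.array([model_grad(baseline + a * (x - baseline), w, b)
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def shapley_values(x, baseline, f):
    """Exact Shapley values with a baseline-replacement value function:
    features outside the coalition are set to their baseline values.
    Enumerating all 2^n coalitions is exponential in n."""
    n = len(x)
    def value(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]
        return f(z)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

w, b = np.array([0.8, -1.2, 0.3]), 0.1
x, baseline = np.array([1.0, 2.0, -0.5]), np.zeros(3)

ig = integrated_gradients(x, baseline, w, b)
sv = shapley_values(x, baseline, lambda z: model(z, w, b))
print("Integrated Gradients:", ig)
print("Shapley values:      ", sv)
# Completeness/efficiency check: attributions should sum to
# f(x) - f(baseline).
print("f(x) - f(baseline):  ", model(x, w, b) - model(baseline, w, b))
```

Both methods share the completeness/efficiency property that attributions sum to f(x) - f(baseline), which the final print statements verify: exactly for Shapley values, and approximately for the discretized Integrated Gradients path integral.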
Papers
Algorithms to estimate Shapley value feature attributions
Hugh Chen, Ian C. Covert, Scott M. Lundberg, Su-In Lee
CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN
Matan Atad, Vitalii Dmytrenko, Yitong Li, Xinyue Zhang, Matthias Keicher, Jan Kirschke, Bene Wiestler, Ashkan Khakzar, Nassir Navab