Feature Attribution

Feature attribution aims to explain the predictions of complex machine learning models by identifying which input features most strongly influence the output. Current research focuses on developing and evaluating attribution methods, including gradient-based approaches such as Integrated Gradients and game-theoretic methods such as SHAP, often applied to deep neural networks (including transformers) and other architectures such as Siamese encoders. These efforts address challenges such as faithfulness (whether attributions reflect the model's actual decision process), robustness (stability of attributions under small input perturbations), and computational efficiency, ultimately seeking to improve model transparency and trustworthiness for applications ranging from medical diagnosis to scientific discovery.
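
To make the gradient-based family concrete: Integrated Gradients attributes a scalar prediction f(x) to each feature i by accumulating gradients along a straight-line path from a baseline x' to the input, IG_i(x) = (x_i - x'_i) * ∫₀¹ ∂f(x' + α(x - x')) / ∂x_i dα. Below is a minimal numpy sketch that approximates this integral with a Riemann sum; the names model_grad, baseline, and steps are illustrative choices for this sketch, not any particular library's API.

```python
import numpy as np

def integrated_gradients(model_grad, x, baseline=None, steps=50):
    """Approximate Integrated Gradients attributions for one input.

    model_grad(z) must return the gradient of the model's scalar output
    with respect to z (a numpy array of the same shape as x).
    """
    if baseline is None:
        baseline = np.zeros_like(x)  # a common default baseline
    # Riemann-sum approximation of the path integral from baseline to x.
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([model_grad(baseline + a * (x - baseline)) for a in alphas])
    avg_grad = grads.mean(axis=0)
    # Attribution: (x - baseline) scaled by the average path gradient.
    return (x - baseline) * avg_grad

# Toy example: a linear model f(z) = w . z, whose gradient is the constant w.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
attributions = integrated_gradients(lambda z: w, x)
```

For this linear model the attributions equal w * x exactly, and they sum to f(x) - f(baseline); this "completeness" property is one of the axioms that motivates Integrated Gradients over raw gradients.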

Papers