Attribution Score

Attribution scores quantify the influence of individual input features on a model's prediction, aiming to enhance model interpretability and trustworthiness. Current research focuses on improving the accuracy and efficiency of attribution methods across diverse model architectures, including deep neural networks, message-passing neural networks, and large language models, often employing techniques like Shapley values, integrated gradients, and influence functions. These advancements are crucial for building more reliable and understandable AI systems, with applications ranging from improving model fairness and debugging to facilitating better decision-making in various fields like finance and healthcare.

Papers