Model-Agnostic Feature Attribution

Model-agnostic feature attribution aims to identify which input features most influence a machine learning model's predictions, without relying on the model's internal workings — only its inputs and outputs. Current research focuses on extending these methods to diverse model architectures, including generative language models and ranking systems, often employing techniques such as SHAP values, LIME, and novel recursive approaches. This work is crucial for improving model transparency, trustworthiness, and debugging, ultimately leading to more reliable and explainable AI systems across a range of applications.
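To make the "inputs and outputs only" idea concrete, here is a minimal sketch of permutation importance — one of the simplest model-agnostic attribution methods (simpler than SHAP or LIME, but built on the same black-box premise). The `predict` callable, the feature matrix `X`, and the targets `y` are illustrative assumptions; the function shuffles one feature at a time and measures how much the model's error increases, treating the model purely as a black box.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Model-agnostic attribution: for each feature, shuffle its column
    and report the resulting increase in mean squared error.
    `predict` is any black-box function mapping X -> predictions."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # break the link between feature j and y
            errors.append(np.mean((predict(X_perm) - y) ** 2))
        # large error increase => the model relied heavily on feature j
        importances[j] = np.mean(errors) - base_error
    return importances

# Toy check: a model that uses only the first of two features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0]
imp = permutation_importance(lambda Z: 3 * Z[:, 0], X, y)
# imp[0] is large; imp[1] is ~0, since shuffling an unused feature
# leaves the predictions unchanged.
```

Because the method touches only `predict`, it applies unchanged to trees, neural networks, or ranking models — the property that defines the model-agnostic family surveyed above.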

Papers