Post Hoc Attribution

Post-hoc attribution methods aim to explain the decisions of complex "black box" machine learning models, particularly deep neural networks, by identifying the input features that most influenced a given prediction. Current research focuses on improving the accuracy and reliability of these methods, especially for long documents and for complex tasks such as question answering and image segmentation, often employing techniques like answer decomposition, surrogate modeling, and concept-based explanations. This work is crucial for building trustworthy AI systems, enhancing model interpretability, and identifying and mitigating biases within machine learning models.
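
To make the idea concrete, the sketch below shows one simple post-hoc attribution technique, gradient-times-input saliency, applied to a small PyTorch classifier. The model, input shapes, and data are hypothetical placeholders for illustration only; they are not drawn from any specific method or paper listed on this page.

```python
# Minimal sketch of post-hoc attribution via gradient x input saliency.
# The "black box" model here is a hypothetical toy classifier.
import torch
import torch.nn as nn

# Hypothetical pre-trained classifier (stand-in for a real black-box model).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
model.eval()

# A single input example; gradients are tracked w.r.t. the input features.
x = torch.randn(1, 10, requires_grad=True)

# Forward pass: take the score of the predicted class.
logits = model(x)
target = logits.argmax(dim=1).item()
score = logits[0, target]

# Backward pass: gradient of the class score with respect to the input.
score.backward()

# Gradient x input gives a per-feature attribution for this prediction.
attribution = (x.grad * x).detach().squeeze()
print(attribution)
```

Higher-magnitude entries in `attribution` mark features that contributed most to the predicted class score; more elaborate methods (e.g., surrogate models or concept-based explanations mentioned above) refine or replace this basic gradient signal.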

Papers