Interpretability Evaluation

Interpretability evaluation develops and applies methods for assessing how understandable and trustworthy machine learning models are, particularly deep neural networks. Current research emphasizes more robust and reliable metrics, including ones that account for model modularity and conceptual similarity, as well as techniques for reducing explanation noise in attribution methods such as Integrated Gradients. By providing rigorous ways to judge the quality and accuracy of the explanations these models generate, this work is crucial for building trust in AI systems and for ensuring responsible deployment across applications ranging from copyright infringement detection to active learning strategies.
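
To make the attribution setting concrete, the sketch below shows a minimal Riemann-sum approximation of Integrated Gradients in PyTorch. It is illustrative only: the function name, argument names, and the assumption that `model` returns one scalar per example are not taken from any particular paper discussed here, and papers on reducing explanation noise typically modify the baseline, path, or averaging scheme rather than use this plain form.

```python
import torch

def integrated_gradients(model, x, baseline, steps=50):
    """Approximate Integrated Gradients attributions with a Riemann sum.

    model    : callable mapping a batch of inputs to one scalar per example
    x        : input tensor, e.g. shape (features,) or (C, H, W)
    baseline : reference input with the same shape as x (often all zeros)
    steps    : number of interpolation steps along the straight-line path
    """
    # Interpolation coefficients alpha in (0, 1], broadcast over x's shape.
    alphas = torch.linspace(1.0 / steps, 1.0, steps).view(-1, *([1] * x.dim()))

    # Points on the straight-line path from the baseline to the input.
    interpolated = baseline + alphas * (x - baseline)
    interpolated.requires_grad_(True)

    # Gradients of the model output at each interpolated point.
    outputs = model(interpolated).sum()
    grads = torch.autograd.grad(outputs, interpolated)[0]

    # Average the path gradients and scale by the input-baseline difference.
    avg_grads = grads.mean(dim=0)
    return (x - baseline) * avg_grads
```

A common sanity check for such attributions is the completeness axiom: summed over all features, they should approximately equal `model(x) - model(baseline)`, with the gap shrinking as `steps` increases.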

Papers