Paper ID: 2310.16316

Sum-of-Parts: Faithful Attributions for Groups of Features

Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

Feature attributions explain machine learning predictions by assigning importance scores to input features. While faithful attributions accurately reflect feature contributions to the model's prediction, unfaithful ones can lead to misleading interpretations, making them unreliable in high-stake domains. The challenge of unfaithfulness of post-hoc attributions led to the development of self-explaining models. However, self-explaining models often trade-off performance for interpretability. In this work, we develop Sum-of-Parts (SOP), a new framework that transforms any differentiable model into a self-explaining model whose predictions can be attributed to groups of features. The SOP framework leverages pretrained deep learning models with custom attention modules to learn useful feature groups end-to-end without direct supervision. With these capabilities, SOP achieves highest performance while also scoring high with respect to faithfulness metrics on both ImageNet and CosmoGrid. We validate the usefulness of the groups learned by SOP through their high purity, strong human distinction ability, and practical utility in scientific discovery. In a case study, we show how SOP assists cosmologists in uncovering new insights about galaxy formation.

Submitted: Oct 25, 2023