Paper ID: 2407.08973

Integrating White and Black Box Techniques for Interpretable Machine Learning

Eric M. Vernon, Naoki Masuyama, Yusuke Nojima

In machine learning algorithm design, there exists a trade-off between the interpretability and performance of the algorithm. In general, algorithms which are simpler and easier for humans to comprehend tend to show worse performance than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design which classifies easier inputs using a highly-interpretable classifier (i.e., white box model), and more difficult inputs using a more powerful, but less interpretable classifier (i.e., black box model).

Submitted: Jul 12, 2024