Counterfactual Simulatability

Counterfactual simulatability measures whether a model's explanation enables an observer, human or automated, to predict the model's behavior on counterfactual (altered) inputs. Current research evaluates this capability in large language models and graph neural networks, using both human judgments and automated metrics to quantify the quality and usefulness of explanations. This research aims to improve the trustworthiness and understandability of complex AI systems by testing whether explanations faithfully reflect the model's decision-making process. Ultimately, advances in counterfactual simulatability should contribute to more reliable and explainable AI across applications.
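A minimal sketch of how such an evaluation might be scored, under the assumption that we have two callables: a `model` that maps an input to a label, and a `simulator` that predicts the model's label on a counterfactual input given only the explanation the model produced for the original input. Both names and the `simulation_precision` helper are hypothetical, introduced here only for illustration; in practice the simulator is a human annotator or a separate language model.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from any specific library):
#   Model:     input text -> predicted label
#   Simulator: (explanation, counterfactual input) -> predicted label,
#              reasoning only from the explanation, not the model itself.
Model = Callable[[str], str]
Simulator = Callable[[str, str], str]


def simulation_precision(model: Model,
                         simulator: Simulator,
                         explanation: str,
                         counterfactuals: List[str]) -> float:
    """Fraction of counterfactual inputs on which the simulator's
    explanation-based prediction matches the model's actual output.
    A higher score suggests the explanation better predicts the
    model's behavior under altered inputs."""
    if not counterfactuals:
        raise ValueError("need at least one counterfactual input")
    matches = sum(
        simulator(explanation, cf) == model(cf)
        for cf in counterfactuals
    )
    return matches / len(counterfactuals)


# Toy usage with stand-in functions (purely illustrative):
model = lambda x: "positive" if "good" in x else "negative"
simulator = lambda expl, cf: "positive" if "good" in cf else "negative"
print(simulation_precision(
    model, simulator,
    "I predict 'positive' whenever the text contains 'good'.",
    ["a good movie", "a bad movie"],
))  # -> 1.0
```

In this toy case the explanation fully determines the model's rule, so the simulator matches the model on every counterfactual; a misleading explanation would yield a lower score.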

Papers