Counterfactual Simulatability
Counterfactual simulatability measures whether a model's explanation enables an observer to predict the model's behavior on altered (counterfactual) inputs. Current research evaluates this capability in large language models and graph neural networks, using both human judges and automated metrics to quantify explanation quality and usefulness. This work aims to improve the trustworthiness and understandability of complex AI systems by testing whether explanations accurately reflect a model's decision-making process. Ultimately, advances in counterfactual simulatability should contribute to more reliable and explainable AI across applications.
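The core metric can be sketched as follows: given a model's explanation, a simulator (a human or an LLM standing in for one) predicts the model's output on counterfactual inputs, and simulatability is the fraction of predictions that match the model's actual behavior. This is a minimal toy sketch; all function names and the even/odd task are illustrative assumptions, not from any specific paper's codebase.

```python
# Toy sketch of a counterfactual-simulatability score.
# Assumed setup (illustrative, not from the literature):
#   - a toy model that labels integers 'even' or 'odd'
#   - an explanation the model gave for one input, e.g.
#     "I said 'even' because the number is divisible by 2."
#   - a rule-based simulator standing in for a human reader of that explanation

def model(x: int) -> str:
    """Toy model whose behavior we want to predict."""
    return "even" if x % 2 == 0 else "odd"

def simulate_from_explanation(x: int) -> str:
    """What a reader, armed only with the model's explanation,
    would predict the model outputs on a new (counterfactual) input."""
    return "even" if x % 2 == 0 else "odd"

def simulatability_score(counterfactuals: list[int]) -> float:
    """Fraction of altered inputs where the explanation-based prediction
    matches the model's actual output (higher = more simulatable)."""
    matches = sum(simulate_from_explanation(x) == model(x) for x in counterfactuals)
    return matches / len(counterfactuals)

print(simulatability_score([1, 2, 3, 10, 17]))  # 1.0 here, since the explanation fully predicts behavior
```

If the explanation were misleading (e.g., the model actually keyed on sign rather than parity), the simulator's predictions would diverge from the model's outputs on counterfactuals and the score would drop, which is exactly the failure mode this evaluation is designed to expose.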