Counterfactual Explanation
Counterfactual explanations (CFEs) aim to enhance the interpretability of machine learning models by identifying the smallest input changes that would alter a model's prediction. Current research focuses on developing robust and efficient CFE generation methods across various model types, including deep learning architectures like variational autoencoders and diffusion models, and for diverse data modalities such as images, time series, and text. This work is significant because CFEs improve model transparency and trustworthiness, fostering greater user understanding and facilitating the responsible deployment of AI in high-stakes applications like healthcare and finance.
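As a concrete illustration, the core idea can be sketched as an optimization that trades off flipping the prediction against staying close to the original input (in the style of gradient-based CFE methods). The model, weights, and feature values below are all hypothetical toy choices, not from any of the listed papers:

```python
# Minimal sketch of counterfactual search for a logistic-regression model:
# minimize (prediction - target)^2 + lam * ||x - x0||^2 by gradient descent.
# All weights and inputs are illustrative toy values.
import numpy as np

def predict_proba(x, w, b):
    """Probability of the positive class under a logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def find_counterfactual(x0, w, b, target=1.0, lam=0.1, lr=0.5, steps=500):
    """Search for a nearby input whose prediction approaches `target`."""
    x = x0.copy()
    for _ in range(steps):
        p = predict_proba(x, w, b)
        # Gradient of (p - target)^2 plus gradient of lam * ||x - x0||^2.
        grad = 2 * (p - target) * p * (1 - p) * w + 2 * lam * (x - x0)
        x -= lr * grad
    return x

# Toy two-feature model (e.g. a hypothetical loan-approval score).
w = np.array([1.5, -0.8])
b = -1.0
x0 = np.array([0.2, 1.0])           # original input, predicted negative
cf = find_counterfactual(x0, w, b)  # minimally changed input, predicted positive
```

The regularization weight `lam` controls the "minimality" of the change: larger values keep the counterfactual closer to the original input at the cost of a weaker predicted flip.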
Papers
Towards Conceptualization of "Fair Explanation": Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators
Tin Nguyen, Jiannan Xu, Aayushi Roy, Hal Daumé III, Marine Carpuat
Counterfactual Explanation Generation with s(CASP)
Sopam Dasgupta, Farhad Shakerin, Joaquín Arias, Elmer Salazar, Gopal Gupta