Explanation Faithfulness

Explanation faithfulness in machine learning concerns whether explanations of model predictions accurately reflect the model's actual reasoning process. Current research investigates methods to improve faithfulness across architectures including convolutional neural networks and transformer-based large language models, employing techniques such as hierarchical segmentation, counterfactual interventions, and logic rule integration. Faithful explanations are crucial for building trust in AI systems, particularly in high-stakes domains such as medicine and law, because they enable better understanding and debugging of model behavior and support more reliable decision-making. Improved faithfulness metrics, which quantify how well an explanation tracks the model's true decision process, are also a key area of ongoing development.
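
As a concrete illustration of what a faithfulness metric can look like, the sketch below implements a simple deletion-style ("comprehensiveness") check: erase the features an explanation ranks as most important and measure the drop in the model's prediction. A faithful explanation should cause a large drop; an unfaithful one, little change. The `predict_proba` model and the example importance scores are hypothetical placeholders for illustration, not any specific method from the literature.

```python
# Minimal sketch of a deletion-based faithfulness check ("comprehensiveness").
# Assumption: features can be meaningfully ablated by setting them to zero.

import numpy as np

def predict_proba(x: np.ndarray) -> float:
    # Hypothetical model: probability of the positive class for input x.
    # In practice, replace this with a real model's forward pass.
    weights = np.array([0.9, 0.05, 0.05, 0.0])
    return float(1.0 / (1.0 + np.exp(-weights @ x)))

def comprehensiveness(x: np.ndarray, importances: np.ndarray, k: int) -> float:
    """Drop in predicted probability after erasing the top-k features
    ranked by the explanation's importance scores (erasure = set to 0)."""
    top_k = np.argsort(importances)[::-1][:k]  # indices of most important features
    x_erased = x.copy()
    x_erased[top_k] = 0.0                      # ablate the "important" features
    return predict_proba(x) - predict_proba(x_erased)

x = np.array([1.0, 1.0, 1.0, 1.0])
faithful_expl = np.array([0.9, 0.05, 0.05, 0.0])    # matches the model's true weights
unfaithful_expl = np.array([0.0, 0.05, 0.05, 0.9])  # highlights an irrelevant feature

print(comprehensiveness(x, faithful_expl, k=1))     # large drop -> faithful
print(comprehensiveness(x, unfaithful_expl, k=1))   # near zero -> unfaithful
```

Counterfactual-intervention approaches generalize this idea: rather than zeroing features, they substitute plausible alternative values and check whether the explanation correctly anticipates the resulting change in model behavior.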

Papers