Self-Explanation

Self-explanation in artificial intelligence focuses on enabling AI models, particularly large language models (LLMs), to generate explanations of their own decisions and reasoning processes. Current research emphasizes evaluating the faithfulness and plausibility of these self-explanations, often by comparing them to human-written rationales or by using techniques such as perturbation analysis and counterfactual generation to test whether an explanation actually reflects the model's decision process. This work is crucial for building trust in AI systems, improving their transparency and accountability, and enabling their safe deployment in high-stakes applications such as healthcare and autonomous systems. Research also explores using self-explanations to improve model performance, for example through in-context learning and self-error detection.
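As a concrete illustration of the perturbation-analysis idea, the sketch below implements a simple comprehensiveness-style faithfulness check (in the spirit of the metric used by the ERASER benchmark): remove the input tokens an explanation cites as important and compare the resulting drop in prediction confidence against a random-removal baseline. This is a minimal sketch under stated assumptions, not any particular paper's method; `predict_proba` and `toy_model` are hypothetical placeholders for the model under evaluation.

```python
# Perturbation-based faithfulness check (sketch). Intuition: if a
# self-explanation truthfully identifies the words that drove a prediction,
# masking those words should change the model's output more than masking
# random words. `predict_proba` is a hypothetical stand-in for the model.

import random
from typing import Callable, Sequence

def comprehensiveness(
    predict_proba: Callable[[str], float],  # P(predicted label | text), assumed
    tokens: Sequence[str],                  # the tokenized input
    cited: set[int],                        # token indices the explanation cites
) -> float:
    """Drop the cited tokens and measure how far prediction confidence falls.

    A faithful explanation should yield a large drop (high comprehensiveness);
    a post-hoc rationalization that ignores the real evidence should not.
    """
    full = predict_proba(" ".join(tokens))
    reduced_text = " ".join(t for i, t in enumerate(tokens) if i not in cited)
    reduced = predict_proba(reduced_text)
    return full - reduced

def random_baseline(
    predict_proba: Callable[[str], float],
    tokens: Sequence[str],
    k: int,
    trials: int = 20,
) -> float:
    """Average confidence drop from removing k random tokens, for comparison."""
    drops = []
    for _ in range(trials):
        cited = set(random.sample(range(len(tokens)), k))
        drops.append(comprehensiveness(predict_proba, tokens, cited))
    return sum(drops) / trials

if __name__ == "__main__":
    # Toy sentiment model: confidence hinges on the word "excellent".
    def toy_model(text: str) -> float:
        return 0.9 if "excellent" in text else 0.3

    tokens = "the acting was excellent despite the thin plot".split()
    cited = {3}  # suppose the self-explanation cites "excellent"
    drop = comprehensiveness(toy_model, tokens, cited)
    base = random_baseline(toy_model, tokens, k=1)
    print(f"cited-token drop: {drop:.2f}  vs. random-token drop: {base:.2f}")
    # A cited-token drop well above the random baseline supports faithfulness.
```

In practice the same comparison is run over many examples, and counterfactual variants flip rather than delete the cited tokens; the toy model here simply makes the script self-contained.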

Papers