Convincing Explanation

Convincing explanation in artificial intelligence concerns generating human-understandable justifications for model outputs, with the goal of improving trust and transparency in AI systems. Current research centers on large language models (LLMs), using techniques such as prompting and self-refinement to make generated explanations more persuasive and more faithful, while also developing methods to detect and mitigate adversarially helpful explanations: justifications that sound convincing but are misleading. This work matters for building reliable, trustworthy AI across applications ranging from decision support to misinformation detection, because it addresses the need for AI reasoning that is both verifiable and understandable.
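
To make the prompting-plus-self-refinement idea concrete, the sketch below shows a generic generate-critique-revise loop for explanation generation. It is a minimal illustration, not a method from any specific paper: `call_llm` is a hypothetical placeholder for whatever chat-completion API is available, and the prompt wording is illustrative only.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion endpoint).
    Replace with your provider's client; this stub only marks where the
    model is invoked."""
    raise NotImplementedError("Wire this to your LLM provider.")


def explain_with_self_refinement(question: str, answer: str, rounds: int = 2) -> str:
    # 1. Draft an initial explanation for the model's answer.
    explanation = call_llm(
        f"Question: {question}\nAnswer: {answer}\n"
        "Explain, step by step, why this answer is correct."
    )
    for _ in range(rounds):
        # 2. Ask the model to critique its own draft for faithfulness
        #    (does it reflect the actual reasoning?) and clarity.
        critique = call_llm(
            f"Explanation:\n{explanation}\n\n"
            "List any unsupported claims, logical gaps, or unclear steps."
        )
        # 3. Revise the explanation using that critique.
        explanation = call_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            f"Draft explanation:\n{explanation}\n\nCritique:\n{critique}\n\n"
            "Rewrite the explanation, fixing the issues above while staying "
            "faithful to the answer."
        )
    return explanation
```

Each refinement round trades extra model calls for a chance to catch unsupported or misleading steps before the explanation is shown to a user; detection of adversarially helpful explanations is typically handled by a separate verifier rather than by this loop itself.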

Papers