Concept Explanation
Concept explanation research aims to make the predictions of complex machine learning models, particularly deep neural networks, interpretable to humans. Current efforts focus on model-agnostic methods that identify and use high-level concepts (e.g., color, texture, object presence) to explain model decisions, often employing techniques such as concept activation vectors and Bayesian estimation to improve robustness and accuracy. This work is crucial for building trust in AI systems, supporting model debugging and improvement, and enabling effective human-computer collaboration in diverse applications such as plant disease classification and text analysis.
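To make the idea of concept activation vectors concrete, the sketch below follows the general recipe popularized by TCAV (Kim et al., 2018): fit a linear classifier that separates hidden-layer activations of concept examples from random counterexamples, take the classifier's normal vector as the concept direction, and measure how sensitive a class score is to movement along that direction. This is a minimal illustration, not any specific paper's implementation; the activations and gradients are synthetic stand-ins, and in real use they would come from a chosen layer of the model being explained.

```python
# Minimal sketch of a concept activation vector (CAV), in the spirit of TCAV.
# Activations and gradients below are synthetic placeholders; with a real model
# they would be recorded at a chosen hidden layer and obtained by backpropagation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 64  # examples per set and activation dimensionality (hypothetical)

# Hidden-layer activations for inputs containing the concept (e.g. "striped")
# versus random counterexamples.
concept_acts = rng.normal(loc=0.5, scale=1.0, size=(n, d))
random_acts = rng.normal(loc=0.0, scale=1.0, size=(n, d))

# 1. Fit a linear classifier separating concept from random activations.
X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(n), np.zeros(n)])
clf = LogisticRegression(max_iter=1000).fit(X, y)

# 2. The CAV is the unit-normalized normal of the separating hyperplane.
cav = clf.coef_.ravel()
cav /= np.linalg.norm(cav)

# 3. Concept sensitivity: the directional derivative of a class logit along
#    the CAV. Here the gradient of the logit w.r.t. the activations is faked
#    with random vectors purely for illustration.
def concept_sensitivity(logit_grad_wrt_acts: np.ndarray, cav: np.ndarray) -> float:
    return float(np.dot(logit_grad_wrt_acts, cav))

grads = rng.normal(size=(50, d))  # gradients for 50 inputs of the target class
tcav_score = np.mean([concept_sensitivity(g, cav) > 0 for g in grads])
print(f"Fraction of inputs with positive concept sensitivity: {tcav_score:.2f}")
```

The fraction of inputs with positive sensitivity is the usual summary statistic: values well above 0.5 suggest the class prediction relies on the concept, which is the kind of high-level evidence these explanation methods aim to surface.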
Papers
Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scaling of Texts with Large Language Models
Patrick Y. Wu, Jonathan Nagler, Joshua A. Tucker, Solomon Messing
From Neural Activations to Concepts: A Survey on Explaining Concepts in Neural Networks
Jae Hee Lee, Sergio Lanza, Stefan Wermter