Interpretable Concept

Interpretable concept research aims to make the decision-making processes of complex machine learning models, particularly deep learning models, more transparent and understandable. Current efforts fall into two complementary directions: model architectures such as concept bottleneck models and self-explaining neural networks, which build human-interpretable concepts directly into their design, and post-hoc explanation methods, which leverage vision-language models or unsupervised concept discovery to extract concepts from already-trained models. This work is crucial for building trust in AI systems across applications ranging from medical diagnosis to autonomous driving, because it provides insight into model behavior and enables more reliable, verifiable predictions.
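
To make the concept-bottleneck idea concrete, the sketch below shows a minimal PyTorch implementation under assumed settings: the class name ConceptBottleneckModel, the layer sizes, and the dummy tensors are illustrative choices, not a specific published model. The key property is that the label head sees only the predicted concepts, so each prediction can be read off the concept activations.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Minimal concept bottleneck: input -> concept predictions -> label."""

    def __init__(self, input_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        # Backbone maps raw features to one logit per human-defined concept.
        self.concept_predictor = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_concepts),
        )
        # The label head sees *only* the concepts, so every prediction can be
        # traced back to interpretable concept activations.
        self.label_predictor = nn.Linear(num_concepts, num_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.concept_predictor(x)
        concepts = torch.sigmoid(concept_logits)        # concept probabilities
        label_logits = self.label_predictor(concepts)   # decision from concepts only
        return concept_logits, label_logits


# Joint training step (illustrative data): supervise both concepts and label.
model = ConceptBottleneckModel(input_dim=512, num_concepts=10, num_classes=2)
x = torch.randn(4, 512)                                 # dummy batch of features
concept_targets = torch.randint(0, 2, (4, 10)).float()  # binary concept annotations
labels = torch.randint(0, 2, (4,))

concept_logits, label_logits = model(x)
loss = (nn.functional.binary_cross_entropy_with_logits(concept_logits, concept_targets)
        + nn.functional.cross_entropy(label_logits, labels))
loss.backward()
```

Because the bottleneck exposes concept probabilities at inference time, a user can inspect or even manually correct individual concepts and observe how the final prediction changes.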

Papers