High Explainability
High explainability in artificial intelligence (AI) concerns making the decision-making processes of complex models, such as large language models and deep neural networks, transparent and understandable. Current research develops both intrinsic (built-in) and post-hoc (added after training) explainability methods, often employing techniques such as attention mechanisms, feature attribution, and counterfactual examples to interpret model outputs across modalities (text, images, audio). This pursuit is crucial for building trust in AI systems, particularly in high-stakes domains like medicine and finance, and for ensuring fairness, accountability, and responsible AI development.
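To make one of the techniques above concrete, the sketch below shows a simple post-hoc feature attribution (gradient times input saliency) applied to a toy PyTorch classifier. The model architecture, input dimensionality, and the choice of gradient-times-input are illustrative assumptions for this overview and are not drawn from any of the papers listed here.

import torch
import torch.nn as nn

# Toy classifier standing in for a more complex trained model.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 3),
)
model.eval()

# A single example with 8 input features; requires_grad lets us trace the
# model's prediction back to each feature.
x = torch.randn(1, 8, requires_grad=True)

# Forward pass, then select the predicted class score to explain.
logits = model(x)
target_class = logits.argmax(dim=1).item()
score = logits[0, target_class]

# Backward pass: gradient of the chosen class score w.r.t. each input feature.
score.backward()

# Gradient x input: a simple post-hoc attribution of the prediction to features.
attribution = (x.grad * x).detach().squeeze()
for i, value in enumerate(attribution.tolist()):
    print(f"feature {i}: attribution {value:+.4f}")

Positive attributions indicate features that pushed the model toward the predicted class, negative ones pushed against it; richer post-hoc methods (e.g., integrated gradients or counterfactual search) refine this basic idea.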
Papers
Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
Dong Shu, Haiyan Zhao, Jingyu Hu, Weiru Liu, Lu Cheng, Mengnan Du
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
Bohang Sun, Pietro Liò
Explainable Neural Networks with Guarantees: A Sparse Estimation Approach
Antoine Ledent, Peng Liu
A redescription mining framework for post-hoc explaining and relating deep learning models
Matej Mihelčić, Ivan Grubišić, Miha Keber
Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG
Hasan Md Tusfiqur Alam, Devansh Srivastav, Md Abdul Kadir, Daniel Sonntag
Post-hoc Interpretability Illumination for Scientific Interaction Discovery
Ling Zhang, Zhichao Hou, Tingxiang Ji, Yuanyuan Xu, Runze Li