Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" nature of many AI systems. Current research emphasizes intrinsically interpretable model architectures, such as those based on decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that make model behavior easier to inspect. This pursuit is crucial for building trust in AI, particularly in high-stakes domains like healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
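To make the contrast with post-hoc explanation concrete, the sketch below fits a shallow decision tree, one of the intrinsically interpretable model families mentioned above, and prints its learned rules directly; the fitted model itself is the explanation. The dataset, depth, and seed are illustrative assumptions chosen for brevity and are not taken from any of the papers listed below.

```python
# Minimal sketch of an intrinsically interpretable model: the decision
# logic can be read straight from the fitted estimator, with no separate
# explanation method required.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# A shallow tree keeps the rule set small enough to inspect by hand;
# max_depth=3 is an illustrative choice, not a recommendation.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Print the human-readable threshold rules the model actually uses.
print(export_text(clf, feature_names=list(data.feature_names)))
```

By comparison, explaining an opaque model of similar accuracy would require an additional attribution or visualization step on top of the trained network, which is the gap much of the work listed below aims to close.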
Papers
Self-Reinforcement Attention Mechanism For Tabular Learning
Kodjo Mawuena Amekoe, Mohamed Djallel Dilmi, Hanene Azzag, Mustapha Lebbah, Zaineb Chelly Dagdia, Gregoire Jaffre
Curve Your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
Julien Siems, Konstantin Ditschuneit, Winfried Ripken, Alma Lindborg, Maximilian Schambach, Johannes S. Otterbach, Martin Genzel
Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization
Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Gaël Richard, Florence d'Alché-Buc
COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks
Fanny Jourdan, Agustin Picard, Thomas Fel, Laurent Risser, Jean Michel Loubes, Nicholas Asher
SkelEx and BoundEx: Natural Visualization of ReLU Neural Networks
Pawel Pukowski, Haiping Lu
Towards the Characterization of Representations Learned via Capsule-based Network Architectures
Saja AL-Tawalbeh, José Oramas
When a CBR in Hand is Better than Twins in the Bush
Mobyen Uddin Ahmed, Shaibal Barua, Shahina Begum, Mir Riyanul Islam, Rosina O Weber