Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models that are transparent and understandable by construction, aiming to reduce the "black box" nature of many AI systems. Current research emphasizes intrinsically interpretable model architectures, such as decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside techniques like feature attribution and visualization that help explain model behavior. This work is crucial for building trust in AI, particularly in high-stakes applications like healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
Papers
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, et al.
SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning
Julien Ferry, Ulrich Aïvodji, Sébastien Gambs, Marie-José Huguet, Mohamed Siala
Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning
Xiaodan Zhang, Sandeep Vemulapalli, Nabasmita Talukdar, Sumyeong Ahn, Jiankun Wang, Han Meng, Sardar Mehtab Bin Murtaza, et al.
Probabilistic Prediction of Longitudinal Trajectory Considering Driving Heterogeneity with Interpretability
Shuli Wang, Kun Gao, Lanfang Zhang, Yang Liu, Lei Chen
Benchmarking and Enhancing Disentanglement in Concept-Residual Models
Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, Katia Sycara
CLIP-QDA: An Explainable Concept Bottleneck Model
Rémi Kazmierczak, Eloïse Berthier, Goran Frehse, Gianni Franchi
A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection
Uchechukwu F. Njoku, Alberto Abelló, Besim Bilalli, Gianluca Bontempi
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Aleksandar Makelov, Georg Lange, Neel Nanda
XAI for time-series classification leveraging image highlight methods
Georgios Makridis, Georgios Fatouros, Vasileios Koukos, Dimitrios Kotios, Dimosthenis Kyriazis, Ioannis Soldatos