Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" character of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specific neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that aid understanding of model behavior. This pursuit is crucial for building trust in AI, particularly in high-stakes domains like healthcare and finance, where understanding a model's decisions is a prerequisite for responsible deployment and effective human-AI collaboration.
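To make the contrast with post-hoc explanation concrete, here is a minimal sketch of an inherently interpretable model: a shallow decision tree whose complete decision logic can be printed as human-readable rules. It uses scikit-learn; the dataset, depth limit, and all other choices are illustrative assumptions, not drawn from any of the papers listed below.

```python
# A minimal sketch of inherent interpretability (assumptions noted above):
# a shallow decision tree whose full decision logic is readable as rules.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# A small depth cap keeps the model simple enough to read end to end,
# trading some accuracy for transparency.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Unlike post-hoc attribution applied to a black box, the model *is* its
# explanation: every prediction follows one of the printed root-to-leaf paths.
print(export_text(tree, feature_names=list(data.feature_names)))
```

Because each prediction corresponds to a single root-to-leaf rule path, a reviewer can audit the entire model directly, which is the property the intrinsically interpretable architectures above aim to preserve at larger scale.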
Papers
Semantic Prototypes: Enhancing Transparency Without Black Boxes
Orfeas Menis-Mastromichalakis, Giorgos Filandrianos, Jason Liartis, Edmund Dervakos, Giorgos Stamou
Why do you cite? An investigation on citation intents and decision-making classification processes
Lorenzo Paolini, Sahar Vahdati, Angelo Di Iorio, Robert Wardenga, Ivan Heibi, Silvio Peroni
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang
Are Linear Regression Models White Box and Interpretable?
Ahmed M Salih, Yuhe Wang
Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake Richards, Irina Rish, Özgür Şimşek
Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions
Harrie Oosterhuis, Lijun Lyu, Avishek Anand
Generally-Occurring Model Change for Robust Counterfactual Explanations
Ao Xu, Tieru Wu
Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations
David N. Palacio, Daniel Rodriguez-Cardenas, Alejandro Velasco, Dipin Khati, Kevin Moran, Denys Poshyvanyk
Integrating White and Black Box Techniques for Interpretable Machine Learning
Eric M. Vernon, Naoki Masuyama, Yusuke Nojima
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim, Ze Wang, Qiang Qiu