Inherent Interpretability
Inherent interpretability in machine learning focuses on designing models and methods that are transparent and understandable by construction, reducing the "black box" character of many AI systems. Current research emphasizes intrinsically interpretable architectures, such as decision trees, rule-based systems, and specialized neural network designs (e.g., Kolmogorov-Arnold Networks), alongside feature attribution and visualization techniques that clarify model behavior. This work is crucial for building trust in AI, particularly in high-stakes applications such as healthcare and finance, where understanding model decisions is essential for responsible deployment and effective human-AI collaboration.
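As a minimal sketch of the "interpretable by construction" idea described above, the example below fits a depth-limited decision tree with scikit-learn and prints its learned rules, so the model itself serves as the explanation. The dataset, tree depth, and use of scikit-learn are illustrative assumptions, not details drawn from any paper listed here.

```python
# Minimal sketch (assumption: scikit-learn is installed; dataset and depth are illustrative):
# a shallow decision tree as an example of an intrinsically interpretable model.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Small tabular dataset standing in for a high-stakes domain such as healthcare.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

# Restrict depth so the entire model can be read as a handful of if-then rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The printed rules *are* the model: no post-hoc explanation step is required.
print(export_text(tree, feature_names=list(data.feature_names)))
print("held-out accuracy:", tree.score(X_test, y_test))
```

The design trade-off this illustrates is that transparency is obtained by constraining model capacity (here, the tree depth); the papers below explore ways to retain such transparency in more expressive models.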
Papers
CogSteer: Cognition-Inspired Selective Layer Intervention for Efficient Semantic Steering in Large Language Models
Xintong Wang, Jingheng Pan, Longqin Jiang, Liang Ding, Xingshan Li, Chris Biemann
ProtoLens: Advancing Prototype Learning for Fine-Grained Interpretability in Text Classification
Bowen Wei, Ziwei Zhu
SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
Pedro Alejandro Dal Bianco, Oscar Agustín Stanchi, Facundo Manuel Quiroga, Franco Ronchetti, Enzo Ferrante
Interpreting Microbiome Relative Abundance Data Using Symbolic Regression
Swagatam Haldar, Christoph Stein-Thoeringer, Vadim Borisov
Reproducibility study of "LICO: Explainable Models with Language-Image Consistency"
Luan Fletcher, Robert van der Klis, Martin Sedláček, Stefan Vasilev, Christos Athanasiadis
Automatically Interpreting Millions of Features in Large Language Models
Gonçalo Paulo, Alex Mallen, Caden Juang, Nora Belrose
Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
Patrik Zavoral, Dušan Variš, Ondřej Bojar
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite
FragNet: A Graph Neural Network for Molecular Property Prediction with Four Layers of Interpretability
Gihan Panapitiya, Peiyuan Gao, C Mark Maupin, Emily G Saldanha
Can sparse autoencoders make sense of latent representations?
Viktoria Schuster
On Championing Foundation Models: From Explainability to Interpretability
Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs
Kola Ayonrinde, Michael T. Pearce, Lee Sharkey