Better Explainability

Research on the explainability of machine learning models aims to make their decision-making processes more transparent and understandable, fostering trust and enabling better model debugging and refinement. Current work focuses on developing novel explanation methods, including those based on feature attribution, counterfactual examples, and structured argumentation, often applied to deep neural networks, transformers, and reinforcement learning agents. These advancements are crucial for deploying AI systems responsibly in high-stakes domains like healthcare and autonomous systems, where understanding model behavior is paramount.
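As a concrete illustration of the feature-attribution family mentioned above, the sketch below computes a simple gradient saliency map: the magnitude of the gradient of the predicted class score with respect to each input feature indicates how strongly that feature influences the decision. The toy model, feature count, and input here are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

# Toy stand-in classifier; any differentiable model works the same way.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()

# One input with 4 features; track gradients w.r.t. the input itself.
x = torch.randn(1, 4, requires_grad=True)

# Forward pass, then backpropagate the predicted class's score.
scores = model(x)
top_class = scores.argmax(dim=1).item()
scores[0, top_class].backward()

# Per-feature gradient magnitude serves as the attribution.
saliency = x.grad.abs().squeeze(0)
print(saliency)
```

Counterfactual methods, by contrast, search for the smallest change to the input that flips the model's prediction, explaining a decision through a "what would have to differ" example rather than per-feature scores.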

Papers