Explainable Reinforcement Learning

Explainable Reinforcement Learning (XRL) aims to make the decision-making processes of reinforcement learning (RL) agents more transparent and understandable. Current research develops methods for both local explanations, which justify individual actions, and global explanations, which characterize an agent's overall behavior. These methods commonly use techniques such as reward decomposition, counterfactual analysis, and interpretable model architectures like decision trees. This work is crucial for building trust in RL systems, particularly in high-stakes applications such as healthcare and finance, and for supporting debugging and better human-agent collaboration.
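Reward decomposition, one of the techniques named above, can be illustrated with a minimal sketch. It assumes an agent whose Q-function is split into one value per reward component. The actions ("brake", "accelerate"), components ("safety", "speed", "comfort"), and all numbers are hypothetical, as are the helper names. The agent acts on the summed Q-value. A local explanation then compares the chosen action with an alternative component by component and keeps only the largest advantages needed to outweigh the disadvantages, in the spirit of a "minimal sufficient" reward-difference explanation.

```python
from typing import Dict, List

# Hypothetical decomposed Q-values for one state: Q[action][component].
# A real agent would produce these from per-component value estimates.
QValues = Dict[str, Dict[str, float]]

Q_EXAMPLE: QValues = {
    "brake": {"safety": 5.0, "speed": -2.0, "comfort": 1.0},
    "accelerate": {"safety": -3.0, "speed": 4.0, "comfort": 0.5},
}


def total_q(q: QValues, action: str) -> float:
    """Overall action value: the sum of the per-component Q-values."""
    return sum(q[action].values())


def greedy_action(q: QValues) -> str:
    """Pick the action with the highest summed Q-value."""
    return max(q, key=lambda a: total_q(q, a))


def reward_difference(q: QValues, chosen: str, alternative: str) -> Dict[str, float]:
    """Per-component advantage of `chosen` over `alternative` (positive favors `chosen`)."""
    return {c: q[chosen][c] - q[alternative][c] for c in q[chosen]}


def minimal_sufficient_explanation(delta: Dict[str, float]) -> List[str]:
    """Smallest set of positive components whose total exceeds all negative ones.

    Positive components are taken largest-first until their running sum
    outweighs the combined disadvantage. If `chosen` was not actually
    preferred, every positive component is returned.
    """
    disadvantage = -sum(v for v in delta.values() if v < 0)
    positives = sorted(
        ((c, v) for c, v in delta.items() if v > 0),
        key=lambda cv: cv[1],
        reverse=True,
    )
    explanation: List[str] = []
    running = 0.0
    for component, value in positives:
        explanation.append(component)
        running += value
        if running > disadvantage:
            break
    return explanation


if __name__ == "__main__":
    action = greedy_action(Q_EXAMPLE)
    delta = reward_difference(Q_EXAMPLE, action, "accelerate")
    print(action, delta, minimal_sufficient_explanation(delta))
```

Here the agent brakes (summed value 4.0 versus 1.5). The per-component difference shows that braking loses on speed but gains enough on safety alone to justify the choice, so the explanation reduces to the single component "safety".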

Papers