Explainable Reinforcement Learning
Explainable Reinforcement Learning (XRL) aims to make the decision-making processes of reinforcement learning (RL) agents more transparent and understandable. Current research focuses on methods that provide both local explanations (of individual actions) and global explanations (of an agent's overall behavior), often employing techniques such as reward decomposition, counterfactual analysis, and inherently interpretable architectures like decision trees. This work is crucial for building trust in RL systems, particularly in high-stakes applications like healthcare and finance, and for facilitating debugging and improved human-agent collaboration.
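To make one of the named techniques concrete, the sketch below illustrates reward decomposition in a tabular setting: the reward is split into named components, a separate Q-table is learned for each, and the per-component Q-values serve as a local explanation of why the greedy action is preferred. This is a minimal illustration under assumed toy dynamics (the "progress" and "safety" components and the environment are invented for the example), not the method of any paper listed here.

```python
# Minimal reward-decomposition sketch for local explanations (tabular Q-learning).
# The environment, component names, and constants are illustrative assumptions.
import numpy as np

N_STATES, N_ACTIONS = 5, 2
COMPONENTS = ["progress", "safety"]
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

# One Q-table per reward component; the agent acts greedily on their sum.
q = {c: np.zeros((N_STATES, N_ACTIONS)) for c in COMPONENTS}

def step(state, action):
    """Hypothetical dynamics: action 1 moves forward but risks a safety penalty."""
    next_state = min(state + action, N_STATES - 1)
    reward = {
        "progress": float(action),                                  # reward for moving forward
        "safety": -0.5 if (action == 1 and state == 3) else 0.0,    # penalty near a hazard
    }
    return next_state, reward

def total_q(state):
    # The behavior policy uses the sum of component Q-values.
    return sum(q[c][state] for c in COMPONENTS)

rng = np.random.default_rng(0)
for _ in range(2000):
    s = int(rng.integers(N_STATES))
    a = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(total_q(s)))
    s_next, r = step(s, a)
    a_star = int(np.argmax(total_q(s_next)))  # action maximizing the *total* Q at the next state
    for c in COMPONENTS:
        # Standard Q-learning update, applied separately to each reward component.
        target = r[c] + GAMMA * q[c][s_next, a_star]
        q[c][s, a] += ALPHA * (target - q[c][s, a])

# Local explanation: per-component Q-values show which objectives drive the greedy choice.
s = 3
a = int(np.argmax(total_q(s)))
print(f"state {s}: greedy action {a}")
for c in COMPONENTS:
    print(f"  {c:>8}: Q(s,0)={q[c][s, 0]: .2f}  Q(s,1)={q[c][s, 1]: .2f}")
```

The printed per-component values are the explanation: a user can see, for instance, that the agent forgoes progress in a particular state because the safety component penalizes moving forward there, rather than inspecting a single opaque scalar Q-value.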
Papers
Explainable Reinforcement Learning via Temporal Policy Decomposition
Franco Ruggeri, Alessio Russo, Rafia Inam, Karl Henrik Johansson
Explainable Reinforcement Learning for Formula One Race Strategy
Devin Thomas, Junqi Jiang, Avinash Kori, Aaron Russo, Steffen Winkler, Stuart Sale, Joseph McMillan, Francesco Belardinelli, Antonio Rago