Paper ID: 2407.06712 • Published Jul 9, 2024
MDP Geometry, Normalization and Reward Balancing Solvers
We present a new geometric interpretation of Markov Decision Processes (MDPs)
with a natural normalization procedure that allows us to adjust the value
function at each state without altering the advantage of any action with
respect to any policy. This advantage-preserving transformation of the MDP
motivates a class of algorithms, which we call Reward Balancing, that solve
MDPs by iterating these transformations until an approximately optimal
policy can be read off trivially. We provide a convergence analysis of several
algorithms in this class, in particular showing that for MDPs with unknown
transition probabilities we can improve upon state-of-the-art sample complexity
results.
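
To make the idea concrete, here is a minimal sketch (not the paper's exact algorithm) of a reward-balancing-style solver for a known tabular MDP. It relies on the standard fact that the per-state shift r'(s,a) = r(s,a) + γ·E[φ(s')] − φ(s) leaves every action's advantage unchanged for any choice of φ; choosing φ(s) = max_a r(s,a) at each round and iterating drives the balanced rewards toward the optimal advantages. The tensor shapes and helper names below are illustrative assumptions, not from the paper.

```python
import numpy as np

def reward_balancing(P, R, gamma, iters=1000, tol=1e-8):
    """Sketch of a reward-balancing-style solver for a known tabular MDP.

    P: transitions, shape (S, A, S); R: rewards, shape (S, A); gamma < 1.
    Each round shifts the rewards by a per-state offset phi, which leaves
    every action's advantage unchanged; we stop once the best balanced
    reward at every state is approximately zero.
    """
    R_bal = np.array(R, dtype=float)
    for _ in range(iters):
        phi = R_bal.max(axis=1)  # per-state normalizer: best current reward
        # Advantage-preserving shift: r'(s,a) = r(s,a) + gamma*E[phi(s')] - phi(s)
        R_bal = R_bal + gamma * (P @ phi) - phi[:, None]
        if np.abs(R_bal.max(axis=1)).max() < tol:
            break  # balanced: greedy actions are (near-)optimal
    policy = R_bal.argmax(axis=1)  # read the policy off the balanced rewards
    return policy, R_bal


# Hypothetical usage: a 2-state, 2-action MDP.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
policy, advantages = reward_balancing(P, R, gamma=0.9)
print(policy, advantages.max(axis=1))  # per-state max should be ~0
```

In this sketch the cumulative shift applied to the value of each state follows the Bellman optimality recursion, so the balanced rewards converge to the optimal advantages Q*(s,a) − V*(s): non-positive everywhere and zero exactly on optimal actions, which is why the final policy can be read off by a per-state argmax.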