Paper ID: 2206.03328
Concentration bounds for SSP Q-learning for average cost MDPs
Shaan Ul Haque, Vivek Borkar
We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
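The abstract does not state the update rules. The following is a minimal Python sketch that contrasts the two schemes. It assumes the commonly used form of SSP Q-learning: a distinguished state s0 is treated as the terminal state, and an average-cost estimate lam is updated on a slower timescale by min_v Q(s0, v). It also assumes an RVI Q-learning variant whose offset is f(Q) = Q(s0, u0). The function names, the step-size exponents, the uniform exploration policy, and the toy MDP are all hypothetical illustrations, not the paper's algorithm statement or experimental setup.

```python
import numpy as np


def ssp_q_step(Q, lam, i, u, cost, j, s0, a):
    """In-place SSP Q-learning update of Q[i, u] after observing (i, u, cost, j).

    The distinguished state s0 plays the role of the terminal state of the
    associated stochastic shortest path problem, so the continuation value
    is dropped when the next state j equals s0.
    """
    cont = 0.0 if j == s0 else Q[j].min()
    Q[i, u] += a * (cost - lam + cont - Q[i, u])


def ssp_lambda_step(Q, lam, s0, b):
    """Slower-timescale update of the average-cost estimate lam.

    At the fast-timescale fixed point, min_v Q[s0, v] is the optimal expected
    cost (with stage cost c - lam) of returning to s0; it is positive when
    lam underestimates the optimal average cost, so lam moves up.
    """
    return lam + b * Q[s0].min()


def rvi_q_step(Q, i, u, cost, j, s0, a):
    """In-place RVI Q-learning update, using f(Q) = Q[s0, 0] as the offset."""
    Q[i, u] += a * (cost + Q[j].min() - Q[s0, 0] - Q[i, u])


def run(P, c, s0=0, n_steps=50_000, seed=0):
    """Run both schemes on one simulated trajectory with uniform exploration.

    P[i, u, j] holds transition probabilities and c[i, u] stage costs.
    Step sizes a_n = (n+1)^-0.6 and b_n = 1/(n+1) are assumptions chosen so
    that b_n / a_n -> 0, i.e. lam moves on a slower timescale than Q.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = c.shape
    Q_ssp = np.zeros((n_states, n_actions))
    Q_rvi = np.zeros((n_states, n_actions))
    lam = 0.0
    i = s0
    for n in range(1, n_steps + 1):
        a, b = (n + 1) ** -0.6, 1.0 / (n + 1)
        u = int(rng.integers(n_actions))           # uniform exploration
        j = int(rng.choice(n_states, p=P[i, u]))   # sample next state
        ssp_q_step(Q_ssp, lam, i, u, c[i, u], j, s0, a)
        lam = ssp_lambda_step(Q_ssp, lam, s0, b)
        rvi_q_step(Q_rvi, i, u, c[i, u], j, s0, a)
        i = j
    # Average-cost estimates: lam for SSP, f(Q) = Q[s0, 0] for RVI.
    return lam, Q_rvi[s0, 0], Q_ssp, Q_rvi


# Toy check (hypothetical MDP): one action, deterministic cycle 0 -> 1 -> 0
# with stage costs 1 and 3, so the optimal average cost is (1 + 3) / 2 = 2.
P = np.zeros((2, 1, 2))
P[0, 0, 1] = 1.0
P[1, 0, 0] = 1.0
c = np.array([[1.0], [3.0]])
lam, rho_rvi, Q_ssp, Q_rvi = run(P, c, n_steps=20_000)
print(f"SSP estimate: {lam:.4f}, RVI estimate: {rho_rvi:.4f}")
```

Both schemes consume the same sampled trajectory, so any difference between their estimates comes from the update rules rather than from the samples.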
Submitted: Jun 7, 2022