Paper ID: 2304.11460

Reinforcement Learning with an Abrupt Model Change

Wuxia Chen, Taposh Banerjee, Jemin George, Carl Busart

Reinforcement learning is considered in a setting where the environment, or model, undergoes an abrupt change. An algorithm is proposed that an agent can apply in this setting to achieve the optimal long-term discounted reward. The algorithm is model-free and learns the optimal policy by interacting with the environment. It exploits a fundamental reward-detection trade-off present in these problems and uses a quickest change detection algorithm to detect the model change. The algorithm is shown to have strong optimality properties, and its effectiveness is demonstrated in simulations. Recommendations are provided for faster detection of model changes and for smart initialization strategies.
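
The abstract does not say which quickest change detection procedure is used. As an illustration only, the sketch below shows CUSUM, a standard procedure of that kind, applied to a stream of scalar observations. It assumes Gaussian observations with known pre-change and post-change means and a shared standard deviation. The function names, parameters, and the Gaussian model are hypothetical and are not taken from the paper.

```python
# Illustrative CUSUM sketch. NOT the paper's algorithm: the Gaussian
# observation model and all names below are assumptions for this example.

def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio log f1(x)/f0(x) for N(mu1, sigma^2) vs N(mu0, sigma^2)."""
    return ((mu1 - mu0) / sigma**2) * (x - (mu0 + mu1) / 2.0)


def cusum_detect(observations, mu0, mu1, sigma, threshold):
    """Return the 0-based index at which the CUSUM statistic first reaches
    `threshold`, or None if it never does."""
    w = 0.0  # CUSUM statistic, reset to zero whenever it would go negative
    for t, x in enumerate(observations):
        w = max(0.0, w + gaussian_llr(x, mu0, mu1, sigma))
        if w >= threshold:
            return t  # alarm: a change is declared at this step
    return None


# Noise-free toy stream: the mean shifts from 0 to 1 at index 5.
stream = [0.0] * 5 + [1.0] * 5
alarm = cusum_detect(stream, mu0=0.0, mu1=1.0, sigma=1.0, threshold=1.4)
```

In a scheme of this kind, an agent could keep following the policy suited to the pre-change model until the alarm fires and then switch. A higher threshold gives fewer false alarms at the cost of a longer detection delay.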

Submitted: Apr 22, 2023

Topics

Reinforcement Learning
Optimal Policy
Reward Function
Optimal Timing
Model Shift
Optimality Criterion
