Policy Switching

Policy switching in reinforcement learning (RL) studies how to balance exploiting the current policy against exploring potentially better alternatives when each transition between policies carries a cost. Current research develops efficient algorithms for managing these transitions, for example methods based on optimal transport or meta-learning. These methods are often formulated within frameworks such as distributionally robust Markov decision processes, while other work employs techniques such as cluster randomization to mitigate the effects of interference. The goal is to make RL agents more robust and efficient in dynamic environments, with applications ranging from safe robotics to experimentation on online platforms.
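The explore/exploit trade-off under switching costs can be illustrated with a toy multi-armed bandit in which each arm is a fixed candidate policy. The sketch below is not taken from any of the works summarized here. It assumes Bernoulli per-step returns, a UCB1-style optimism bonus, and a hypothetical rule that switches only when an alternative's optimistic estimate exceeds the current policy's by more than a fixed switching cost. The names `ucb_index`, `choose_policy`, and `simulate` are illustrative, not an existing API.

```python
import math
import random
from typing import List, Sequence, Tuple


def ucb_index(mean: float, count: int, t: int, c: float = 2.0) -> float:
    """Optimistic value estimate for a candidate policy (UCB1-style bonus)."""
    if count == 0:
        return math.inf  # untried policies are treated as maximally promising
    return mean + math.sqrt(c * math.log(t) / count)


def choose_policy(current: int, means: Sequence[float], counts: Sequence[int],
                  t: int, switch_cost: float) -> int:
    """Hypothetical rule: keep the current policy unless some alternative's
    optimistic value beats it by more than the switching cost."""
    indices = [ucb_index(m, n, t) for m, n in zip(means, counts)]
    best = max(range(len(indices)), key=indices.__getitem__)
    # If both indices are inf, the difference is nan and the comparison is False,
    # so the agent stays put.
    if best != current and indices[best] - indices[current] > switch_cost:
        return best
    return current


def simulate(true_means: List[float], horizon: int, switch_cost: float,
             seed: int = 0) -> Tuple[float, int]:
    """Run the switching-cost-aware selector on Bernoulli-reward policies.

    Returns (total reward minus switching costs paid, number of switches).
    """
    rng = random.Random(seed)
    k = len(true_means)
    means = [0.0] * k   # running mean reward per policy
    counts = [0] * k    # times each policy has been executed
    current, switches, total = 0, 0, 0.0
    for t in range(1, horizon + 1):
        nxt = choose_policy(current, means, counts, t, switch_cost)
        if nxt != current:
            switches += 1
            total -= switch_cost  # pay the assumed one-time transition cost
            current = nxt
        reward = 1.0 if rng.random() < true_means[current] else 0.0
        counts[current] += 1
        means[current] += (reward - means[current]) / counts[current]
        total += reward
    return total, switches


if __name__ == "__main__":
    for cost in (0.0, 0.1, 1.0):
        print(cost, simulate([0.2, 0.5, 0.8], horizon=2000, switch_cost=cost))
```

Raising `switch_cost` makes the agent "stickier": it tolerates a worse-looking current policy rather than pay for a transition. Because untried policies have an infinite index, every candidate is still tried once regardless of cost. Comparing a per-step advantage against a one-time cost is a deliberately crude heuristic; a more careful rule would weigh the expected gain over the remaining horizon.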

Papers