Offline Policy Learning

Offline policy learning trains reinforcement learning agents from pre-collected datasets, avoiding the risks and costs of online interaction. Current research addresses three main challenges: data scarcity, distributional shift between the policy that collected the data and the policy being learned, and suboptimal behavior in the training data. Techniques used to improve policy performance include importance weighting regularization, skill-based abstractions, deep generative models, and dataset clustering. These advances matter for applications where online learning is impractical or unsafe, such as healthcare, robotics, and other safety-critical systems, because they allow robust and effective policies to be developed from existing data.
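To make the importance-weighting idea concrete, here is a minimal sketch of a clipped importance-weighted estimate of a policy's value from logged data. It is an illustration only and is not drawn from any particular paper in this area. The record layout (context, action, reward, behavior probability), the function name `clipped_ips_value`, and the `clip` threshold are all assumptions made for this example. Clipping is used here as one simple, common way to limit the variance that large weights introduce.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical record layout for one logged interaction:
# (context, action, reward, probability the behavior policy gave that action).
LoggedStep = Tuple[Hashable, Hashable, float, float]


def clipped_ips_value(
    logged: Sequence[LoggedStep],
    target_prob: Callable[[Hashable, Hashable], float],
    clip: float = 10.0,
) -> float:
    """Estimate the average reward of a target policy from logged data.

    Each reward is reweighted by target_prob / behavior_prob, and the
    weight is capped at `clip` to keep rare actions from dominating.
    """
    if not logged:
        raise ValueError("logged data must not be empty")
    total = 0.0
    for context, action, reward, behavior_p in logged:
        if behavior_p <= 0.0:
            raise ValueError("behavior probabilities must be positive")
        weight = target_prob(context, action) / behavior_p
        total += min(weight, clip) * reward
    return total / len(logged)


# Toy data: the behavior policy picked "a" or "b" uniformly at random.
logged = [
    ("s0", "a", 1.0, 0.5),
    ("s0", "b", 0.0, 0.5),
    ("s0", "a", 1.0, 0.5),
    ("s0", "b", 0.0, 0.5),
]


def always_a(context: Hashable, action: Hashable) -> float:
    """Hypothetical target policy that always chooses action "a"."""
    return 1.0 if action == "a" else 0.0


unclipped = clipped_ips_value(logged, always_a)            # weights are 2 or 0
clipped = clipped_ips_value(logged, always_a, clip=1.5)    # weights capped at 1.5
```

With the toy data, the unclipped estimate recovers the reward the target policy would earn, while the tighter clip trades some bias for lower variance. That trade-off is one reason importance weights are often regularized in offline settings.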

Papers