Offline Deep Reinforcement Learning

Offline deep reinforcement learning (offline DRL) trains reinforcement learning agents entirely from pre-collected datasets, with no further environment interaction during training. This avoids the costly and potentially risky online data collection that standard DRL requires. Current research emphasizes improving the robustness and sample efficiency of algorithms such as Conservative Q-Learning (CQL) and Advantage Weighted Actor-Critic (AWAC), often through pre-training with synthetic data and through careful policy selection methods that mitigate overfitting and improve generalization. The field is significant for two reasons: it makes DRL applicable to safety-critical domains, and it supports more standardized and reproducible research through comprehensive benchmark datasets and open-source libraries.
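
To make the conservative idea behind CQL concrete, here is a minimal sketch rather than a reference implementation. It assumes discrete actions and the log-sum-exp form of the CQL regularizer, which pushes down Q-values across all actions while pushing up the Q-values of actions that actually appear in the dataset. The function names `cql_penalty` and `cql_loss`, the `alpha` weight, and the toy arrays are illustrative assumptions and do not come from any particular library.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_penalty(q_values, actions):
    """Conservative regularizer for discrete actions (assumed form).

    q_values: array of shape (batch, num_actions) with Q(s, a) for every action.
    actions:  array of shape (batch,) with the action indices stored in the dataset.
    Returns the batch mean of logsumexp_a Q(s, a) - Q(s, a_data), which is >= 0.
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    actions = np.asarray(actions, dtype=np.int64)
    q_data = q_values[np.arange(len(actions)), actions]
    return float(np.mean(logsumexp(q_values, axis=1) - q_data))


def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Squared TD error on dataset actions plus alpha times the penalty.

    td_targets are assumed to be precomputed (e.g. r + gamma * max_a' Q_target).
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    actions = np.asarray(actions, dtype=np.int64)
    td_targets = np.asarray(td_targets, dtype=np.float64)
    q_data = q_values[np.arange(len(actions)), actions]
    td_error = 0.5 * float(np.mean((q_data - td_targets) ** 2))
    return td_error + alpha * cql_penalty(q_values, actions)


if __name__ == "__main__":
    # Toy batch: 2 states, 3 actions (hypothetical numbers).
    q = np.array([[1.0, 2.0, 0.5], [0.0, 0.0, 3.0]])
    a = np.array([1, 2])          # actions observed in the dataset
    y = np.array([1.5, 2.5])      # precomputed TD targets
    print("penalty:", cql_penalty(q, a))
    print("loss:   ", cql_loss(q, a, y, alpha=1.0))
```

The penalty is small when the dataset action already has the highest Q-value in its state and grows when the network assigns high values to actions the dataset does not support, which is the overestimation that offline methods try to control.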

Papers