Baseline Policy

Baseline-policy research studies how to improve on an existing policy while guaranteeing that the learned policy performs at least as well as, and ideally better than, that pre-existing baseline. Current work emphasizes offline reinforcement learning, often combining imitation learning with model-based approaches or using relative pessimism to guarantee safe improvement and avoid performance degradation during training. These guarantees matter in real-world applications where maintaining a minimum performance level is paramount, such as robotics and compiler optimization, because they enable safer and more reliable deployment of learned policies.
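
The "improve only if provably no worse than the baseline" idea behind relative pessimism can be illustrated with a small sketch: estimate the candidate policy's return from data collected by the baseline, take a pessimistic lower confidence bound on that estimate, and accept the candidate only if the bound still exceeds the baseline's own return. The function names, trajectory format, and the simple normal-approximation bound below are illustrative assumptions, not the procedure of any particular paper.

```python
# Minimal sketch of a pessimistic "accept only if no worse than the baseline"
# check, using importance sampling on baseline-collected trajectories.
# All names and the confidence bound are illustrative, not from a specific method.
import numpy as np

def importance_weighted_returns(trajectories, candidate_prob, baseline_prob):
    """Per-trajectory importance-weighted returns of the candidate policy.

    trajectories: list of (states, actions, rewards) collected by the baseline.
    candidate_prob(s, a): probability the candidate policy takes action a in state s.
    baseline_prob(s, a):  probability the baseline policy took action a in state s.
    """
    weighted = []
    for states, actions, rewards in trajectories:
        ratio = 1.0
        for s, a in zip(states, actions):
            ratio *= candidate_prob(s, a) / baseline_prob(s, a)
        weighted.append(ratio * sum(rewards))
    return np.asarray(weighted)

def accept_candidate(trajectories, candidate_prob, baseline_prob):
    """Accept the candidate only if a pessimistic (lower-confidence) estimate
    of its return still beats the baseline's empirical return on the same data."""
    baseline_returns = np.asarray([sum(rewards) for _, _, rewards in trajectories])
    candidate_returns = importance_weighted_returns(
        trajectories, candidate_prob, baseline_prob)
    n = len(trajectories)
    # Normal-approximation lower bound (~97.5% one-sided); real methods use
    # tighter concentration bounds, but the comparison is the same in spirit.
    lower_bound = candidate_returns.mean() - 1.96 * candidate_returns.std(ddof=1) / np.sqrt(n)
    return lower_bound >= baseline_returns.mean()
```

In practice the plain importance-sampling estimate above would be replaced with lower-variance off-policy estimators or model-based value estimates, and the confidence bound with a tighter one, but the decision rule of comparing a pessimistic candidate estimate against the baseline's return is the common thread.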

Papers