Offline Preference-Based Reinforcement Learning

Offline preference-based reinforcement learning (PbRL) aims to train agents using only pre-collected data and human preferences over pairs of behavior segments, eliminating the need for manually designed reward functions and for online interaction with the environment. Current research focuses on improving reward estimation from these preferences, exploring methods that leverage richer preference information and incorporate hindsight about realized outcomes to better capture human intent. This direction is significant because it addresses the difficulty of reward specification in complex real-world settings, potentially enabling broader application of RL in domains where precise reward functions are hard or impossible to define.
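To make the reward-estimation step concrete, a common formulation in the PbRL literature fits a reward model by maximum likelihood under the Bradley-Terry preference model: the probability that one segment is preferred is a logistic function of the difference in predicted segment returns. The sketch below is illustrative, not any specific paper's method; it assumes a linear reward model and noiseless synthetic labels, and all names (`bt_loss_and_grad`, `w_true`) are hypothetical.

```python
import numpy as np

def bt_loss_and_grad(w, phi0, phi1, pref):
    """Bradley-Terry negative log-likelihood and gradient for a linear
    reward model, where a segment's return is w @ phi (summed features).

    pref: 1.0 if segment 1 is preferred, 0.0 if segment 0 is preferred.
    """
    d = w @ (phi1 - phi0)            # predicted return difference s1 - s0
    p1 = 1.0 / (1.0 + np.exp(-d))    # P(segment 1 preferred)
    loss = -(pref * np.log(p1) + (1 - pref) * np.log(1 - p1))
    grad = (p1 - pref) * (phi1 - phi0)
    return loss, grad

# Synthetic offline preference dataset labeled by a hidden "true" reward.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
pairs = []
for _ in range(200):
    phi0, phi1 = rng.normal(size=2), rng.normal(size=2)
    pref = float(w_true @ phi1 > w_true @ phi0)  # noiseless preference label
    pairs.append((phi0, phi1, pref))

# Gradient descent on the average preference loss; no environment interaction.
w = np.zeros(2)
for _ in range(200):
    g = sum(bt_loss_and_grad(w, p0, p1, pr)[1] for p0, p1, pr in pairs)
    w -= 0.3 * g / len(pairs)
```

In a full offline PbRL pipeline, the learned reward model would then label the pre-collected transitions so that a standard offline RL algorithm can be run on them.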

Papers