Offline Preference Based Reinforcement Learning
Offline preference-based reinforcement learning (PbRL) aims to train reinforcement learning agents using only pre-collected data and human preferences over pairs of behavior segments, eliminating the need for manually designed reward functions or online interaction. Current research focuses on improving reward estimation from these preferences, exploring methods that leverage higher-order preference information and incorporate hindsight or future outcomes to better capture human intent. This direction is significant because it addresses the challenge of reward specification in complex real-world settings, potentially enabling broader application of RL in domains where precise reward functions are difficult or impossible to define.
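A common baseline for estimating rewards from pairwise preferences is the Bradley-Terry model: a segment is preferred with probability given by a sigmoid of the difference in predicted segment returns, and the reward model is fit by maximizing the likelihood of the observed preference labels. The sketch below is a minimal, self-contained NumPy illustration on synthetic data, not any specific paper's method; the linear reward model, dimensions, and sampling scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each state is a 4-d feature vector, and an unknown
# "true" linear reward generates noisy pairwise preferences between
# fixed-length segments. The learner sees only the preferences.
dim, seg_len, n_pairs = 4, 5, 500
w_true = rng.normal(size=dim)

# segments[i, k] is the k-th segment (k in {0, 1}) of the i-th pair.
segments = rng.normal(size=(n_pairs, 2, seg_len, dim))

# True segment returns, used only to sample Bradley-Terry labels:
# P(segment 0 preferred) = sigmoid(return_0 - return_1).
true_ret = (segments @ w_true).sum(axis=2)
p_true = 1.0 / (1.0 + np.exp(-(true_ret[:, 0] - true_ret[:, 1])))
labels = (rng.random(n_pairs) < p_true).astype(float)

# Fit a linear reward model by gradient descent on the Bradley-Terry
# negative log-likelihood of the preference labels.
w = np.zeros(dim)
lr = 0.05
for _ in range(500):
    pred_ret = (segments @ w).sum(axis=2)                  # predicted returns
    p = 1.0 / (1.0 + np.exp(-(pred_ret[:, 0] - pred_ret[:, 1])))
    # d(NLL)/dw = (p - label) * (features_0 - features_1), averaged over pairs
    phi_diff = segments[:, 0].sum(axis=1) - segments[:, 1].sum(axis=1)
    grad = ((p - labels)[:, None] * phi_diff).mean(axis=0)
    w -= lr * grad

# Direction of the learned reward should align with the true reward.
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
```

Because preferences only constrain return *differences*, the recovered reward is identified up to scale (and, more generally, shaping), which is why the sketch checks cosine similarity rather than the raw weights.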