Trajectory Preference
Trajectory preference research focuses on aligning reinforcement learning (RL) agents' behavior with human preferences, expressed through comparisons of entire agent trajectories rather than instantaneous rewards. Current research emphasizes efficient exploration and learning from limited human feedback, employing methods like dynamic policy fusion, preference-guided policy optimization, and inverse reinforcement learning to infer underlying reward functions from trajectory preferences. This field is crucial for developing safe and user-friendly RL agents in complex applications, such as autonomous driving and robotics, where designing explicit reward functions is difficult or impossible.
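A common way to operationalize this idea across the preference-based RL literature is to fit a reward model with a Bradley-Terry likelihood over trajectory returns: the model assigns a scalar reward to each state-action pair, the summed rewards of two trajectories are compared, and a cross-entropy loss pushes the preferred trajectory toward the higher predicted return. The sketch below is illustrative only, assuming PyTorch; the network, dimensions, and simulated preference labels are hypothetical and not taken from any of the listed papers or the specific methods named above.

```python
# Minimal sketch of preference-based reward learning with a Bradley-Terry model
# over trajectory returns. All names, shapes, and data here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Predicts a per-step reward r(s, a) from a state-action pair."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model, traj_a, traj_b, pref):
    """Bradley-Terry loss: P(A preferred) = sigmoid(return(A) - return(B));
    `pref` is 1.0 if trajectory A was preferred by the human, else 0.0."""
    ret_a = model(*traj_a).sum(dim=-1)  # summed predicted reward over trajectory A
    ret_b = model(*traj_b).sum(dim=-1)  # summed predicted reward over trajectory B
    return F.binary_cross_entropy_with_logits(ret_a - ret_b, pref)

# Illustrative usage with random data: 16 preference pairs of 50-step trajectories.
obs_dim, act_dim, T, batch = 8, 2, 50, 16
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

traj_a = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
traj_b = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
pref = torch.randint(0, 2, (batch,)).float()  # simulated human preference labels

loss = preference_loss(model, traj_a, traj_b, pref)
opt.zero_grad()
loss.backward()
opt.step()
```

In practice the learned reward model is then handed to a standard RL algorithm, so a policy can be optimized without any hand-designed reward function.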