Preference-Based Reinforcement Learning
Preference-based reinforcement learning (PbRL) trains agents from human preferences over pairs of behaviors rather than from an explicitly engineered reward function. Current research focuses on improving the feedback efficiency and robustness of PbRL, exploring techniques such as multimodal transformers for richer preference modeling, the incorporation of equal preferences and skill-driven learning, and methods like dynamic sparsity and self-training for coping with noisy or limited feedback. Because it sidesteps reward design entirely, PbRL is especially promising for real-world domains such as robotics and human-computer interaction, where specifying an accurate reward function is difficult or impossible and preference feedback offers a more natural, intuitive way to train agents.
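To make the core mechanic concrete, below is a minimal sketch of the reward-learning step that underlies most PbRL methods: a reward model is fit to pairwise preferences under a Bradley-Terry (logistic) likelihood, and the learned reward is then used in place of a hand-designed one. The RewardModel architecture, the preference_loss helper, and all tensor shapes here are illustrative assumptions, not the implementation of any paper listed below.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical reward model r(s, a) -> scalar (an assumed MLP architecture)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim) -> (batch, T)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, prefs):
    """Bradley-Terry cross-entropy over pairs of trajectory segments.

    seg_a, seg_b: (obs, act) tensor pairs for each segment;
    prefs: (batch,) floats, 1.0 if segment A was preferred, 0.0 otherwise.
    """
    ret_a = reward_model(*seg_a).sum(dim=1)  # predicted return of segment A
    ret_b = reward_model(*seg_b).sum(dim=1)  # predicted return of segment B
    # P(A preferred over B) = sigmoid(ret_a - ret_b) under the Bradley-Terry model.
    return nn.functional.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)

# One gradient step on synthetic preference data (shapes are arbitrary).
obs_dim, act_dim, T, batch = 8, 2, 50, 16
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
seg_a = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
seg_b = (torch.randn(batch, T, obs_dim), torch.randn(batch, T, act_dim))
prefs = torch.randint(0, 2, (batch,)).float()
loss = preference_loss(model, seg_a, seg_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
```

In a full PbRL loop, a policy is then optimized against the learned reward with a standard RL algorithm, and new segment pairs are periodically queried for human labels; the papers below study how to make that loop work with less, noisier, or cheaper feedback.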
Papers
Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning
Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
A State Augmentation based approach to Reinforcement Learning from Human Preferences
Mudit Verma, Subbarao Kambhampati
Data Driven Reward Initialization for Preference based Reinforcement Learning
Mudit Verma, Subbarao Kambhampati