Preference-Based Reinforcement Learning

Preference-based reinforcement learning (PbRL) trains agents from human preferences over different behaviors, rather than from explicitly engineered reward functions. Current research focuses on making PbRL more efficient and more robust. Directions include multimodal transformers for richer preference modeling, support for equal preferences (cases where neither behavior is judged better), skill-driven learning, and methods such as dynamic sparsity and self-training for handling noisy or limited feedback. PbRL is most relevant to real-world applications where accurate reward functions are difficult or impossible to design, particularly robotics and human-computer interaction, because it lets people train agents by comparing behaviors instead of specifying rewards by hand.
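As a rough illustration of how a preference over two behaviors can become a training signal, the sketch below uses the Bradley-Terry model, a common choice in the PbRL literature that is assumed here and not taken from the summary above. The function names (`segment_return`, `preference_probability`, `preference_loss`) and the convention of labeling an equal preference as 0.5 are hypothetical choices made for this example only.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one behavior segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    # Numerically stable logistic sigmoid of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label, eps=1e-12):
    """Cross-entropy between a human label and the modeled preference.

    label: 1.0 if A was preferred, 0.0 if B was preferred,
           0.5 for an equal preference (assumed convention).
    """
    p = preference_probability(rewards_a, rewards_b)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    # Hypothetical per-step reward predictions for two short segments.
    seg_a = [0.5, 1.0, 0.5]
    seg_b = [0.0, 0.5, 0.5]
    print(preference_probability(seg_a, seg_b))
    print(preference_loss(seg_a, seg_b, label=1.0))
```

In a full system the per-step rewards would come from a learned reward model, and this loss would be minimized over many labeled pairs before or while the policy is trained on the learned reward. When two segments have equal predicted return, the modeled preference probability is exactly 0.5, which is why a 0.5 label is a natural target for equal preferences under this assumption.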

Papers