Reward Function
Reward functions, which guide reinforcement learning agents toward desired behaviors, are the focus of intense research. Current efforts center on automatically learning reward functions from diverse sources such as human preferences, demonstrations (including imperfect ones), and natural language descriptions, often using inverse reinforcement learning, large language models, and Bayesian optimization within architectures such as transformers and generative models. This research is vital for improving the efficiency and robustness of reinforcement learning, enabling its application to complex real-world problems where manually designing reward functions is impractical or impossible. The ultimate goal is more adaptable and human-aligned AI systems.
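To make the preference-learning theme concrete, the sketch below shows one common way a reward function can be learned from pairwise human preferences: a small network scores trajectory segments, and a Bradley-Terry loss pushes it to assign higher return to the preferred segment. This is an illustrative assumption, not the method of any paper listed here; names such as RewardNet and the data shapes are hypothetical, and the training data is random stand-in for human labels.

```python
# Minimal sketch (illustrative, not from the listed papers): learning a
# reward function from pairwise preferences via the Bradley-Terry model.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a state(-action) feature vector to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # per-step predicted reward

def preference_loss(reward_net, seg_a, seg_b, prefs):
    """Bradley-Terry loss: P(a preferred over b) = sigmoid(R(a) - R(b)).

    seg_a, seg_b: (batch, horizon, obs_dim) trajectory segments
    prefs:        (batch,) 1.0 if segment a was preferred, else 0.0
    """
    # Segment return = sum of per-step predicted rewards.
    ret_a = reward_net(seg_a).sum(dim=-1)
    ret_b = reward_net(seg_b).sum(dim=-1)
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

# Toy usage: random tensors stand in for human-labelled preference data.
obs_dim, horizon, batch = 8, 20, 32
net = RewardNet(obs_dim)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
seg_a = torch.randn(batch, horizon, obs_dim)
seg_b = torch.randn(batch, horizon, obs_dim)
prefs = torch.randint(0, 2, (batch,)).float()
for _ in range(100):
    opt.zero_grad()
    loss = preference_loss(net, seg_a, seg_b, prefs)
    loss.backward()
    opt.step()
```

The learned network can then serve as the reward signal for a standard RL algorithm in place of a hand-designed reward; variants of this idea replace the human labeller with an LLM or learn from demonstrations instead of comparisons.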
Papers
Fantastic LLMs for Preference Data Annotation and How to (not) Find Them
Guangxuan Xu, Kai Xu, Shivchander Sudalairaj, Hao Wang, Akash Srivastava
Show, Don't Tell: Learning Reward Machines from Demonstrations for Reinforcement Learning-Based Cardiac Pacemaker Synthesis
John Komp, Dananjay Srinivas, Maria Pacheco, Ashutosh Trivedi
Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Simon Stepputtis, Joseph Campbell, Katia Sycara
Few-shot In-Context Preference Learning Using Large Language Models
Chao Yu, Hong Lu, Jiaxuan Gao, Qixin Tan, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky
Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards
Alexander G. Padula, Dennis J.N.J. Soemers
Sample-Efficient Curriculum Reinforcement Learning for Complex Reward Functions
Kilian Freitag, Kristian Ceder, Rita Laezza, Knut Åkesson, Morteza Haghir Chehreghani