Human Feedback
Human feedback is crucial for aligning artificial intelligence models, particularly large language models, with human preferences and values. Current research focuses on making the collection and use of human feedback in reinforcement learning more efficient and reliable, exploring techniques such as macro actions, active learning, and reward model optimization to address the cost and subjectivity of human judgments. This work matters because it directly affects the safety, trustworthiness, and effectiveness of AI systems in applications ranging from autonomous driving to educational assessment, and more robust, efficient methods for integrating human feedback remain an active area of investigation.
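Much of this line of work centers on reward models trained from pairwise human preferences. As a rough illustration of what "reward model optimization" involves, the sketch below fits a toy reward model with a standard Bradley-Terry pairwise loss; the model, data, and names (ToyRewardModel, preference_loss) are illustrative placeholders rather than code from any of the papers listed here.

```python
# Minimal sketch of pairwise reward-model training (Bradley-Terry style),
# using fixed-size synthetic embeddings in place of a language model encoder.
import torch
import torch.nn as nn

class ToyRewardModel(nn.Module):
    """Maps a response embedding to a scalar reward."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry negative log-likelihood: the human-preferred ("chosen")
    # response should receive a higher reward than the rejected one.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

model = ToyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Synthetic preference pairs: embeddings of chosen vs. rejected responses.
# The chosen embeddings are shifted so there is a signal to learn.
chosen = torch.randn(64, 16) + 0.5
rejected = torch.randn(64, 16)

for step in range(200):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final preference loss: {loss.item():.3f}")
```

In practice the encoder would be a pretrained language model and the pairs would come from annotators, but the loss and training loop have the same shape.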
Papers
Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
Jasmina Gajcin, James McCarthy, Rahul Nair, Radu Marinescu, Elizabeth Daly, Ivana Dusparic
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal, John Dang, Aditya Grover
Shepherd: A Critic for Language Model Generation
Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O'Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
Yannick Metz, David Lindner, Raphaël Baur, Daniel Keim, Mennatallah El-Assady
FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines
Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine M. Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, Umang Bhatt
Teach Me How to Improve My Argumentation Skills: A Survey on Feedback in Argumentation
Camélia Guerraoui, Paul Reisert, Naoya Inoue, Farjana Sultana Mim, Shoichi Naito, Jungmin Choi, Irfan Robbani, Wenzhi Wang, Kentaro Inui