Human Feedback
Human feedback is crucial for aligning artificial intelligence models, particularly large language models, with human preferences and values. Current research focuses on making the incorporation of human feedback into reinforcement learning frameworks more efficient and reliable, exploring techniques such as macro actions, active learning, and reward model optimization to address challenges like the cost and subjectivity of human judgments. This work matters because it directly affects the safety, trustworthiness, and effectiveness of AI systems across diverse applications, from autonomous driving to educational assessment. Developing more robust and efficient methods for integrating human feedback remains a key area of ongoing investigation.
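To make the "reward model optimization" step mentioned above concrete, the sketch below shows one common way pairwise human preference data is used in RLHF-style pipelines: a reward model is trained with a Bradley-Terry style objective so that the human-preferred response scores higher than the rejected one. This is a minimal, illustrative example, not the method of any paper listed here; the class and variable names (`RewardModel`, `chosen_ids`, `rejected_ids`) and the tiny GRU backbone are hypothetical stand-ins for a real LLM-based scorer.

```python
# Minimal sketch (assumed setup, not from the listed papers): training a reward
# model from pairwise human preferences with a Bradley-Terry style loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Tiny stand-in for an LLM backbone with a scalar reward head."""

    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)
        _, h = self.encoder(x)                # final hidden state summarizes the sequence
        return self.head(h[-1]).squeeze(-1)   # one scalar reward per sequence


def preference_loss(model: RewardModel,
                    chosen_ids: torch.Tensor,
                    rejected_ids: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: the human-preferred response should score higher."""
    r_chosen = model(chosen_ids)
    r_rejected = model(rejected_ids)
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Fake batch of tokenized (chosen, rejected) response pairs.
    chosen = torch.randint(0, 1000, (8, 16))
    rejected = torch.randint(0, 1000, (8, 16))
    loss = preference_loss(model, chosen, rejected)
    loss.backward()
    optimizer.step()
    print(f"preference loss: {loss.item():.4f}")
```

The trained reward model is then typically used as the scoring function for a downstream policy-optimization step (e.g., PPO-style fine-tuning), which is where the efficiency and reliability concerns discussed above arise.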
Papers
Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving
Zilin Huang, Zihao Sheng, Lei Shi, Sikai Chen
The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan
Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques
Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du
CoGen: Learning from Feedback with Coupled Comprehension and Generation
Mustafa Omer Gul, Yoav Artzi
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
Taiwei Shi, Zhuoer Wang, Longqi Yang, Ying-Chun Lin, Zexue He, Mengting Wan, Pei Zhou, Sujay Jauhar, Xiaofeng Xu, Xia Song, Jennifer Neville