Human Feedback
Human feedback is crucial for aligning artificial intelligence models, particularly large language models, with human preferences and values. Current research focuses on making the incorporation of human feedback into reinforcement learning frameworks more efficient and reliable, exploring techniques such as macro actions, active learning, and reward model optimization to address the cost and subjectivity of human judgments. This work is significant because it directly affects the safety, trustworthiness, and overall effectiveness of AI systems in applications ranging from autonomous driving to educational assessment. Developing more robust and efficient methods for integrating human feedback remains a key area of ongoing investigation.
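As background for the "reward model optimization" mentioned above, the sketch below illustrates the pairwise Bradley-Terry objective commonly used in RLHF pipelines to fit a reward model to human preference comparisons: the model is trained so that the human-preferred response scores higher than the rejected one. This is a minimal, self-contained illustration with hypothetical names (RewardModel, preference_loss) and toy random embeddings standing in for encoded responses; it is not an implementation from any of the papers listed below.

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a fixed-size response embedding to a scalar score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(embedding).squeeze(-1)  # one scalar reward per example

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry pairwise loss: maximize log sigmoid(r_chosen - r_rejected),
    # i.e. push the preferred response's reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training step on random "embeddings" standing in for encoded responses.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen = torch.randn(32, 128)    # embeddings of human-preferred responses
rejected = torch.randn(32, 128)  # embeddings of rejected responses

loss = preference_loss(model(chosen), model(rejected))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"pairwise preference loss: {loss.item():.4f}")

The pairwise formulation is what makes subjective human judgments usable as a training signal: annotators only need to say which of two responses is better, rather than assign an absolute score, which is one reason comparison-based feedback is the standard interface in this line of work.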
Papers
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Di Jin, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sungjin Lee, Yang Liu, Mahdi Namazifar
Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing
Feiyang Han, Yimin Wei, Zhaofeng Liu, Yanxing Qi
When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour
Leonardo Ranaldi, Giulia Pucci
Neural machine translation for automated feedback on children's early-stage writing
Jonas Vestergaard Jensen, Mikkel Jordahn, Michael Riis Andersen
Aligning Neural Machine Translation Models: Human Feedback in Training and Inference
Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins
SuperHF: Supervised Iterative Learning from Human Feedback
Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush Bhatia, Silas Alberti
BabyStories: Can Reinforcement Learning Teach Baby Language Models to Write Better Stories?
Xingmeng Zhao, Tongnian Wang, Sheri Osborn, Anthony Rios
AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5?
Imen Azaiz, Oliver Deckarm, Sven Strickroth
ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles
Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry