Response Pair

A response pair consists of two AI-generated answers to the same prompt, typically one preferred and one dispreferred; such pairs are central to aligning large language models (LLMs) with human preferences. Current research focuses on efficiently selecting and using these pairs for training, employing techniques such as contrastive learning and reinforcement learning from human feedback (RLHF), and often introduces novel loss functions that exploit preference-strength information. This work aims to improve LLM performance and safety by optimizing the training process, reducing annotation costs, and mitigating issues such as hallucination and bias, ultimately yielding more helpful and reliable AI systems.
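
As a concrete illustration of how a response pair can enter a training objective, the sketch below shows a DPO-style contrastive loss over a chosen/rejected pair, with a hypothetical `preference_margin` term standing in for annotated preference strength. The function name, argument names, and the margin formulation are illustrative assumptions rather than the method of any particular paper listed here.

```python
import torch
import torch.nn.functional as F

def preference_pair_loss(
    policy_chosen_logps: torch.Tensor,    # log p_policy(chosen | prompt), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_policy(rejected | prompt), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # same quantities under a frozen reference model
    ref_rejected_logps: torch.Tensor,
    preference_margin: torch.Tensor,      # hypothetical per-pair preference-strength score, shape (batch,)
    beta: float = 0.1,
) -> torch.Tensor:
    """Pairwise contrastive (DPO-style) loss over response pairs, with an
    optional margin that weights pairs by how strongly one response is preferred."""
    # Implicit reward: log-ratio of policy to reference for each response in the pair.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry style objective: push the chosen response's implicit reward
    # above the rejected one's by at least the preference margin.
    logits = chosen_rewards - rejected_rewards - preference_margin
    return -F.logsigmoid(logits).mean()
```

In this sketch, stronger annotated preferences produce a larger margin and therefore a larger gradient on pairs the model has not yet separated, which is one simple way preference-strength information can shape training.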

Papers