Preference Fine-Tuning

Preference fine-tuning aims to align large language models (LLMs) and other deep generative models with human preferences, improving their safety, utility, and performance across tasks. Current research focuses on developing and comparing alignment algorithms along two main lines: reinforcement learning from human feedback (RLHF), typically driven by policy-gradient methods such as PPO, and direct preference optimization methods such as DPO, which fit a contrastive-style classification loss on preference pairs without training a separate reward model. The feedback signal may come from human annotators or from AI judges (RLAIF). This work is crucial for mitigating biases, enhancing model safety, and improving the user experience in applications ranging from chatbots to personalized recommendation, driving significant advances in trustworthy AI.
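
As a concrete illustration of the second family, below is a minimal sketch of the DPO objective in PyTorch. It follows the loss introduced by Rafailov et al. (2023): the policy is pushed to widen its log-likelihood margin between the preferred and rejected response, measured relative to a frozen reference model. The function name `dpo_loss`, the tensor shapes, and the default `beta` are illustrative assumptions, not drawn from any specific paper listed below.

```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the DPO objective (Rafailov et al., 2023).

    Each argument is a tensor of shape (batch,) holding the summed
    token log-probabilities of the chosen/rejected responses under
    the trainable policy and the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Bradley-Terry preference model: maximize the margin between
    # chosen and rejected, scaled by the inverse temperature beta.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()


if __name__ == "__main__":
    # Toy usage with random log-probabilities for a batch of 4 pairs.
    torch.manual_seed(0)
    logps = [torch.randn(4) for _ in range(4)]
    print(dpo_loss(*logps).item())
```

Because the reference model enters only through fixed log-ratios, its log-probabilities can be precomputed offline, which is part of why DPO-style training is cheaper than PPO-based RLHF.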

Papers