Preference Optimization

Preference optimization (PO) aims to align large language models (LLMs) and other AI systems with human preferences so that their behavior and outputs better match what people want. Current research refines existing algorithms such as Direct Preference Optimization (DPO) and its variants, using techniques such as sparse token weighting, bidirectional feedback, and uncertainty estimates to improve efficiency and robustness. The field matters for building safer and more beneficial AI systems: it shapes both how more reliable models are developed and the ethical questions surrounding their deployment.
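
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It assumes the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses have already been computed under both the trained policy and a frozen reference model. The function names, argument names, and the default `beta=0.1` are illustrative assumptions, not taken from any particular paper or library listed here.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value; controls how far the policy may drift from the reference
) -> float:
    """DPO loss for one (chosen, rejected) pair of responses to the same prompt."""
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # The loss is small when the policy raises the chosen response's
    # probability relative to the rejected one, compared with the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value once the policy favors the chosen response more than the reference does. The variants mentioned above generally modify this objective, for example by reweighting individual tokens, rather than replacing it outright.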

Papers