Personalized Alignment

Personalized alignment in large language models (LLMs) focuses on tailoring model outputs to individual user preferences, addressing the limitations of aligning models only to general, aggregate preferences. Current research explores methods such as post-hoc reward modeling applied at decoding time, interactive alignment through multi-turn conversations, and base-model-anchored optimization that minimizes knowledge loss during personalization. This work matters for safe and beneficial LLM deployment: it enables customized experiences while mitigating the risks that arise from diverse and potentially conflicting individual preferences.
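
To make the "post-hoc reward modeling at decoding time" idea concrete, below is a minimal sketch of best-of-N reranking: the frozen base model samples several candidate responses, and a user-specific reward function picks the one that best matches that user's preferences. The model name "gpt2" and the `user_reward` keyword-matching heuristic are illustrative placeholders, not a method from any particular paper; a real system would substitute a learned per-user reward model.

```python
# Sketch of post-hoc, reward-guided decoding (best-of-N reranking).
# Assumptions: "gpt2" stands in for any base LM; `user_reward` is a
# hypothetical per-user scoring function used only for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def user_reward(text: str, user_profile: dict) -> float:
    """Placeholder personalized reward: prefer responses mentioning the
    user's stated interests. A real system would use a learned reward model."""
    return float(sum(text.lower().count(kw) for kw in user_profile["interests"]))

def personalized_generate(prompt: str, user_profile: dict, n: int = 4) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            max_new_tokens=40,
            num_return_sequences=n,  # sample N candidates from the frozen base LM
            pad_token_id=tokenizer.eos_token_id,
        )
    candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    # Post-hoc step: rerank candidates with the user-specific reward,
    # leaving the base model's weights untouched (no per-user fine-tuning).
    return max(candidates, key=lambda c: user_reward(c, user_profile))

if __name__ == "__main__":
    profile = {"interests": ["hiking", "photography"]}
    print(personalized_generate("Suggest a weekend activity:", profile))
```

Because personalization happens entirely at inference time, this style of approach avoids maintaining a fine-tuned model per user, at the cost of extra sampling and scoring during decoding.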

Papers