Decoding-Time Alignment

Decoding-time alignment modifies large language model (LLM) outputs during generation itself, rather than through pre-training or fine-tuning, so that they better reflect user preferences or safety constraints. Current research centers on algorithms that use reward models to steer decoding, including personalized reward modeling, comparator-driven methods, and reward-guided search. Because these methods require no updates to the model's weights, they offer a cheaper and more adaptable route than retraining for improving factuality, helpfulness, and safety, potentially yielding more reliable and user-friendly AI systems.
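To make the reward-guided search idea concrete, here is a minimal, self-contained sketch. The `candidate_continuations` and `reward` functions are hypothetical stand-ins for a real base LLM and a learned reward model; at each step the decoder samples k candidate continuations and keeps the one the reward model scores highest (chunk-level best-of-k, the simplest member of this family).

```python
import random

random.seed(0)

# Toy stand-ins for a real base LLM and reward model (both hypothetical).
VOCAB = ["helpful", "harmless", "honest", "unsafe", "rude"]

def candidate_continuations(prompt: str, k: int) -> list[str]:
    """Sample k candidate next chunks from the (toy) base model."""
    return [random.choice(VOCAB) for _ in range(k)]

def reward(prompt: str, continuation: str) -> float:
    """Toy reward model: prefers aligned words, penalizes unsafe ones."""
    scores = {"helpful": 1.0, "harmless": 0.8, "honest": 0.9,
              "unsafe": -2.0, "rude": -1.5}
    return scores.get(continuation, 0.0)

def reward_guided_decode(prompt: str, steps: int = 5, k: int = 4) -> str:
    """At each step, sample k candidates and keep the highest-reward one."""
    output = prompt
    for _ in range(steps):
        candidates = candidate_continuations(output, k)
        best = max(candidates, key=lambda c: reward(output, c))
        output += " " + best
    return output

print(reward_guided_decode("Assistant:"))
```

In practice, decoding-time methods typically combine the reward with the base model's log-probabilities (e.g., ranking candidates by log-probability plus a weighted reward term) rather than ranking on reward alone, so the output stays fluent while being steered toward the preferred behavior.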

Papers