Decoding Time
Decoding-time alignment modifies large language model (LLM) outputs during generation, rather than through pre-training or fine-tuning, so that they better satisfy user preferences or safety constraints. Current research focuses on algorithms that use reward models to guide the decoding process, through techniques such as personalized reward modeling, comparator-driven methods, and reward-guided search. Because the base model's weights are left unchanged, alignment objectives can often be adjusted at inference time without retraining, offering a more efficient and adaptable way to address factuality, helpfulness, and safety in LLMs and potentially leading to more reliable and user-friendly AI systems.
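As a concrete illustration, the sketch below shows one simple form of reward-guided search: at each step, the decoder samples several candidate continuations from the base model and keeps the one a reward model scores highest. The `generate_candidates` and `reward_fn` callables, the blockwise structure, and all parameter names are illustrative placeholders, not the specific method of any paper listed here.

```python
# Minimal sketch of reward-guided decoding (blockwise best-of-N search).
# `generate_candidates` stands in for a base LLM sampler and `reward_fn`
# for a learned reward model; both are assumptions for illustration only.

import random
from typing import Callable, List


def reward_guided_decode(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # k candidate continuations
    reward_fn: Callable[[str, str], float],                # scores (prompt, continuation)
    num_candidates: int = 4,
    num_blocks: int = 3,
) -> str:
    """Extend the output block by block, keeping the candidate the reward
    model scores highest at each step (a greedy reward-guided search)."""
    output = ""
    for _ in range(num_blocks):
        candidates = generate_candidates(prompt + output, num_candidates)
        best = max(candidates, key=lambda c: reward_fn(prompt, output + c))
        output += best
    return output


# Toy stand-ins so the sketch runs end to end.
def toy_sampler(context: str, k: int) -> List[str]:
    vocab = [" safe answer.", " helpful detail.", " risky claim."]
    return [random.choice(vocab) for _ in range(k)]


def toy_reward(prompt: str, continuation: str) -> float:
    # Penalize continuations containing "risky" as a stand-in safety signal.
    return -continuation.count("risky")


print(reward_guided_decode("User: explain X.\nAssistant:", toy_sampler, toy_reward))
```

In practice the same selection loop can operate at the token, block, or full-response level, trading decoding cost against how finely the reward model steers generation.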
Papers
Decoding-time Realignment of Language Models
Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel
DeAL: Decoding-time Alignment for Large Language Models
James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth