Paper ID: 2410.08458 • Published Oct 11, 2024

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both

Abhijnan Nath, Changsoo Jung, Ethan Seefried, Nikhil Krishnaswamy
Traditional RLHF-based LLM alignment methods explicitly maximize the expected rewards from a separate reward model. More recent supervised alignment methods such as Direct Preference Optimization (DPO) skip this phase to avoid problems including model drift and reward overfitting. Although popular for their simplicity, DPO and similar direct alignment methods, which rely heavily on the Bradley-Terry pairwise preference formulation, can still produce degenerate policies when faced with non-deterministic or noisy preference labels, for example human scoring of two candidate outputs with low confidence. This paper introduces DRDO (Direct Reward Distillation and policy-Optimization), which simultaneously models rewards and preferences to avoid such degeneracy. DRDO directly mimics rewards assigned by an oracle while learning human preferences through a novel preference likelihood formulation. Results on the Ultrafeedback and TL;DR datasets show that DRDO-trained policies surpass methods such as DPO and e-DPO in expected rewards and are, on average, more robust to noisy preference signals and out-of-distribution (OOD) settings.
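The abstract describes DRDO as combining two signals: directly mimicking oracle-assigned rewards and learning from pairwise preferences. The following is a minimal illustrative sketch of such a combined objective, not the paper's actual formulation: it pairs a squared-error distillation term on DPO-style implicit rewards with a Bradley-Terry preference negative log-likelihood. The function name, the `beta`/`alpha` weights, and the specific loss shapes are assumptions for illustration.

```python
import math

def drdo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                    oracle_r_w, oracle_r_l, beta=0.1, alpha=1.0):
    """Illustrative sketch (not the paper's exact objective):
    reward distillation combined with preference likelihood.

    logp_w, logp_l     : policy log-probs of the preferred (w) and
                         dispreferred (l) responses
    ref_logp_*         : reference-model log-probs of the same responses
    oracle_r_*         : rewards assigned by an oracle reward model
    """
    # Implicit rewards from the policy/reference log-ratio (DPO-style).
    r_w = beta * (logp_w - ref_logp_w)
    r_l = beta * (logp_l - ref_logp_l)

    # Distillation term: push implicit rewards toward the oracle's rewards.
    distill = (r_w - oracle_r_w) ** 2 + (r_l - oracle_r_l) ** 2

    # Preference term: Bradley-Terry negative log-likelihood on the margin.
    pref = -math.log(1.0 / (1.0 + math.exp(-(r_w - r_l))))

    return distill + alpha * pref
```

Intuitively, the distillation term anchors the policy's implicit rewards to the oracle even when a preference label is uninformative, while the preference term still exploits the pairwise signal when it is reliable.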