Exploring and Addressing Reward Confusion in Offline Preference Learning [2407.16025]