Intermediate direct preference optimization [2408.02923]