Sparse Rewards Can Self-Train Dialogue Agents [2409.04617]