Zero-Shot Reward

Zero-shot reward learning trains reinforcement learning agents without a hand-designed reward function. Instead, a pre-trained large language model or vision-language model supplies the reward signal from a natural language description of the desired behavior. Current research focuses on adapting these models, often through prompt engineering or by contrasting desired and undesired behaviors, to produce effective zero-shot reward signals for tasks such as robotics, autonomous driving, and text generation. The approach aims to reduce the cost and effort of designing reward functions, which could speed the development and deployment of reinforcement learning agents across these applications.
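As a rough illustration of the idea of contrasting desired and undesired behaviors, the sketch below scores an observation by how much closer its embedding is to a "desired" description than to an "undesired" one. It assumes the embeddings have already been produced by some pre-trained vision-language model with a shared image-text space; the function names (cosine_similarity, contrastive_reward) and the toy vectors are hypothetical and not taken from any particular paper or library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def contrastive_reward(
    obs_emb: Sequence[float],
    desired_emb: Sequence[float],
    undesired_emb: Sequence[float],
) -> float:
    """Hypothetical zero-shot reward: similarity to the desired-behavior
    description minus similarity to the undesired-behavior description.

    All three embeddings are assumed to come from the same pre-trained
    encoder; no task-specific reward function is written by hand.
    """
    return cosine_similarity(obs_emb, desired_emb) - cosine_similarity(
        obs_emb, undesired_emb
    )


if __name__ == "__main__":
    # Toy 2-D stand-ins for real model embeddings (assumption for illustration).
    desired = [1.0, 0.0]    # e.g. embedding of "the robot holds the cup"
    undesired = [0.0, 1.0]  # e.g. embedding of "the robot drops the cup"
    observation = [0.9, 0.1]
    print(contrastive_reward(observation, desired, undesired))
```

In a training loop, a value like this would replace the environment's hand-written reward at each step. Real systems typically add normalization or other shaping on top, which this sketch leaves out.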

Papers