Trajectory-Wise Reward

Trajectory-wise reward in reinforcement learning (RL) refers to settings where an agent receives a single reward for an entire sequence of actions rather than for each individual step. Current research emphasizes efficient algorithms that learn effectively from this sparse form of feedback, often employing model-based approaches, GFlowNet architectures, or methods that decompose the trajectory reward into per-step proxies (a minimal sketch of the decomposition idea appears below). This research direction is significant because it addresses the limitations of traditional RL in settings with delayed or episodic rewards, extending RL's applicability to real-world problems where immediate per-step feedback is unavailable or impractical.
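For concreteness, the sketch below illustrates the decomposition idea in Python. It is a minimal illustration rather than any specific published method: the function names (`uniform_redistribution`, `discounted_returns`) are hypothetical, and uniform redistribution is only the simplest return-equivalent scheme.

```python
import numpy as np

def uniform_redistribution(trajectory_reward: float, num_steps: int) -> np.ndarray:
    """Spread a single episodic reward uniformly over all steps.

    The simplest return-equivalent decomposition: the per-step proxy
    rewards sum back exactly to the original trajectory reward.
    """
    return np.full(num_steps, trajectory_reward / num_steps)

def discounted_returns(rewards: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Turn per-step (proxy) rewards into discounted returns, which a
    standard RL algorithm such as REINFORCE can then consume as if the
    environment had provided dense feedback."""
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A 5-step episode whose only feedback is one trajectory-level reward
# of 1.0, revealed at the end of the episode.
proxies = uniform_redistribution(trajectory_reward=1.0, num_steps=5)
print(proxies)                  # [0.2 0.2 0.2 0.2 0.2]
print(discounted_returns(proxies))
```

More sophisticated schemes learn the per-step credit assignment instead of splitting uniformly, while typically preserving the property that the proxies sum to the original trajectory reward.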

Papers