Trajectory-Wise Reward
Trajectory-wise reward refers to reinforcement learning (RL) settings in which an agent receives a single reward for an entire sequence of actions rather than for individual steps. Current research emphasizes developing algorithms that learn efficiently from this sparse, episodic feedback, often employing model-based approaches, GFlowNet architectures, or methods that decompose the trajectory-level reward into per-step proxies (see the sketch below). This research area is significant because it addresses the limitations of traditional RL in settings with delayed or episodic rewards, broadening the applicability of RL to real-world problems where immediate feedback is unavailable or impractical.
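The sketch below is a minimal illustration of the setting, not an implementation from any of the papers listed here: a trajectory receives one scalar reward, and a simple proxy is formed by redistributing that return uniformly across the steps so that a standard per-step RL update can be applied. The function names (`redistribute_uniform`, `discounted_returns`) are illustrative assumptions.

```python
from typing import List
import numpy as np


def redistribute_uniform(trajectory_return: float, length: int) -> np.ndarray:
    """Split a single trajectory-level reward into equal per-step proxy rewards.

    This is the simplest possible decomposition; learned redistribution
    (e.g. regressing the episodic return onto state-action features) is a
    common refinement in the literature.
    """
    return np.full(length, trajectory_return / length)


def discounted_returns(proxy_rewards: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Compute standard discounted returns on the per-step proxies,
    so an ordinary learner (REINFORCE, Q-learning, etc.) can consume them."""
    returns = np.zeros_like(proxy_rewards)
    running = 0.0
    for t in reversed(range(len(proxy_rewards))):
        running = proxy_rewards[t] + gamma * running
        returns[t] = running
    return returns


# Example: a 5-step episode whose only feedback is a terminal reward of 10.0.
episode_length = 5
trajectory_reward = 10.0                    # the sole signal the agent observes
proxies = redistribute_uniform(trajectory_reward, episode_length)
print(proxies)                              # [2. 2. 2. 2. 2.]
print(discounted_returns(proxies))          # per-step targets for a standard RL update
```

Uniform redistribution is only a baseline; its appeal is that it turns a delayed, trajectory-level signal into a dense one without changing the optimal policy's objective, while learned decompositions aim to assign credit more precisely to the steps that actually earned the reward.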