Return Decomposition

Return decomposition in reinforcement learning addresses the problem of assigning credit to individual actions in episodes where the reward is received only at the end. Current research focuses on algorithms that decompose the final episodic return into step-wise proxy rewards, typically trained so that the proxy rewards along a trajectory sum approximately to the original return. These algorithms often use techniques such as attention mechanisms or Shapley value estimation to handle complex temporal and multi-agent interactions. The aim is to improve sample efficiency and learning performance under delayed or sparse rewards, which would make reinforcement learning more applicable to complex, long-horizon, real-world problems.
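
The idea of decomposing a final reward into step-wise proxy rewards can be illustrated with a minimal sketch. It is not the attention-based or Shapley-based approach mentioned above. Instead it substitutes a much simpler technique: a least-squares fit of one proxy reward per state, so that the proxy rewards along each trajectory sum to that trajectory's episodic return. The function name decompose_returns, the tabular per-state model, and the toy episodes are all assumptions made for illustration.

```python
def decompose_returns(episodes, num_states, lr=0.5, epochs=2000):
    """Fit one proxy reward per state (hypothetical tabular model).

    episodes: list of (visited_states, episodic_return) pairs.
    The fit drives the sum of proxy rewards along each trajectory
    toward that trajectory's episodic return.
    """
    proxy = [0.0] * num_states
    for _ in range(epochs):
        for states, episodic_return in episodes:
            predicted = sum(proxy[s] for s in states)
            error = predicted - episodic_return
            # Gradient step on 0.5 * error**2, scaled by trajectory length.
            for s in states:
                proxy[s] -= lr * error / len(states)
    return proxy


# Toy data (assumed): only the episodic return is observed, never the
# per-step reward. Here, visiting state 2 is what secretly earns the reward.
episodes = [
    ([0, 1, 2], 1.0),
    ([0, 1, 3], 0.0),
    ([0, 2, 3], 1.0),
    ([1, 3], 0.0),
    ([0, 3], 0.0),
    ([1, 2], 1.0),
]

proxy = decompose_returns(episodes, num_states=4)
print([round(r, 3) for r in proxy])
```

In this toy setting the fitted proxy rewards concentrate the credit on state 2, so a learner could be trained on these dense per-step values instead of the sparse terminal reward. Methods in the literature generally replace the per-state table with a learned model over states and actions.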

Papers