Non-Markovian Reward

Non-Markovian reward modeling in reinforcement learning addresses the challenge of learning optimal policies when rewards depend on the history of states and actions, rather than only the current state and action. Current research focuses on developing algorithms and model architectures, such as reward machines, transformers, and multiple-instance learning methods, to capture these temporal dependencies and to learn from varied feedback types, including trajectory preferences and bagged rewards. This line of work matters because it extends reinforcement learning to real-world tasks with inherently non-Markovian reward structures, improving the performance and robustness of AI agents across domains. The development of efficient algorithms with theoretical guarantees is a key focus, alongside the exploration of interpretable models for better understanding and verification of learned policies.
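To make the reward-machine idea concrete, below is a minimal sketch assuming a simple event-driven formulation: a finite-state automaton advances on high-level events, so the reward it emits depends on the order of past events rather than on the current environment state alone. The class name, transition encoding, and the "coffee then office" task are illustrative assumptions, not taken from any specific paper.

```python
class RewardMachine:
    """Minimal reward machine: a finite-state automaton over events.

    transitions maps (machine_state, event) -> (next_machine_state, reward),
    so the emitted reward is a function of event history, not just the
    current environment state.
    """

    def __init__(self, transitions, initial_state=0):
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event and return the associated reward."""
        if (self.state, event) in self.transitions:
            self.state, reward = self.transitions[(self.state, event)]
            return reward
        return 0.0  # unmodeled events leave the machine unchanged


# Example: reward 1.0 only for reaching the office *after* getting coffee.
# This reward is non-Markovian in the raw environment state, but Markovian
# in the product of environment state and machine state.
rm = RewardMachine({
    (0, "got_coffee"): (1, 0.0),
    (1, "at_office"): (2, 1.0),
})

for event in ["at_office", "got_coffee", "at_office"]:
    print(event, "->", rm.step(event))
# at_office -> 0.0   (no coffee yet, so no reward)
# got_coffee -> 0.0
# at_office -> 1.0   (history-dependent reward fires)
```

The usual design choice this sketch reflects is that augmenting the environment state with the machine state restores the Markov property, letting standard RL algorithms be applied to the product system.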

Papers