Non-Markovian Safety
Non-Markovian safety in reinforcement learning addresses the challenge of ensuring safe agent behavior when safety constraints depend on the entire history of states and actions rather than on the current state alone. Current research focuses on learning models that capture these temporal dependencies, often by using supervised learning to associate safety labels with state-action trajectories, and on incorporating the learned models into reinforcement learning algorithms through techniques such as dual optimization or RL-as-inference. This work is crucial for deploying reinforcement learning agents in real-world settings, where incomplete state representations or complex, delayed consequences make non-Markovian safety specifications necessary to prevent unintended negative side effects.
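The two ideas above can be made concrete with a minimal sketch (all names and thresholds here are hypothetical, not from any specific paper): a safety predicate that is a function of the whole trajectory rather than the current state, and one step of the dual (Lagrange-multiplier) update used in constrained RL to fold such a constraint into the learning objective.

```python
# Illustrative sketch only. `trajectory_is_safe` shows why the constraint is
# non-Markovian: no single (state, action) pair reveals a violation -- the
# relevant quantity is a count accumulated over the whole history.

def trajectory_is_safe(trajectory, budget=3):
    """Unsafe once the agent has entered the hazardous region more than
    `budget` times over its entire history (a history-dependent property)."""
    visits = sum(1 for state, action in trajectory if state == "hazard")
    return visits <= budget

def dual_update(lmbda, cost_estimate, cost_limit, lr=0.1):
    """One step of dual ascent on a Lagrange multiplier: raise the penalty
    weight when the estimated constraint cost exceeds its limit, and clip
    at zero so the multiplier stays non-negative."""
    return max(0.0, lmbda + lr * (cost_estimate - cost_limit))

# A trajectory whose individual steps all look tolerable in isolation,
# but whose aggregate history violates the safety budget.
traj = [("hazard", "a")] * 4 + [("free", "a")]
print(trajectory_is_safe(traj, budget=3))  # False: history matters

# The multiplier grows while the learned safety model reports excess cost.
lmbda = dual_update(0.0, cost_estimate=0.4, cost_limit=0.1)
print(lmbda)
```

In practice the hard-coded visit counter would be replaced by a learned model (e.g. a recurrent classifier trained on labeled trajectories), but the structure is the same: the safety signal consumes the history, and the dual variable couples it to the policy objective.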