Fitted Q Iteration
Fitted Q-Iteration (FQI) is an offline reinforcement learning algorithm that iteratively estimates the optimal action-value function (Q-function) from a fixed dataset, thereby learning an optimal policy without further interaction with the environment. At each iteration, a supervised regressor is refit to bootstrapped Bellman targets computed from the batch of recorded transitions. Current research focuses on improving FQI's sample efficiency and robustness through techniques such as max-plus linear approximators, alternative loss functions (e.g., log-loss), and structural assumptions such as reward-relevance filtering or action impact regularity that reduce the effective complexity of the problem. These advances broaden FQI's applicability to real-world settings where online data collection is expensive or impossible, particularly in domains like healthcare and economics, by providing theoretical guarantees and improved empirical performance.
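To make the iterative regression concrete, below is a minimal sketch of the basic FQI loop, assuming a small discrete action space, transitions stored as (state, action, reward, next_state, done) tuples, and a scikit-learn extra-trees regressor as the function approximator (in the spirit of the classical tree-based formulation of FQI). The function names, dataset format, and hyperparameters are illustrative assumptions, not taken from the sources discussed above.

```python
# Minimal Fitted Q-Iteration sketch on a fixed batch of transitions.
# Assumptions: discrete actions 0..n_actions-1, 1-D numpy state vectors,
# and a scikit-learn regressor as the Q-function approximator.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def fqi(transitions, n_actions, n_iterations=50, gamma=0.99):
    """Run FQI over a fixed dataset and return the fitted Q(s, a) regressor."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions]).reshape(-1, 1)
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])
    dones = np.array([t[4] for t in transitions], dtype=float)

    # Regression inputs are (state, action) pairs.
    X = np.hstack([states, actions])
    q_model = None

    for _ in range(n_iterations):
        if q_model is None:
            # First iteration: regress on immediate rewards only.
            targets = rewards
        else:
            # Bootstrapped Bellman targets: r + gamma * max_a' Q_k(s', a'),
            # with the max taken by evaluating every action at s'.
            next_q = np.column_stack([
                q_model.predict(np.hstack([
                    next_states, np.full((len(next_states), 1), a)
                ]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

        # One FQI iteration = refitting the approximator on the new targets.
        q_model = ExtraTreesRegressor(n_estimators=50, random_state=0)
        q_model.fit(X, targets)

    return q_model


def greedy_action(q_model, state, n_actions):
    """Extract the greedy policy from the fitted Q-function."""
    q_values = [
        q_model.predict(np.hstack([state, [a]]).reshape(1, -1))[0]
        for a in range(n_actions)
    ]
    return int(np.argmax(q_values))
```

After the final iteration, a policy is read off greedily from the fitted Q-function, as in `greedy_action` above; the research directions mentioned in this section modify pieces of this loop, for example the choice of approximator or the regression loss used to fit the targets.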