TD Learning

Temporal difference (TD) learning is a reinforcement learning method that estimates value functions efficiently by bootstrapping from current estimates, trading off bias and variance. Current research focuses on improving the efficiency and stability of TD learning, particularly in off-policy settings and under distribution shift, exploring algorithms such as GTD, TDC, and QTD and incorporating techniques such as bootstrapping, chunking, and control variates. These advances address challenges such as slow convergence, instability with linear function approximation, and the impact of heterogeneous data in federated learning, ultimately yielding more robust and efficient reinforcement learning agents.

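To make the bootstrapping idea concrete, the following is a minimal TD(0) sketch for tabular value estimation on a small random-walk chain. The environment, step size, and episode count are illustrative assumptions, not taken from any of the papers below; the point is only that each update moves V(s) toward the reward plus the current estimate of the next state's value.

```python
import numpy as np

# Illustrative TD(0) value estimation on a small random-walk chain.
# States 0..4 are non-terminal; the episode ends when the agent steps
# off either end, with reward +1 only on the right terminal.

N_STATES = 5
GAMMA = 1.0     # undiscounted episodic task
ALPHA = 0.1     # step size (assumed value for illustration)
EPISODES = 1000

rng = np.random.default_rng(0)
V = np.zeros(N_STATES)  # value estimates for non-terminal states

for _ in range(EPISODES):
    s = N_STATES // 2  # start in the middle of the chain
    while True:
        # Random policy: move left or right with equal probability.
        s_next = s + (1 if rng.random() < 0.5 else -1)

        if s_next < 0:               # left terminal, reward 0
            reward, v_next, done = 0.0, 0.0, True
        elif s_next >= N_STATES:     # right terminal, reward +1
            reward, v_next, done = 1.0, 0.0, True
        else:                        # non-terminal transition, reward 0
            reward, v_next, done = 0.0, V[s_next], False

        # TD(0) update: bootstrap from the current estimate of the next state.
        td_error = reward + GAMMA * v_next - V[s]
        V[s] += ALPHA * td_error

        if done:
            break
        s = s_next

print("Estimated values:", np.round(V, 3))
print("True values:     ", np.round(np.arange(1, N_STATES + 1) / (N_STATES + 1), 3))
```

Because each update uses the current estimate V[s_next] rather than a full Monte Carlo return, the estimates converge faster but carry the bias of the bootstrap target, which is the bias-variance trade-off referenced above.
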
Papers