Return-Conditioned Supervised Learning

Return-conditioned supervised learning (RCSL) trains models to predict actions or outputs conditioned on a desired future outcome, such as a target return or other specified values. Current research explores architectures including decision transformers and diffusion models, and develops techniques to improve robustness and efficiency, addressing challenges such as hyperparameter sensitivity and the effects of environmental stochasticity. The approach is promising for offline reinforcement learning, for more efficient conditional generation across domains, and for building more controllable and aligned AI systems.
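
To make the core idea concrete, below is a minimal sketch of RCSL in PyTorch: a policy network that takes both the state and a target return-to-go as input, trained with a plain supervised (behavior-cloning-style) loss on logged trajectories. All names (`RCSLPolicy`, `rcsl_update`) and the synthetic data are illustrative assumptions, not any specific paper's implementation.

```python
import torch
import torch.nn as nn

class RCSLPolicy(nn.Module):
    """MLP policy conditioned on state and a scalar target return-to-go."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, target_return: torch.Tensor) -> torch.Tensor:
        # The return-to-go is concatenated to the state as the conditioning signal.
        return self.net(torch.cat([state, target_return], dim=-1))

def rcsl_update(policy, optimizer, states, actions, returns_to_go):
    """One supervised step: regress dataset actions, conditioned on observed returns-to-go."""
    pred = policy(states, returns_to_go)
    loss = nn.functional.mse_loss(pred, actions)  # continuous actions; use cross-entropy for discrete
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Synthetic offline batch standing in for logged trajectories (illustrative only).
    B, state_dim, action_dim = 64, 8, 2
    states = torch.randn(B, state_dim)
    actions = torch.randn(B, action_dim)
    returns_to_go = torch.randn(B, 1)

    policy = RCSLPolicy(state_dim, action_dim)
    opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
    for step in range(10):
        rcsl_update(policy, opt, states, actions, returns_to_go)

    # At evaluation time, condition on a high desired return to steer behavior.
    action = policy(states[:1], torch.tensor([[10.0]]))
```

The same conditioning pattern underlies sequence-model variants such as decision transformers, which feed the return-to-go as a token alongside states and actions rather than concatenating it to a single observation.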

Papers