Soft Q-Learning
Soft Q-learning is a reinforcement learning algorithm that maximizes an entropy-regularized value function, trading off reward maximization against exploration of diverse actions; the trade-off is weighted by a temperature parameter on the entropy term. Current research focuses on improving its efficiency and robustness through techniques such as bounding value-function estimates, incorporating adversarial methods, and developing principled temperature schedules to manage the exploration-exploitation trade-off. These advances aim to improve the algorithm's performance in applications including imitation learning, prompt tuning for large language models, and control problems with limited or noisy data, contributing to more stable and effective reinforcement learning systems.
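As a concrete illustration, the minimal sketch below shows a tabular soft Q-learning backup for discrete actions; the function and variable names (`soft_value`, `soft_policy`, `soft_q_update`, the random toy Q-table) are illustrative assumptions, not drawn from any specific paper. The soft state value is the temperature-weighted log-sum-exp of the Q-values, and actions are sampled from the corresponding Boltzmann policy.

```python
import numpy as np

def soft_value(q_row, alpha):
    """Soft (entropy-regularized) state value:
    V(s) = alpha * log sum_a exp(Q(s, a) / alpha),
    computed with a max-shift for numerical stability."""
    z = q_row / alpha
    z_max = z.max()
    return alpha * (z_max + np.log(np.exp(z - z_max).sum()))

def soft_policy(q_row, alpha):
    """Boltzmann policy pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    z = q_row / alpha
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def soft_q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99, lr=0.5):
    """One tabular soft Bellman backup:
    target = r + gamma * V_soft(s'), with zero bootstrap at terminal states."""
    target = r + (0.0 if done else gamma * soft_value(Q[s_next], alpha))
    Q[s, a] += lr * (target - Q[s, a])
    return Q

# Toy demonstration on a hypothetical 5-state, 3-action Q-table.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 3))
Q = soft_q_update(Q, s=0, a=1, r=1.0, s_next=2, done=False)
print(soft_policy(Q[0], alpha=0.1))
```

Lowering the temperature `alpha` makes the log-sum-exp approach the hard max and the policy approach greedy action selection, which is how temperature scheduling interpolates between exploration and exploitation.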