Paper ID: 2410.16739
Corrected Soft Actor Critic for Continuous Control
Yanjun Chen, Xinming Zhang, Xianghui Wang, Zhiqiang Xu, Xiaoyu Shen, Wei Zhang
The Soft Actor-Critic (SAC) algorithm is known for its stability and high sample efficiency in deep reinforcement learning. However, the tanh transformation applied to sampled actions in SAC distorts the action distribution: the mode of the squashed distribution is generally not the tanh of the Gaussian mean, so the policy fails to select the most probable actions. This paper presents a novel action sampling method that directly identifies and selects the most probable actions within the transformed distribution, thereby addressing this issue. Extensive experiments on standard continuous control benchmarks demonstrate that the proposed method significantly improves SAC's performance, yielding faster convergence and higher cumulative rewards than the original algorithm.
Submitted: Oct 22, 2024
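As a numerical illustration of the distortion the abstract describes (this is a minimal sketch, not the paper's proposed method), the following Python snippet evaluates the change-of-variables density of a tanh-squashed Gaussian, p_a(a) = p_u(atanh(a)) / (1 - a^2), and shows that its mode can differ substantially from tanh(mu). The parameter values mu = 1, sigma = 1 are illustrative assumptions chosen to make the effect visible.

```python
import numpy as np

# Sketch: the mode of a tanh-squashed Gaussian is not tanh(mean).
# Standard SAC acts deterministically with tanh(mu); the abstract's
# point is that this is not the most probable squashed action.

mu, sigma = 1.0, 1.0  # pre-squash Gaussian parameters (illustrative)

# Density of a = tanh(u), u ~ N(mu, sigma^2), via change of variables:
#   log p_a(a) = log p_u(atanh(a)) - log(1 - a^2)
a = np.linspace(-0.999, 0.999, 200_001)          # avoid the +/-1 endpoints
u = np.arctanh(a)
log_p_u = -0.5 * ((u - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
log_p_a = log_p_u - np.log(1.0 - a ** 2)         # Jacobian correction

mode_a = a[np.argmax(log_p_a)]
print(f"tanh(mu)               = {np.tanh(mu):.4f}")  # ~0.7616
print(f"mode of squashed dist. = {mode_a:.4f}")       # ~0.995
```

With these values the interior mode sits near a ≈ 0.995 while tanh(mu) ≈ 0.762, consistent with the mode condition atanh(a*) = mu + 2 a* sigma^2 obtained by setting the derivative of log p_a to zero; the gap shrinks as sigma tends to zero.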