Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients [2409.04971]