Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks [2409.08938]