A general Markov decision process formalism for action-state entropy-regularized reward maximization [2302.01098]