From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information [2310.00642]