Slide 1
Reinforcement learning
Regular MDP
– Given:
   Transition model P(s’ | s, a)
   Reward function R(s)
– Find:
   Policy π(s)
Reinforcement learning
– Transition model…
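
As a rough illustration of the regular MDP setting on this slide (transition model P(s' | s, a) and reward function R(s) are given, and a policy is to be found), here is a minimal Python sketch. The slide does not name a solution method. The sketch assumes value iteration, a discount factor of 0.9, and a made-up two-state, two-action MDP, so every state name, action name, and number below is hypothetical.

```python
# Hypothetical 2-state, 2-action MDP; all names and numbers are made up.
GAMMA = 0.9  # discount factor (an assumption; not given on the slide)

STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# Transition model P(s' | s, a): P[s][a] maps next state -> probability.
P = {
    "s0": {"stay": {"s0": 1.0}, "move": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s1": 1.0}, "move": {"s0": 0.8, "s1": 0.2}},
}

# Reward function R(s): reward depends only on the current state.
R = {"s0": 0.0, "s1": 1.0}


def expected_value(s, a, U):
    """Expected utility of the next state when taking action a in state s."""
    return sum(prob * U[s_next] for s_next, prob in P[s][a].items())


def value_iteration(tol=1e-9):
    """Repeat the Bellman update until utilities change by less than tol."""
    U = {s: 0.0 for s in STATES}
    while True:
        new_U = {
            s: R[s] + GAMMA * max(expected_value(s, a, U) for a in ACTIONS)
            for s in STATES
        }
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < tol:
            return U


def greedy_policy(U):
    """Pick, in each state, the action with the highest expected utility."""
    return {
        s: max(ACTIONS, key=lambda a: expected_value(s, a, U))
        for s in STATES
    }


utilities = value_iteration()
policy = greedy_policy(utilities)
print(policy)
```

Because the model is fully known here, the policy can be computed offline without the agent ever acting in the environment.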