Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning

Advanced Prediction Models · Advanced Prediction Models Deep Learning, Graphical Models and Reinforcement Learning. Today’s Outline • Complex Decisions • Reinforcement Learning

Aug 20, 2020


dariahiddleston

Advanced Prediction Models

Deep Learning, Graphical Models and Reinforcement Learning


Today’s Outline

• Complex Decisions

• Reinforcement Learning Basics

• Markov Decision Process

• (State Action) Value Function

• Q Learning Algorithm


Complex Decisions


Complex Decision Making is Everywhere

[Figure: RL in relation to Optimal Control/Engineering, Machine Learning/AI, Neuroscience/Psychology, and Economics/Operations Research]

Credit: Sebastien Bubeck

Control

• Fly drones

• Autonomous driving

Operations

• Retain customers, UX

• Inventory management

Logistics

• Schedule transportation

• Resource allocation

Games

• Chess, Go, Atari


Complex Decision Making can be addressed using RL

Reference: technologyreview.com/s/603501/10-breakthrough-technologies-2017-reinforcement-learning/ (March/April 2017 Issue)

Playing Atari Using RL (2013)

Figure: Defazio and Graepel, Atari Learning Environment

AlphaGo Conquers Go (2016)

Reference: DeepMind, March 2016

• Videos


Need for Reinforcement Learning

• States/contexts do not change exogenously: the agent's own actions influence them

Reference: https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-1-5-contextual-bandits-bff01d1aad9c

Questions?


RL Overview

• Reinforcement Learning (RL) addresses a version of the problem of sequential decision making

• Ingredients:

• There is an environment

• Within it, an agent takes actions

• These actions influence the future

• Agent gets a (potentially delayed) feedback signal

• How to select actions to maximize total reward?

• RL provides several sound answers to this question


The Environment

• Sees the agent's action $A_t$ and generates an observation $S_{t+1}$ and a reward $R_{t+1}$

• Subscript $t$ indexes time. The current observation $S_t$ is called the state

• Assume the future (at times $t+1, t+2, \ldots$) is independent of the past ($\ldots, t-2, t-1$) given the present ($t$): this is called the Markov assumption

• Assume everything relevant is observed

$P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \ldots, S_t)$
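A minimal sketch of the Markov assumption, assuming a hypothetical two-state weather chain (the state names and probabilities are illustrative and not from the lecture). The sampler for the next state receives only the current state, never the earlier history:

```python
import random

# Hypothetical transition table: P[s][s2] = probability of moving from s to s2.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng):
    """Sample S_{t+1} using only the current state S_t."""
    next_states = list(P[state])
    weights = [P[state][s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
state = "sunny"
trajectory = [state]
for _ in range(5):
    state = step(state, rng)
    trajectory.append(state)
print(trajectory)
```

Because `step` takes no argument other than the current state and a random source, the history cannot influence the next state, which is what the equality above expresses.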


The Agent

• Agent observes $R_{t+1}, S_{t+1}$ and these are not i.i.d. across time

• Agent's objective is to maximize expected total future reward $E[R_{t+1} + \gamma R_{t+2} + \cdots]$

• Agent's actions affect what it sees in the future ($S_{t+1}$)

• Maybe better to trade off current reward $R_{t+1}$ to gain more rewards in the future

$\gamma$: Discount Factor
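As a small illustration of the objective, the discounted sum can be computed for one finite, made-up reward sequence. The function name `discounted_return` and the numbers are assumptions for this example; the agent maximizes the expectation of this quantity over trajectories.

```python
def discounted_return(rewards, gamma):
    """Return r_1 + gamma*r_2 + gamma^2*r_3 + ... for a finite list of rewards."""
    total = 0.0
    # Work backwards so each step applies G = r + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

rewards = [1.0, 0.0, 2.0]  # hypothetical R_{t+1}, R_{t+2}, R_{t+3}
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

A smaller discount factor makes the agent value near-term rewards more; a value close to 1 makes delayed rewards count almost as much as immediate ones.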


The Reward

Reference: David Silver, 2015

The Goal

Reference: David Silver, 2015

The Interactions

• Pictorially

[Figure: agent-environment loop. At time $t$ the environment sends $(S_t, R_t)$ to the agent and the agent responds with action $A_t$; the same exchange repeats at $t+1$ and $t+2$.]
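The interaction pictured here can be sketched as a simple loop. This is only an illustration under stated assumptions: `ToyEnvironment`, its reward rule, and `run_episode` are hypothetical names invented for this example and are not part of the lecture.

```python
class ToyEnvironment:
    """Hypothetical two-state environment (states 0 and 1), for illustration only."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Reward 1 when the action matches the current state, else 0.
        reward = 1.0 if action == self.state else 0.0
        # The action influences the future: the next state is the opposite of the action.
        self.state = 1 - action
        return self.state, reward


def run_episode(policy, steps):
    """Run the loop: observe the state, choose an action, receive next state and reward."""
    env = ToyEnvironment()
    state, total_reward = env.state, 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(lambda s: s, steps=4))      # policy that always matches the state
print(run_episode(lambda s: 1 - s, steps=4))  # policy that never matches it
```

The two policies see different state sequences because each action changes what the environment returns next, which is the point of the diagram.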


RL versus other Machine Learning Settings

Reference: David Silver, 2015

RL versus other Machine Learning Settings

Reference: Joelle Pineau, DLSS 2016

Components of an RL Agent

Reference: David Silver, 2015

Components of RL: Policy

Reference: David Silver, 2015

Components of RL: Value Function

Reference: David Silver, 2015

Components of RL: Model

Reference: David Silver, 2015

Questions?


Components of RL: MDP Framework

• We will now revisit these components formally

• Policy $\pi(a \mid s)$

• Value function $V^{\pi}(s)$

• Model $\mathcal{P}^{a}_{ss'}$ and $\mathcal{R}^{a}_{s}$

• In the framework of Markov Decision Processes

• And then we will address the question of optimizing for the best $\pi$ in realistic environments


Towards a Markov Decision Process

• MDPs are a useful way to describe the RL problem

• MDPs can be understood via the following progression

• Start with a Markov Chain

• State transitions happen autonomously

• Add Rewards

• Becomes a Markov Reward Process

• Add Actions that influence state transitions

• Becomes a Markov Decision Process

Reference: David Silver, 2015
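The progression above can be sketched as three data structures, each adding one ingredient to the previous one. The states, actions, and numbers below are hypothetical and chosen only for illustration:

```python
# Markov chain: P[s][s2] is the probability of moving from s to s2.
chain_P = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.0, "B": 1.0}}

# Markov reward process: the same chain plus a reward per state and a discount factor.
mrp = {"P": chain_P, "R": {"A": 1.0, "B": 0.0}, "gamma": 0.9}

# Markov decision process: transitions and rewards are also indexed by the action.
mdp = {
    "P": {
        "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
        "B": {"stay": {"B": 1.0}, "go": {"A": 1.0}},
    },
    "R": {"A": {"stay": 1.0, "go": 0.0}, "B": {"stay": 0.0, "go": 2.0}},
    "gamma": 0.9,
}

def is_stochastic(row):
    """Check that one row of transition probabilities sums to 1."""
    return abs(sum(row.values()) - 1.0) < 1e-9

print(all(is_stochastic(row) for row in chain_P.values()))
print(all(is_stochastic(mdp["P"][s][a]) for s in mdp["P"] for a in mdp["P"][s]))
```

In the chain and the MRP the transitions happen on their own; only in the MDP does the choice of action select which row of probabilities applies.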


Markov Chain/Process

Reference: David Silver, 2015

Example Markov Chain

Reference: David Silver, 2015

Markov Chain with Rewards

Reference: David Silver, 2015

Example Markov Reward Process

Reference: David Silver, 2015

Recursions in Markov Reward Process

Reference: David Silver, 2015
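In its standard textbook form, the recursion for a Markov Reward Process is $v(s) = \mathcal{R}_s + \gamma \sum_{s'} \mathcal{P}_{ss'} v(s')$. The sketch below applies it repeatedly to a hypothetical two-state MRP; the states, rewards, discount factor, and the function name `evaluate_mrp` are assumptions for illustration, not the example from the slides.

```python
# Hypothetical MRP: P[s][s2] are transition probabilities, R[s] are state rewards.
P = {"A": {"A": 0.5, "B": 0.5}, "B": {"B": 1.0}}
R = {"A": 1.0, "B": 0.0}
gamma = 0.5

def evaluate_mrp(P, R, gamma, sweeps=100):
    """Apply v(s) = R[s] + gamma * sum_s2 P[s][s2] * v(s2) until it settles."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        # The right-hand side uses the previous sweep's values for every state.
        v = {s: R[s] + gamma * sum(p * v[s2] for s2, p in P[s].items()) for s in P}
    return v

v = evaluate_mrp(P, R, gamma)
print(v)
```

For this toy process the recursion can be checked by hand: state B has value 0, and state A satisfies v(A) = 1 + 0.25 v(A), so v(A) = 4/3.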


Markov Decision Process

Reference: David Silver, 2015

Example Markov Decision Process

Reference: David Silver, 2015

Markov Decision Process: Policy

• Now that we have introduced actions, we can discuss policies again

• Recall

Reference: David Silver, 2015

MDP is an MRP for a Fixed Policy

Reference: David Silver, 2015

Markov Decision Process: Value Function

• We can also talk about the value function(s)

Reference: David Silver, 2015

Recursions in MDP

Reference: David Silver, 2015

*Also called the Bellman Expectation Equations
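For reference, the Bellman Expectation Equations in their standard textbook form, for a policy $\pi(a \mid s)$, transition probabilities $\mathcal{P}^{a}_{ss'}$, expected rewards $\mathcal{R}^{a}_{s}$, and discount factor $\gamma$. This is the usual statement and may differ cosmetically from the slide:

```latex
\begin{align}
V^{\pi}(s)    &= \sum_{a} \pi(a \mid s) \left( \mathcal{R}^{a}_{s}
                 + \gamma \sum_{s'} \mathcal{P}^{a}_{ss'} \, V^{\pi}(s') \right) \\
Q^{\pi}(s, a) &= \mathcal{R}^{a}_{s}
                 + \gamma \sum_{s'} \mathcal{P}^{a}_{ss'}
                   \sum_{a'} \pi(a' \mid s') \, Q^{\pi}(s', a')
\end{align}
```

Both lines say the same thing from two viewpoints: the value now equals the expected immediate reward plus the discounted value of wherever the policy and the environment lead next.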


Markov Decision Process: Objective

Reference: David Silver, 2015

*Also called the Bellman Optimality Equation
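For reference, the Bellman Optimality Equation for the state-action value function in its standard textbook form, with transition probabilities $\mathcal{P}^{a}_{ss'}$, expected rewards $\mathcal{R}^{a}_{s}$, and discount factor $\gamma$. The slide may present it with slightly different notation:

```latex
\begin{align}
Q^{*}(s, a) &= \mathcal{R}^{a}_{s}
               + \gamma \sum_{s'} \mathcal{P}^{a}_{ss'} \max_{a'} Q^{*}(s', a') \\
V^{*}(s)    &= \max_{a} Q^{*}(s, a)
\end{align}
```

Compared with the expectation version, the average over the policy's next action is replaced by a maximum, so the equation no longer depends on any particular policy.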


Markov Decision Process: Optimal Policy

Reference: David Silver, 2015

Questions?


Finding the Best Policy

• Need to be able to do two things ideally

• Prediction:

• For a given policy, evaluate how good it is

• Compute $Q^{\pi}(s, a)$

• Control:

• And make an improvement from $\pi$

• We will focus on the Q Learning algorithm

• It does prediction and control ‘simultaneously’

Reference: David Silver, 2015

Intuition for an Iterative Algorithm

Reference: David Silver, 2015

The Q Learning Algorithm

• If we know the model

• Turn the Bellman Optimality Equation into an iterative update

• This is called Value Iteration

Reference: David Silver, 2015
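A minimal sketch of the known-model case, assuming a hypothetical two-state, two-action MDP (the tables `P` and `R`, the discount factor, and the function name `q_value_iteration` are illustrative choices, not from the lecture). Each sweep replaces every entry of Q with the right-hand side of the Bellman Optimality Equation:

```python
# Hypothetical known model: P[s][a] maps next states to probabilities,
# R[s][a] is the expected immediate reward.
P = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}, "go": {"A": 1.0}},
}
R = {"A": {"stay": 0.0, "go": 1.0}, "B": {"stay": 2.0, "go": 0.0}}
gamma = 0.5

def q_value_iteration(P, R, gamma, sweeps=100):
    """Iterate Q(s,a) = R[s][a] + gamma * sum_s2 P[s][a][s2] * max_a2 Q(s2,a2)."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(sweeps):
        # The whole new table is built from the previous sweep's table.
        Q = {
            s: {
                a: R[s][a]
                + gamma * sum(p * max(Q[s2].values()) for s2, p in P[s][a].items())
                for a in P[s]
            }
            for s in P
        }
    return Q

Q = q_value_iteration(P, R, gamma)
policy = {s: max(Q[s], key=Q[s].get) for s in Q}  # act greedily with respect to Q
print(Q)
print(policy)
```

For this toy model the fixed point can be verified by hand: staying in B forever is worth 2 / (1 - 0.5) = 4, so going from A to B is worth 1 + 0.5 * 4 = 3.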


The Q Learning Algorithm

• If we do not know the model

• Do sampling to get an incremental iterative update

• Choose next actions to ensure exploration

Reference: David Silver, 2015

The Q Learning Algorithm

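• Initialize $Q$, which is a table of size #states × #actions

• Start at state $S_1$

• For $t = 1, 2, 3, \ldots$

• Explore: take $A_t$ chosen uniformly at random with probability $\epsilon$

• Exploit: take $A_t = \arg\max_{a \in \mathcal{A}} Q(S_t, a)$ with probability $1 - \epsilon$

• Update $Q$: $Q(S_t, A_t) = Q(S_t, A_t) + \alpha_t \left( R_{t+1} + \gamma \max_{a \in \mathcal{A}} Q(S_{t+1}, a) - Q(S_t, A_t) \right)$

• The bracketed term $R_{t+1} + \gamma \max_{a \in \mathcal{A}} Q(S_{t+1}, a) - Q(S_t, A_t)$ is the temporal difference error

• Parameter $\epsilon$ is the exploration parameter

• Parameter $\alpha_t$ is the learning rate

• Under appropriate assumptions (Watkins and Dayan, 1992), $\lim_{t \to \infty} Q = Q^{*}$

Reference: Christopher J. C. H. Watkins and Peter Dayan, 1992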
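The steps on this slide can be sketched as a short tabular implementation. Everything concrete here is an assumption made for illustration: the two-state environment (`NEXT`, `REWARD`), the hyperparameter values, and the use of a constant learning rate in place of a step-dependent one.

```python
import random

# Hypothetical deterministic environment. The agent never reads these tables
# as a model; it only receives the sampled reward and next state.
NEXT = {"A": {"stay": "A", "go": "B"}, "B": {"stay": "B", "go": "A"}}
REWARD = {"A": {"stay": 0.0, "go": 1.0}, "B": {"stay": 2.0, "go": 0.0}}
ACTIONS = ["stay", "go"]

def q_learning(steps=20000, epsilon=0.1, alpha=0.1, gamma=0.5, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in ACTIONS} for s in NEXT}  # table: #states x #actions
    state = "A"                                        # start state
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)               # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])  # exploit
        reward, next_state = REWARD[state][action], NEXT[state][action]
        # Temporal difference error: sampled target minus current estimate.
        td_error = reward + gamma * max(Q[next_state].values()) - Q[state][action]
        Q[state][action] += alpha * td_error
        state = next_state
    return Q

Q = q_learning()
greedy = {s: max(Q[s], key=Q[s].get) for s in Q}
print(greedy)
```

A constant `alpha` keeps the sketch short; the convergence result cited on the slide relies on conditions on the learning rate sequence and on every state-action pair being visited repeatedly, which the exploration step is there to ensure.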


The Q Learning Algorithm: Recap

• Bellman Optimality Equation gives rise to the Q-Value Iteration algorithm

• Making this algorithm incremental and sampled, and adding $\epsilon$-greedy exploration, gives the Q Learning Algorithm

Reference: David Silver, 2015

Questions?


Summary

• RL is a great framework to make agents intelligent

• Specify goals and provide feedback

• Many challenges still remain: an exciting opportunity to contribute towards the next generation of artificially intelligent and autonomous agents

• In the next lecture, we will see that deep learning function approximation based RL agents show promise in large complex tasks: representations matter!

• Applications such as

• Self-driving cars

• Intelligent virtual agents

Appendix


Sample Exam Questions

• What is the difference between a Markov Chain and a Markov Reward Process?

• What is the difference between a Markov Chain and a Markov Decision Process?

• Why is exploration needed in the reinforcement learning setting?

• What does the optimal state-action value function signify?

• What are the two objects (distributions) of an RL model?

• What is the difference between supervised learning and reinforcement learning?


Additional Resources

• Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto

• http://incompleteideas.net/sutton/book/the-book.html

• Course on Reinforcement Learning by David Silver at UCL (includes video lectures)

• http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html

• Research Papers

• Deep RL collection: https://github.com/junhyukoh/deep-reinforcement-learning-papers

• [MKSRVBGRFOPBSAKKWLH2015] Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529–533, 2015.

• [SHMGSDSAPLDGNKSLLKGH2016] Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484–489, 2016.

Cons of RL

• Reinforcement Learning requires experiencing the environment many, many times

• This is because it is a trial-and-error based approach

• Impractical for many complex tasks

• Unless one has access to simulators where an RL agent can practice a billion times

RL versus other Machine Learning Settings


• There is a notion of exploration and exploitation, similar to Multi-armed bandits and Contextual bandits

• Key difference: actions influence future contexts

Reference: David Silver, 2015

RL versus other Sequential Decision Making Settings

Reference: David Silver, 2015

Types of RL Agents

• There are many ways to design them, so we roughly categorize them as below:

Reference: David Silver, 2015

Relating the Two Value Functions I

Reference: David Silver, 2015

Relating the Two Value Functions II

Reference: David Silver, 2015

Recursion in MDP: Value Function Version

Reference: David Silver, 2015

Relating Policy and Value Function

Reference: David Silver, 2015