Lecture 5: Windy Frozen Lake Nondeterministic world! Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>
hunkim.github.io/ml/RL/rl05.pdf

Aug 20, 2018

Page 1

Lecture 5: Windy Frozen Lake

Nondeterministic world!

Reinforcement Learning with TensorFlow&OpenAI Gym

Sung Kim <[email protected]>

Page 2

Windy Frozen Lake

[Figure: Windy Frozen Lake grid, with a cell labeled S]

Page 3

Deterministic VS Stochastic (nondeterministic)

• In deterministic models, the output of the model is fully determined by the parameter values and the initial conditions.

• Stochastic models possess some inherent randomness.

- The same set of parameter values and initial conditions will lead to an ensemble of different outputs.
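To make the distinction concrete, here is a minimal sketch in Python. The 1-D corridor world, the function names `step_deterministic` and `step_stochastic`, and the slip probability of 0.3 are illustrative assumptions, not something taken from the lecture.

```python
import random

N_STATES = 5  # hypothetical 1-D corridor: states 0..4


def step_deterministic(state, action):
    """Move exactly as requested: action is -1 (left) or +1 (right)."""
    return min(max(state + action, 0), N_STATES - 1)


def step_stochastic(state, action, rng, slip_prob=0.3):
    """With probability slip_prob (assumed), the move goes the opposite way."""
    if rng.random() < slip_prob:
        action = -action
    return min(max(state + action, 0), N_STATES - 1)


rng = random.Random(0)
# Same state and same action, repeated many times.
deterministic_outcomes = {step_deterministic(2, +1) for _ in range(1000)}
stochastic_outcomes = {step_stochastic(2, +1, rng) for _ in range(1000)}
print(deterministic_outcomes)  # a single next state
print(stochastic_outcomes)     # an ensemble of next states
```

The deterministic function always lands in the same next state, while the stochastic one spreads over more than one next state for identical inputs.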

Page 4

Deterministic

Page 5

Stochastic (non-deterministic)

Page 6

Stochastic (non-deterministic) worlds

• Unfortunately, our Q-learning (for deterministic worlds) does not work anymore

• Why not?

Page 7

Our previous Q-learning does not work

Score over time: 0.0165

Page 8

Why does it not work in stochastic (non-deterministic) worlds?

[Figure: diagram labeled with action a and state s]

Page 9

Stochastic (non-deterministic) world

• Solution?

- Listen to Q(s') (just a little bit)

- Update Q(s) a little bit (learning rate)

• Like our life mentors

- Don’t just listen to and follow one mentor

- Need to listen to many mentors
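The mentor analogy can be sketched numerically. Assume, purely for illustration, that each piece of advice is a noisy sample (1 with probability 0.8, otherwise 0). `overwrite` follows the latest mentor completely, while `blend` moves only a little toward each one. The sample distribution, the step counts, and the variable names are all assumptions.

```python
import random

rng = random.Random(0)
alpha = 0.1  # how much we listen to each new piece of advice

overwrite = 0.0  # follow the latest advice completely
blend = 0.0      # listen just a little bit each time
overwrite_history, blend_history = [], []

for _ in range(2000):
    advice = 1.0 if rng.random() < 0.8 else 0.0  # noisy sample, mean 0.8
    overwrite = advice
    blend = (1 - alpha) * blend + alpha * advice
    overwrite_history.append(overwrite)
    blend_history.append(blend)

# Skip the warm-up period, then compare how much each estimate jumps around.
tail_overwrite = overwrite_history[500:]
tail_blend = blend_history[500:]
print("overwrite range:", min(tail_overwrite), max(tail_overwrite))
print("blend range:", min(tail_blend), max(tail_blend))
```

The overwritten estimate keeps flipping between the extremes, whereas the blended estimate stays in a narrower band, which is the motivation for updating Q only a little at a time.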

Page 10

http://m.kauppalehti.fi/uutiset/your-career-needs-many-mentors--not-just-one/gp3Q4rTp

Page 11

Stochastic (non-deterministic) world

[Figure: diagram labeled with action a and state s]

Page 12

Learning incrementally

• Learning rate, $\alpha$

- $\alpha = 0.1$

$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$

$Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a')]$
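A single incremental update can be worked through in a few lines. This is a sketch only: the function name `q_update`, the discount value 0.9, and the example numbers are assumptions; the learning rate 0.1 is the value from the slide.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.1, gamma=0.9):
    """Blend the old estimate with the new one-step target.

    gamma=0.9 is an assumed discount; alpha=0.1 follows the slide.
    """
    target = reward + gamma * max_q_next
    return (1 - alpha) * q_sa + alpha * target


old_q = 0.5
new_q = q_update(old_q, reward=1.0, max_q_next=0.0)
print(new_q)  # approximately 0.55: 10% of the way from 0.5 toward the target 1.0
```

With a small learning rate, one unusual transition shifts the estimate only slightly instead of replacing it.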

Page 13

Learning with learning rate

$Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a')]$

$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$

Page 14

Learning with learning rate

$Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a')]$

$Q(s, a) \leftarrow Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$

$Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$
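The two learning-rate forms on this slide are the same update. Expanding the form that contains the difference term shows this, using only the symbols already defined:

```latex
\begin{aligned}
Q(s, a) + \alpha \, [\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,]
  &= Q(s, a) - \alpha \, Q(s, a) + \alpha \, [\, r + \gamma \max_{a'} Q(s', a') \,] \\
  &= (1 - \alpha) \, Q(s, a) + \alpha \, [\, r + \gamma \max_{a'} Q(s', a') \,]
\end{aligned}
```

Setting $\alpha = 1$ in either form gives back the update without a learning rate, $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$.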

Page 15

Q-learning algorithm

$Q(s, a) \leftarrow (1 - \alpha) Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a')]$
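The update rule can be placed inside a complete training loop. The following is a minimal sketch, not the lecture's lab code: the corridor environment, the slip probability, the epsilon-greedy exploration, the episode counts, and all names (`step`, `train`, `q_table`) are assumptions chosen so the example runs with only the Python standard library.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: start at 0, goal at 4
ACTIONS = (-1, +1)      # index 0 = left, index 1 = right
SLIP_PROB = 0.2         # assumed chance that the "wind" reverses the move


def step(state, action_index, rng):
    """Return (next_state, reward, done) for one noisy move."""
    move = ACTIONS[action_index]
    if rng.random() < SLIP_PROB:
        move = -move
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy action choice (assumed), breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, rng)
            # Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]
            target = reward + gamma * max(q[next_state])
            q[state][action] = (1 - alpha) * q[state][action] + alpha * target
            state = next_state
            if done:
                break
    return q


q_table = train()
for s, row in enumerate(q_table):
    print(s, [round(v, 3) for v in row])
```

The goal state's row stays at zero because an episode ends on arrival, so no update is ever made from that state.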

Page 16

Convergence

Machine Learning, Tom Mitchell, 1997

$\hat{Q}(s, a) \leftarrow (1 - \alpha) \hat{Q}(s, a) + \alpha [r + \gamma \max_{a'} \hat{Q}(s', a')]$

Page 17

Next

Lab: Stochastic worlds