Lecture 7: DQN
Reinforcement Learning with TensorFlow & OpenAI Gym
Post on 13-Mar-2018
Q-function Approximation: Q-Nets
(1) Input: state s
(2) Output: quality (expected reward) for every action, e.g. [0.5, 0.1, 0.0, 0.8] means LEFT: 0.5, RIGHT: 0.1, UP: 0.0, DOWN: 0.8
[Diagram: network mapping the state input to one Q-value per action.]
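As a minimal sketch of the idea above (weights, shapes, and the hidden-layer size are illustrative, not from the lecture), a Q-net takes a state vector and returns one Q-value per action; the greedy action is the one with the largest Q-value:

```python
import numpy as np

# Action layout follows the slide's example output [0.5, 0.1, 0.0, 0.8].
ACTIONS = ["LEFT", "RIGHT", "UP", "DOWN"]

def q_net(state, w1, b1, w2, b2):
    """Return estimated Q-values for every action given one state."""
    hidden = np.maximum(0.0, state @ w1 + b1)  # ReLU hidden layer
    return hidden @ w2 + b2                    # one output per action

rng = np.random.default_rng(0)
state = rng.normal(size=4)                      # toy 4-dim state
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # illustrative shapes
w2, b2 = rng.normal(size=(8, 4)), np.zeros(4)

q = q_net(state, w1, b1, w2, b2)
greedy = ACTIONS[int(np.argmax(q))]             # greedy action = largest Q
print(q.shape, greedy)
```

One forward pass per state yields all four Q-values at once, which is why the network output has one unit per action rather than taking the action as an input.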
Convergence
Tutorial: Deep Reinforcement Learning, David Silver, Google DeepMind
$$\min_\theta \sum_{t=0}^{T} \left[ \hat{Q}(s_t, a_t \mid \theta) - \left( r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta) \right) \right]^2$$
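As a worked numeric example of this objective (toy numbers of my choosing, not from the lecture): for one time step, the loss is the squared gap between the current estimate Q̂(s_t, a_t|θ) and the bootstrapped target r_t + γ·max_a' Q̂(s_{t+1}, a'|θ).

```python
import numpy as np

gamma = 0.9
q_sa = 0.5                                # Q-hat(s_t, a_t | theta) for the chosen action
q_next = np.array([0.5, 0.1, 0.0, 0.8])  # Q-hat(s_{t+1}, . | theta) over all actions
r = 1.0                                   # reward received at step t

target = r + gamma * q_next.max()         # 1.0 + 0.9 * 0.8 = 1.72
td_error = q_sa - target                  # 0.5 - 1.72 = -1.22
loss = td_error ** 2                      # squared TD error for this step
print(round(loss, 4))                     # -> 1.4884
```

The full objective sums this per-step squared error over t = 0..T before minimizing with respect to θ.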
Reinforcement + Neural Net
http://stackoverflow.com/questions/10722064/training-a-neural-network-with-reinforcement-learning
[Nature cover, 26 February 2015, Vol. 518, No. 7540: "Self-taught AI software attains human-level performance in video games", pages 486 & 529.]
1. Correlations between samples
Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
2. Non-stationary targets
$$\min_\theta \sum_{t=0}^{T} \left[ \hat{Q}(s_t, a_t \mid \theta) - \left( r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta) \right) \right]^2$$

Prediction: $Y_{\text{pred}} = \hat{Q}(s_t, a_t \mid \theta)$   Target: $Y = r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta)$

Both the prediction and the target depend on the same parameters θ, so the target moves whenever θ is updated.
DQN’s three solutions
1. Go deep
2. Capture and replay → fixes correlations between samples
3. Separate networks: create a target network → fixes non-stationary targets
Human-level control through deep reinforcement learning, Nature http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
Solution 1: go deep
Solution 2: experience replay

Deep Q-Networks (DQN): Experience Replay
To remove correlations, build a data-set from the agent's own experience:
(s_1, a_1, r_2, s_2)
(s_2, a_2, r_3, s_3)  →  (s, a, r, s')
(s_3, a_3, r_4, s_4)
...
(s_t, a_t, r_{t+1}, s_{t+1})  →  (s_t, a_t, r_{t+1}, s_{t+1})
Sample experiences from the data-set and apply the update
$$l = \left( r + \gamma \max_{a'} Q(s', a', w^-) - Q(s, a, w) \right)^2$$
To deal with non-stationarity, the target parameters $w^-$ are held fixed.
Capture a random sample & replay:
$$\min_\theta \sum_{t=0}^{T} \left[ \hat{Q}(s_t, a_t \mid \theta) - \left( r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta) \right) \right]^2$$
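The "capture and replay" step above can be sketched as a bounded buffer of (s, a, r, s') tuples that is sampled uniformly at random (class and method names here are mine, not from the lecture); random minibatches break the correlation between consecutive samples:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=50_000):
        # Bounded deque: oldest experiences fall off when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random sample without replacement decorrelates the batch.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(200):                 # overfill to exercise the capacity cap
    buf.add(t, t % 4, 1.0, t + 1)
batch = buf.sample(8)
print(len(buf), len(batch))          # buffer keeps only the newest 100
```

Training then draws a fresh random minibatch from the buffer at each gradient step instead of learning from the most recent transition alone.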
ICML 2016 Tutorial: Deep Reinforcement Learning, David Silver, Google DeepMind
Solution 2: experience replay
Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
Problem 2: correlations between samples

Deep Q-Networks (DQN): Experience Replay
Problem 3: non-stationary targets
$$\min_\theta \sum_{t=0}^{T} \left[ \hat{Q}(s_t, a_t \mid \theta) - \left( r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta) \right) \right]^2$$
Prediction: $Y_{\text{pred}} = \hat{Q}(s_t, a_t \mid \theta)$   Target: $Y = r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta)$
The target is computed with the very parameters θ being optimized, so it shifts after every update.
Solution 3: separate target network

Before (target uses the online parameters θ):
$$\min_\theta \sum_{t=0}^{T} \left[ \hat{Q}(s_t, a_t \mid \theta) - \left( r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \theta) \right) \right]^2$$

After (target uses a frozen copy θ̄):
$$\min_\theta \sum_{t=0}^{T} \left[ \hat{Q}(s_t, a_t \mid \theta) - \left( r_t + \gamma \max_{a'} \hat{Q}(s_{t+1}, a' \mid \bar{\theta}) \right) \right]^2$$
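The mechanics of the separate target network can be sketched as follows (parameter names and the sync interval are illustrative, not from the lecture): the online parameters θ change every step, while the target copy θ̄ stays frozen and is only synced to θ every few steps.

```python
import numpy as np

theta = {"w": np.zeros(3)}                            # online network theta
theta_bar = {k: v.copy() for k, v in theta.items()}   # target network theta-bar
sync_every = 10                                       # illustrative interval

for step in range(1, 26):
    theta["w"] += 0.1                                 # stand-in for a gradient step
    if step % sync_every == 0:
        # theta-bar <- theta : targets stay fixed between syncs
        theta_bar = {k: v.copy() for k, v in theta.items()}

# After 25 steps: theta has moved 25 times, but theta_bar reflects
# only the last sync (step 20), so targets lag the online network.
print(theta["w"][0], theta_bar["w"][0])
```

Because θ̄ is held fixed between syncs, the regression target stops chasing every gradient step, which stabilizes training.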
[Diagram: (1) state s feeds the prediction network with weights W; (2) a separate target network produces Y (target).]
Understanding the Nature paper (2015)
Human-level control through deep reinforcement learning, Nature http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html