Neural Network with Memory
Hung-yi Lee

Transcript
Page 1:

Neural Network with Memory

Hung-yi Lee

Page 2:

Memory is important

[Figure: a feedforward network with 2-dimensional inputs x1, x2, x3 and 1-dimensional outputs y1, y2, y3. The same input vector occurs at different positions in the example sequences, but the desired outputs there differ (4 in one case, 7 in the other). A memoryless network must map identical inputs to identical outputs, so it cannot solve this.]

Network needs memory to achieve this

Page 3:

Memory is important

[Figure: the same task solved by a network with a memory cell c1; all weights are 1 and the memory is initialized to 0. Time step 1: the memory still holds 0 when the first input arrives.]

Page 4:

Memory is important

[Figure: the same network at time step 2. The memory now holds the value stored at step 1, so an identical input produces a different output.]

Page 5:

Memory is important

[Figure: the same network at time step 3, after the memory has been updated again.]

Page 6:

Outline

Vanilla Recurrent Neural Network (RNN)

Variants of RNN

Long Short-term Memory (LSTM)


Page 8:

Application

• (Simplified) Speech Recognition

Utterance 1: frames x1 x2 x3 x4 …… → outputs y1 y2 y3 y4 …… (labels: TSI TSI TSI I I N N N)
Utterance 2: frames x1 x2 x3 x4 …… → outputs y1 y2 y3 y4 …… (labels: S S @ @ @ @)

With a DNN, all the frames are considered independently.

Page 9:

RNN

RNN input: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑁 (the input of the RNN is one utterance; the order cannot change)

a1 = σ(Wi x1 + Wh a0), where the memory is initialized to a0 = 0
y1 = softmax(Wo a1)
a1 is then copied into the memory.

Page 10:

RNN

a2 = σ(Wi x2 + Wh a1)
y2 = softmax(Wo a2)
a2 is copied into the memory.

Page 11:

RNN

a3 = σ(Wi x3 + Wh a2)
y3 = softmax(Wo a3)
a3 is copied into the memory.

Page 12:

RNN

Output yi depends on x1, x2, ……, xi.

The same network (the same weights Wi, Wh, Wo) is used again and again.

[Figure: the RNN unrolled over the first three time steps, with the memory initialized to 0 and each at copied forward to the next step.]

Page 13:

RNN

[Figure: the same unrolled network, continuing over the whole utterance x1 x2 x3 …… xN.]
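The unrolled computation maps directly onto a short loop. Below is a minimal NumPy sketch of this forward pass (the dimensions and random weights are placeholders of mine, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def rnn_forward(xs, Wi, Wh, Wo):
    """Vanilla RNN over one utterance: at = sigmoid(Wi xt + Wh at-1), yt = softmax(Wo at)."""
    a = np.zeros(Wh.shape[0])        # memory initialized to 0
    ys = []
    for x in xs:                     # the order cannot change
        a = sigmoid(Wi @ x + Wh @ a) # new activation, copied into the memory
        ys.append(softmax(Wo @ a))
    return ys

# Placeholder dimensions: input 4, hidden 8, output 3
rng = np.random.default_rng(0)
Wi, Wh, Wo = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(3, 8))
ys = rnn_forward([rng.normal(size=4) for _ in range(5)], Wi, Wh, Wo)
```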

Page 14:

Cost

RNN input: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑁
RNN output: 𝑦1 𝑦2 𝑦3 …… 𝑦𝑁
Reference (target) output: 𝑦̂1 𝑦̂2 𝑦̂3 …… 𝑦̂𝑁

Squared error: 𝐶 = (1/2) Σ𝑛=1..𝑁 ‖𝑦̂𝑛 − 𝑦𝑛‖²

Cross entropy: 𝐶 = Σ𝑛=1..𝑁 −log 𝑦𝑛(𝑟𝑛), where 𝑟𝑛 is the index of the reference label at step n.
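A direct transcription of both costs, assuming ys are the softmax outputs, yhats the reference vectors, and rs the reference label indices:

```python
import numpy as np

def squared_error_cost(ys, yhats):
    # C = 1/2 * sum_n || yhat_n - y_n ||^2
    return 0.5 * sum(np.sum((yhat - y) ** 2) for y, yhat in zip(ys, yhats))

def cross_entropy_cost(ys, rs):
    # C = sum_n -log y_n[r_n]   (r_n = reference label index at step n)
    return sum(-np.log(y[r]) for y, r in zip(ys, rs))
```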

Page 15:

Training

RNN training is very difficult in practice.

Backpropagation through time (BPTT)

[Figure: the unrolled network with targets 𝑦̂1 𝑦̂2 𝑦̂3 for the outputs y1 y2 y3.]

𝑤 is an element of Wh, Wi, or Wo: 𝑤 ← 𝑤 − 𝜂 𝜕𝐶 ∕ 𝜕𝑤
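BPTT is ordinary backpropagation applied to the unrolled network. A minimal sketch for the RNN above with the cross-entropy cost, accumulating gradients over all time steps before one update 𝑤 ← 𝑤 − 𝜂 𝜕𝐶 ∕ 𝜕𝑤 (the code structure is mine, not from the slides):

```python
import numpy as np

def bptt_step(xs, rs, Wi, Wh, Wo, eta=0.1):
    """One gradient-descent step via backpropagation through time (cross-entropy cost)."""
    # Forward pass, caching every activation.
    As, Ys = [np.zeros(Wh.shape[0])], []          # As[0] is the initial memory
    for x in xs:
        a = 1.0 / (1.0 + np.exp(-(Wi @ x + Wh @ As[-1])))
        o = Wo @ a
        e = np.exp(o - o.max())
        As.append(a); Ys.append(e / e.sum())
    # Backward pass through time.
    dWi, dWh, dWo = np.zeros_like(Wi), np.zeros_like(Wh), np.zeros_like(Wo)
    dz_next = np.zeros(Wh.shape[0])               # gradient arriving through the memory
    for t in reversed(range(len(xs))):
        do = Ys[t].copy(); do[rs[t]] -= 1.0       # softmax + cross-entropy gradient
        dWo += np.outer(do, As[t + 1])
        da = Wo.T @ do + Wh.T @ dz_next           # from the output and from step t + 1
        dz = da * As[t + 1] * (1.0 - As[t + 1])   # back through the sigmoid
        dWi += np.outer(dz, xs[t]); dWh += np.outer(dz, As[t])
        dz_next = dz
    # Update every element w <- w - eta * dC/dw.
    Wi -= eta * dWi; Wh -= eta * dWh; Wo -= eta * dWo
```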

Page 16:

More Applications

• Input and output are vector sequences with the same length

POS Tagging

John saw the saw.
 PN   V   D   N
(x1 x2 x3 x4 → y1 y2 y3 y4)

Page 17:

More Applications

• Named entity recognition
• Identifying names of people, places, organizations, etc. from a sentence
• Harry Potter is a student of Hogwarts and lived on Privet Drive.
• Each word is tagged as a person, an organization, a place, or not a named entity.

• Information extraction
• Extract pieces of information relevant to a specific application, e.g. flight booking
• I would like to leave Boston on November 2nd and arrive in Taipei before 2 p.m.
• Each word is tagged as place of departure, destination, time of departure, time of arrival, or other.

Page 18:

Outline

Vanilla Recurrent Neural Network (RNN)

Variants of RNN

Long Short-term Memory (LSTM)

Page 19:

Elman Network & Jordan Network

[Figure: in the Elman network, the hidden-layer activations are fed back (through Wh) into the hidden layer at the next time step; in the Jordan network, the network's output yt is fed back instead.]
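In code the difference is one line: what gets stored and fed back. A minimal sketch under the earlier notation (function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_step(x, a_prev, Wi, Wh, Wo):
    """Elman network: the hidden activations are fed back."""
    a = sigmoid(Wi @ x + Wh @ a_prev)
    return Wo @ a, a                  # output, and the value kept in memory

def jordan_step(x, y_prev, Wi, Wh, Wo):
    """Jordan network: the previous output is fed back instead."""
    a = sigmoid(Wi @ x + Wh @ y_prev)
    y = Wo @ a
    return y, y                       # the output itself is what gets remembered
```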

Page 20:

Deep RNN

[Figure: an RNN with several stacked hidden layers; at each time step, every layer passes its activations both upward to the next layer and forward to itself at the next time step.]

Page 21:

Bidirectional RNN

[Figure: two RNNs read the inputs xt, xt+1, xt+2 in opposite directions; each output yt is produced from the hidden states of both the forward and the backward network, so it depends on the whole input sequence.]
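A bidirectional RNN is just two vanilla RNNs run in opposite directions. A sketch; combining the two states by concatenation before the output weights is an assumption (the slide only shows both states feeding yt):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def birnn_forward(xs, Wi_f, Wh_f, Wi_b, Wh_b, Wo):
    """Two RNNs read the sequence in opposite directions; each yt uses both."""
    h, fs = np.zeros(Wh_f.shape[0]), []
    for x in xs:                          # forward states cover x1 ... xt
        h = sigmoid(Wi_f @ x + Wh_f @ h)
        fs.append(h)
    h, bs = np.zeros(Wh_b.shape[0]), []
    for x in reversed(xs):                # backward states cover xN ... xt
        h = sigmoid(Wi_b @ x + Wh_b @ h)
        bs.append(h)
    bs.reverse()
    # Each output now depends on the entire input sequence.
    return [Wo @ np.concatenate([f, b]) for f, b in zip(fs, bs)]
```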

Page 22:

Many to one

• Input is a vector sequence, but output is only one vector

Sentiment Analysis

[Figure: the characters of a review, e.g. 我 覺 得 太 糟 了 ("I think it is terrible"), are fed into the RNN one at a time, and the final state predicts one of the five rating classes used on movie boards: 超好雷 / 好雷 / 普雷 / 負雷 / 超負雷 (from very positive to very negative).]

看了這部電影覺得很高興…… ("Watched this movie and felt very happy") → Positive (正雷)
這部電影太糟了…… ("This movie is terrible") → Negative (負雷)
這部電影很棒…… ("This movie is great") → Positive (正雷)
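For many-to-one tasks the RNN reads the whole sequence and only the final state is classified. A sketch (the 5-class head standing in for the rating labels is my illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify_sequence(xs, Wi, Wh, Wc):
    """Read the whole sequence, then classify from the last hidden state."""
    a = np.zeros(Wh.shape[0])
    for x in xs:                      # e.g. one character vector at a time
        a = sigmoid(Wi @ x + Wh @ a)
    logits = Wc @ a                   # one logit per rating class
    e = np.exp(logits - logits.max())
    return e / e.sum()                # distribution over the rating classes
```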

Page 23:

Many to Many (Output is shorter)

• Both input and output are vector sequences, but the output is shorter.

Speech Recognition

[Figure: acoustic frames x1 x2 x3 x4 …… are each classified as a character, e.g. 好 好 好 棒 棒 棒 棒 棒; trimming the repeated characters yields "好棒" ("great").]

Problem: with trimming alone, you can never recognize "好棒棒" (which genuinely repeats the character 棒)!

Page 24:

Many to Many (Output is shorter)

• Both input and output are vector sequences, but the output is shorter.

• Connectionist Temporal Classification (CTC)

• Add an extra symbol "φ" representing "null"

好 φ φ 棒 φ φ φ φ → "好棒"
好 φ φ 棒 φ 棒 φ φ → "好棒棒"
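Decoding a CTC frame sequence is then: merge consecutive repeats, drop φ. A sketch of that collapse rule:

```python
def ctc_collapse(frames, blank="φ"):
    """Merge consecutive duplicates, then remove the blank symbol."""
    out, prev = [], None
    for s in frames:
        if s != prev:              # a repeat only counts once...
            if s != blank:         # ...and blanks are dropped entirely
                out.append(s)
        prev = s
    return "".join(out)

assert ctc_collapse(list("好φφ棒φφφφ")) == "好棒"
assert ctc_collapse(list("好φφ棒φ棒φφ")) == "好棒棒"   # the φ between 棒s keeps both
```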

Page 25:

Many to Many (No Limitation)

• Both input and output are vector sequences with different lengths. → Sequence-to-sequence learning

Machine Translation

[Figure: the RNN reads "machine learning" and then generates the translation 機 器 學 習 ("machine learning") character by character; nothing tells it when to stop, so the generation is never-ending (it keeps producing characters, e.g. 慣 性, "inertia").]

Page 26:

Many to Many (No Limitation)

• Comment chaining (推文接龍) on PTT: each comment continues from where the previous one left off, so the thread can go on forever.
• Ref: http://pttpedia.pixnet.net/blog/post/168133002-%E6%8E%A5%E9%BE%8D%E6%8E%A8%E6%96%87

推xxx: ptt萬歲
推dd: 歲平安
噓dddf: 全
推zzzzzzzzzzz: 家就是你家
……
推tlkagk: =========斷========== ("break", ending the chain)

Page 27:

Many to Many (No Limitation)

• Both input and output are vector sequences with different lengths. → Sequence-to-sequence learning

• Add a symbol "===" (斷, "break") that ends the generation.

[Figure: the RNN reads "machine learning" and generates 機 器 學 習 followed by "===", at which point it stops.]
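Generation then runs until the network emits the break symbol. A sketch of that loop (decoder_step is a hypothetical one-step decoder; only the "===" stop condition comes from the slide):

```python
def generate(decoder_step, state, start_token, stop_token="===", max_len=100):
    """Emit tokens one at a time until the network produces the stop symbol."""
    token, output = start_token, []
    for _ in range(max_len):                 # safety cap on length
        token, state = decoder_step(token, state)
        if token == stop_token:              # "===" (break): stop generating
            break
        output.append(token)
    return output
```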

Page 28:

One to Many

• Input is one vector, but output is a vector sequence

Caption generation

[Figure: an image is the input; the RNN generates the caption "a woman is throwing a ……" word by word, ending with "===" (斷).]

Page 29:

Outline

Vanilla Recurrent Neural Network (RNN)

Variants of RNN

Long Short-term Memory (LSTM)

Page 30:

Long Short-term Memory (LSTM)

[Figure: an LSTM cell. A memory cell sits in the middle. An input gate controls whether the input from the other part of the network is written into the cell; a forget gate controls whether the stored value is kept or erased; an output gate controls whether the stored value is passed on to the other part of the network. The signals that control all three gates also come from the other part of the network.]

An LSTM is a special neuron: 4 inputs, 1 output.

Page 31:

The four inputs are 𝑧 (the cell input) and the gate signals 𝑧𝑖, 𝑧𝑓, 𝑧𝑜.

The gate activation function f is usually a sigmoid, so its value lies between 0 and 1, mimicking an open or closed gate.

New memory value: 𝑐′ = 𝑔(𝑧) 𝑓(𝑧𝑖) + 𝑐 𝑓(𝑧𝑓)

Output: 𝑎 = ℎ(𝑐′) 𝑓(𝑧𝑜)
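These two formulas are the entire cell update. A NumPy sketch of one step (g and h are left as parameters, since the slides keep them generic):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # between 0 and 1: open/closed gate

def lstm_step(z, z_i, z_f, z_o, c, g=np.tanh, h=np.tanh):
    """One LSTM update: c' = g(z) f(z_i) + c f(z_f);  a = h(c') f(z_o)."""
    c_new = g(z) * sigmoid(z_i) + c * sigmoid(z_f)   # gated write + gated keep
    a = h(c_new) * sigmoid(z_o)                      # read through the output gate
    return a, c_new
```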

Page 32:

Original Network: simply replace the neurons with LSTM.

[Figure: a feedforward network with inputs x1, x2, weighted sums z1, z2, and activations a1, a2.]

Page 33: Deep Learning Neural Network with Memoryspeech.ee.ntu.edu.tw › ~tlkagk › courses › MLDS_2015_2 › Lecture...More Applications •Input and output are vector sequences with the

x1 x2

+

+

+

+

+

+

+

+

Input

𝑎1 𝑎2

4 times of parameters

Page 34:

LSTM - Example

When x2 = 1, add the number in x1 to the memory.
When x2 = −1, reset the memory.
When x3 = 1, output the number in the memory.

x1:      1   3   2   4   2   1   3   6   1
x2:      0   1   0   1   0   0  −1   1   0
x3:      0   0   0   0   0   1   0   0   1
y:       0   0   0   0   0   7   0   0   6

Memory:  0   0   3   3   7   7   7   0   6  (content when each input arrives)
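The next slides wire one LSTM cell by hand to implement exactly these rules. A sketch that reproduces the table above; the gate weights and biases are my reconstruction of the values shown on the slides, with g and h linear:

```python
def sigmoid(z):
    from math import exp
    return 1.0 / (1.0 + exp(-z))

def lstm_memory_machine(sequence):
    """One LSTM cell wired so that x2 = 1 adds x1 to memory,
    x2 = -1 resets it, and x3 = 1 outputs it."""
    c, ys = 0.0, []
    for x1, x2, x3 in sequence:
        z   = x1                    # cell input: just x1
        z_i = 100 * x2 - 10         # input gate: open only when x2 = 1
        z_f = 100 * x2 + 10         # forget gate: closed (reset) only when x2 = -1
        z_o = 100 * x3 - 10         # output gate: open only when x3 = 1
        c = z * sigmoid(z_i) + c * sigmoid(z_f)   # g is linear here
        ys.append(c * sigmoid(z_o))               # h is linear here
    return ys

seq = [(1, 0, 0), (3, 1, 0), (2, 0, 0), (4, 1, 0), (2, 0, 0),
       (1, 0, 1), (3, -1, 0), (6, 1, 0), (1, 0, 1)]
print([round(y) for y in lstm_memory_machine(seq)])  # -> [0, 0, 0, 0, 0, 7, 0, 0, 6]
```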

Page 35:

[Figure: one LSTM cell wired by hand to implement the rules above. The cell input is z = x1 (weights (1, 0, 0), bias 0). The input gate gets zi = 100·x2 − 10, so it is open only when x2 = 1. The forget gate gets zf = 100·x2 + 10, so the memory is kept when x2 is 0 or 1 and erased when x2 = −1. The output gate gets zo = 100·x3 − 10, so it is open only when x3 = 1. g and h are linear.]

Page 36:

[Figure: input (3, 1, 0). The cell input is g(z) = 3; the input gate is open (≈1) and the forget gate is open (≈1), so the memory becomes 0 + 3 = 3; the output gate is closed (≈0), so y ≈ 0.]

Page 37:

[Figure: input (4, 1, 0). g(z) = 4 is written in; the memory becomes 3 + 4 = 7; the output gate is closed, so y ≈ 0.]

Page 38:

[Figure: input (2, 0, 0). The input gate is closed (≈0) and the forget gate is open (≈1), so the memory stays 7; the output gate is closed, so y ≈ 0.]

Page 39:

[Figure: input (1, 0, 1). The input gate is closed, so the memory stays 7; the output gate is open (≈1), so y ≈ 7.]

Page 40:

[Figure: input (3, −1, 0). Both the input gate and the forget gate are closed (≈0), so the memory is reset to 0; the output gate is closed, so y ≈ 0.]

Page 41:

What is the next wave?

• Attention-based Model

[Figure: a DNN maps input x to output y; a reading-head controller positions a reading head over an external memory, and a writing-head controller positions a writing head that updates it.]
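A common way to realize the reading head is soft, content-based addressing: score every memory slot against a key, softmax the scores, and read the weighted sum. A generic sketch of that idea (the slides show only the block diagram):

```python
import numpy as np

def attention_read(memory, key):
    """Soft reading head: weight each memory slot by similarity to the key."""
    scores = memory @ key                     # one score per memory slot
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # attention weights sum to 1
    return w @ memory                         # weighted sum = the value read

memory = np.random.default_rng(0).normal(size=(6, 4))   # 6 slots, width 4
r = attention_read(memory, memory[2])         # a key close to slot 2 reads ≈ slot 2
```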

Page 42:

Recommended Reading List

• The Unreasonable Effectiveness of Recurrent Neural Networks

• http://karpathy.github.io/2015/05/21/rnn-effectiveness/

• Understanding LSTM Networks

• http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 43:

Acknowledgement

• Thanks to classmate 葉軒銘 for spotting errors on the slides during class.