Long Short Term Memory Networks
Fenfei Guo and Jordan Boyd-Graber, University of Maryland (UMD)
LSTM EXAMPLE

Long Short Term Memory Networks - UMIACS: users.umiacs.umd.edu/~jbg/teaching/CMSC_470/08c_ex.pdf

May 21, 2020

Recap of LSTM

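Three gates: input ($i_t$), forget ($f_t$), output ($o_t$)

$$i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})$$

$$f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf})$$

$$o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho})$$

New memory input: $\tilde{c}_t$

$$\tilde{c}_t = \tanh(W_{ic} x_t + b_{ic} + W_{hc} h_{t-1} + b_{hc})$$

Memorize and forget ($\circ$ denotes element-wise multiplication):

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$$

$$h_t = o_t \circ \tanh(c_t)$$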
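
The recap equations translate almost line for line into code. Below is a minimal pure-Python sketch of one LSTM step (no NumPy, so it stays self-contained). The parameter dictionary and its key names such as "W_ii" or "b_hc" are our own convention that mirrors the symbols above; they are not an API from the slides or any library.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W, v):
    # W is a list of rows; returns the product W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def pre(p, gate, x, h):
    # W_{i?} x + b_{i?} + W_{h?} h + b_{h?} for gate ? in "i", "f", "o", "c"
    # (hypothetical key convention: p["W_ii"], p["b_hi"], ...)
    terms = [matvec(p["W_i" + gate], x), p["b_i" + gate],
             matvec(p["W_h" + gate], h), p["b_h" + gate]]
    return [sum(col) for col in zip(*terms)]

def lstm_step(p, x, h_prev, c_prev):
    i = [sigmoid(z) for z in pre(p, "i", x, h_prev)]          # input gate
    f = [sigmoid(z) for z in pre(p, "f", x, h_prev)]          # forget gate
    o = [sigmoid(z) for z in pre(p, "o", x, h_prev)]          # output gate
    c_tilde = [math.tanh(z) for z in pre(p, "c", x, h_prev)]  # new memory input
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

# Usage: with every weight and bias at zero, all gates sit at sigma(0) = 0.5
# and the new memory input is tanh(0) = 0, so the cell halves its old contents.
zero = {}
for gate in "ifoc":
    for src in "ih":
        zero[f"W_{src}{gate}"] = [[0.0, 0.0], [0.0, 0.0]]
        zero[f"b_{src}{gate}"] = [0.0, 0.0]
h, c = lstm_step(zero, x=[1.0, 0.0], h_prev=[0.0, 0.0], c_prev=[1.0, 1.0])
print(h, c)
```

The zero-parameter check is only a sanity test of the wiring; the example that follows plugs in specific, hand-chosen weights.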

Figuring out this LSTM

Symbol encoding:

A = [1.0, 0.0]
B = [0.0, 1.0]

• Input sequence: A, A, B, B, A, B, A

$x_1 = [1.0, 0.0]$, $x_2 = [1.0, 0.0]$, $x_3 = [0.0, 1.0]$, . . .

• Prediction output:

$y_t = \mathrm{softmax}(h_t)$ [number of hidden nodes = 2]
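
A minimal sketch of the encoding step. The helper names encode and decode are ours, and mapping output index k back to the k-th symbol (0 for A, 1 for B) is an assumption based on where the 1 sits in each code.

```python
# Encoding from the slide: A -> [1.0, 0.0], B -> [0.0, 1.0]
ONE_HOT = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
SYMBOLS = ["A", "B"]  # assumption: output index k corresponds to SYMBOLS[k]

def encode(sequence):
    # Turn a string such as "AABBABA" into a list of one-hot vectors
    return [ONE_HOT[s] for s in sequence]

def decode(index):
    # Map a predicted class index back to its symbol
    return SYMBOLS[index]

xs = encode("AABBABA")
for t, x in enumerate(xs[:3], start=1):
    print(f"x{t} = {x}")
```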

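Parameters that take $x_t$ as input

Input Gate

$$W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \qquad b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

Forget Gate

$$W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \qquad b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

Memory Cell

$$W_{ic} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \qquad b_{ic} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

Output Gate

$$W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \qquad b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}$$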
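
Read one symbol at a time, the input-side parameters say what each one-hot input contributes on its own, before any context from $h_{t-1}$ is added. A small sketch that evaluates $W x + b$ for A and B; the dictionary layout and the names PARAMS_X and x_side are our own.

```python
# Input-side parameters from the slide; each matrix is a list of rows
PARAMS_X = {
    "input":  ([[30.0, 0.0], [0.0, 0.0]],  [0.0, 0.0]),    # W_ii, b_ii
    "forget": ([[0.0, 0.0], [0.0, 0.0]],   [0.0, 0.0]),    # W_if, b_if
    "memory": ([[30.0, 0.0], [0.0, 30.0]], [0.0, 0.0]),    # W_ic, b_ic
    "output": ([[0.0, 0.0], [0.0, 0.0]],   [30.0, 30.0]),  # W_io, b_io
}

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def x_side(W, b, x):
    # W x + b: the part of a pre-activation that depends only on x_t
    return [wx + bk for wx, bk in zip(matvec(W, x), b)]

pre = {
    sym: {name: x_side(W, b, x) for name, (W, b) in PARAMS_X.items()}
    for sym, x in [("A", [1.0, 0.0]), ("B", [0.0, 1.0])]
}
for sym, table in pre.items():
    print(sym, table)
```

Only A pushes the first input-gate unit toward open, the memory-cell weights copy the current symbol into the candidate with a gain of 30, and the output gate gets +30 regardless of the symbol.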

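Parameters that take $h_{t-1}$ as input

Input Gate

$$W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \qquad b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}$$

Forget Gate

$$W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \qquad b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}$$

Memory Cell

$$W_{hc} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \qquad b_{hc} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

Output Gate

$$W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \qquad b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$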
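
The magnitudes 30 and 60 look chosen to drive $\sigma$ and $\tanh$ into saturation, so that each gate behaves almost like an on/off switch; this is our reading of the parameters, not something the slides state. A quick check of the activations at pre-activation values that occur in this example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Pre-activation values that appear in the worked example
for z in [-30.0, -22.85, 0.0, 15.70, 30.0]:
    print(f"sigmoid({z:6.2f}) = {sigmoid(z):.6f}")
print(f"tanh(30.00)     = {math.tanh(30.0):.6f}")
print(f"tanh(1.00)      = {math.tanh(1.0):.6f}")
```

Only $\sigma(0) = 0.5$ and $\tanh(1) \approx 0.76$ land away from 0 or 1, which is why those two values keep reappearing in the hand computation.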

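Inputs

• Initial hidden state: $h_0 = [0.0, 0.0]^\top$

• Initial memory cell: $c_0 = [0.0, 0.0]^\top$

• Input sequence in time: A, A, B, B, A, B, A

$$x_1 = \begin{bmatrix} 1.0 \\ 0.0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1.0 \\ 0.0 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 0.0 \\ 1.0 \end{bmatrix}, \quad \ldots$$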
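
The full forward pass over A, A, B, B, A, B, A fits in a short script. A minimal pure-Python sketch, assuming the parameter values from the two parameter slides, $h_0 = c_0 = 0$, and that the prediction is the index of the largest entry of $\mathrm{softmax}(h_t)$ (equivalently of $h_t$, since softmax preserves order). The names P, gate, and run are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate(Wx, bx, Wh, bh, x, h, act):
    # act(W_x x + b_x + W_h h + b_h), applied element-wise
    return [act(a + b + c + d) for a, b, c, d in zip(matvec(Wx, x), bx, matvec(Wh, h), bh)]

Z2 = [[0.0, 0.0], [0.0, 0.0]]
z2 = [0.0, 0.0]
# (W_x, b_x, W_h, b_h) per gate, copied from the two parameter slides
P = {
    "i": ([[30.0, 0.0], [0.0, 0.0]], z2, [[0.0, 0.0], [60.0, 0.0]], [0.0, -30.0]),
    "f": (Z2, z2, [[0.0, 0.0], [0.0, -30.0]], [-30.0, 0.0]),
    "c": ([[30.0, 0.0], [0.0, 30.0]], z2, Z2, z2),
    "o": (Z2, [30.0, 30.0], Z2, z2),
}
ONE_HOT = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

def run(sequence):
    h, c = [0.0, 0.0], [0.0, 0.0]  # h0 and c0 from the "Inputs" slide
    trace = []
    for s in sequence:
        x = ONE_HOT[s]
        i = gate(*P["i"], x, h, sigmoid)
        f = gate(*P["f"], x, h, sigmoid)
        o = gate(*P["o"], x, h, sigmoid)
        c_tilde = gate(*P["c"], x, h, math.tanh)
        c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c, i, c_tilde)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        pred = 0 if h[0] >= h[1] else 1  # argmax of softmax(h) = argmax of h
        trace.append((s, [round(v, 2) for v in c], [round(v, 2) for v in h], pred))
    return trace

trace = run("AABBABA")
for step, (s, c, h, pred) in enumerate(trace, start=1):
    print(f"t={step} x={s} c={c} h={h} prediction={pred}")
```

For t = 1 to 3 the printed cell and hidden states match the hand computation that follows; the later steps are printed but not compared with the slides here.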

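Input Gate at t = 1: $i^{(1)}$

$$W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}$$

$x^{(1)} = [1.00, 0.00]^\top$, $h^{(0)} = [0.00, 0.00]^\top$

$$\begin{aligned}
i^{(1)} &= \sigma(W_{ii} x^{(1)} + b_{ii} + W_{hi} h^{(0)} + b_{hi}) \\
&= \sigma([30.00, -30.00]^\top) \\
&= [1.00, 0.00]^\top
\end{aligned}$$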
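
The same arithmetic, term by term. In this sketch (helper and variable names are ours) the x-side term contributes +30 to the first unit, the bias $b_{hi}$ pushes the second unit to -30, and $h^{(0)} = 0$ contributes nothing.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

W_ii, b_ii = [[30.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
W_hi, b_hi = [[0.0, 0.0], [60.0, 0.0]], [0.0, -30.0]
x1, h0 = [1.0, 0.0], [0.0, 0.0]

# The four terms inside sigma(...) for the input gate at t = 1
terms = [matvec(W_ii, x1), b_ii, matvec(W_hi, h0), b_hi]
pre_i1 = [sum(col) for col in zip(*terms)]
i1 = [1.0 / (1.0 + math.exp(-z)) for z in pre_i1]
print("pre-activation:", pre_i1)
print("i1:", [round(v, 2) for v in i1])
```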

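Forget Gate at t = 1: $f^{(1)}$

$$W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}$$

$x^{(1)} = [1.00, 0.00]^\top$, $h^{(0)} = [0.00, 0.00]^\top$

$$\begin{aligned}
f^{(1)} &= \sigma(W_{if} x^{(1)} + b_{if} + W_{hf} h^{(0)} + b_{hf}) \\
&= \sigma([-30.00, 0.00]^\top) \\
&= [0.00, 0.50]^\top
\end{aligned}$$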
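
Because $W_{if}$ and $b_{if}$ are all zeros, only the hidden-side terms shape the forget gate. A short sketch (names are ours) reproducing $f^{(1)}$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

W_hf, b_hf = [[0.0, 0.0], [0.0, -30.0]], [-30.0, 0.0]
h0 = [0.0, 0.0]

# W_if x + b_if is zero for every input, so it is left out here
pre_f1 = [a + b for a, b in zip(matvec(W_hf, h0), b_hf)]
f1 = [sigmoid(z) for z in pre_f1]
print(pre_f1, [round(v, 2) for v in f1])
```

The half-open second unit ($\sigma(0) = 0.5$) has no effect at this step, because it multiplies $c_0$, which is all zeros.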

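Output Gate at t = 1: $o^{(1)}$

$$W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(1)} = [1.00, 0.00]^\top$, $h^{(0)} = [0.00, 0.00]^\top$

$$\begin{aligned}
o^{(1)} &= \sigma(W_{io} x^{(1)} + b_{io} + W_{ho} h^{(0)} + b_{ho}) \\
&= \sigma([30.00, 30.00]^\top) \\
&= [1.00, 1.00]^\top
\end{aligned}$$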
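
Since both output-gate weight matrices are zero, $o_t$ depends only on the bias and comes out the same for every input and hidden state. A sketch (function name ours) evaluating it at a few $(x, h)$ pairs that occur in this example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

Z = [[0.0, 0.0], [0.0, 0.0]]
W_io, b_io, W_ho, b_ho = Z, [30.0, 30.0], Z, [0.0, 0.0]

def output_gate(x, h):
    pre = [a + b + c + d for a, b, c, d in zip(matvec(W_io, x), b_io, matvec(W_ho, h), b_ho)]
    return [sigmoid(z) for z in pre]

# Same result whatever x_t and h_{t-1} are
for x, h in [([1.0, 0.0], [0.0, 0.0]), ([0.0, 1.0], [0.76, 0.0]), ([0.0, 1.0], [0.0, 0.76])]:
    print(x, h, [round(v, 2) for v in output_gate(x, h)])
```

In effect the output gate is always open, so $h_t = \tanh(c_t)$ throughout this example.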

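Memory Contribution at t = 1: $\tilde{c}^{(1)}$

$$W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(1)} = [1.00, 0.00]^\top$, $h^{(0)} = [0.00, 0.00]^\top$

$$\begin{aligned}
\tilde{c}^{(1)} &= \tanh(W_{i\tilde{c}} x^{(1)} + b_{i\tilde{c}} + W_{h\tilde{c}} h^{(0)} + b_{h\tilde{c}}) \\
&= \tanh([30.00, 0.00]^\top) \\
&= [1.00, 0.00]^\top
\end{aligned}$$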
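
With $W_{i\tilde{c}} = 30 I$ and every other memory-cell parameter at zero, the candidate is $\tanh(30\,x_t)$, which is essentially a copy of the current one-hot symbol. A small sketch under that reading (the function name candidate is ours):

```python
import math

W_ic = [[30.0, 0.0], [0.0, 30.0]]  # b, W_hc and b_hc are all zero on the slides

def candidate(x):
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W_ic]
    return [math.tanh(z) for z in pre]

for name, x in [("A", [1.0, 0.0]), ("B", [0.0, 1.0])]:
    print(name, [round(v, 2) for v in candidate(x)])
```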

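Forward message at time step 1

$f_1 = [0.00, 0.50]^\top$, $c_0 = [0.00, 0.00]^\top$, $i_1 = [1.00, 0.00]^\top$, $\tilde{c}_1 = [1.00, 0.00]^\top$

• Message forward ($c_1$)

$$\begin{aligned}
c_1 &= f_1 \circ c_0 + i_1 \circ \tilde{c}_1 \\
&= [0.00, 0.50]^\top \circ [0.00, 0.00]^\top + [1.00, 0.00]^\top \circ [1.00, 0.00]^\top \\
&= [1.00, 0.00]^\top
\end{aligned}$$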
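
The cell update is a plain element-wise multiply-and-add. A minimal sketch using the rounded values from the slide (the function name cell_update is ours):

```python
def cell_update(f, c_prev, i, c_tilde):
    # c_t = f_t o c_{t-1} + i_t o c~_t, with o the element-wise product
    return [fk * ck + ik * tk for fk, ck, ik, tk in zip(f, c_prev, i, c_tilde)]

# Rounded values from the slide at t = 1
c1 = cell_update(f=[0.00, 0.50], c_prev=[0.00, 0.00], i=[1.00, 0.00], c_tilde=[1.00, 0.00])
print(c1)
```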

• New hidden ($h_1$)

$$\begin{aligned}
h_1 &= o_1 \circ \tanh(c_1) \\
&= [1.00, 1.00]^\top \circ \tanh([1.00, 0.00]^\top) \\
&= [0.76, 0.00]^\top
\end{aligned}$$
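
The 0.76 is simply $\tanh(1)$: the cell holds exactly 1 in its first entry and the output gate is fully open. A two-line check (variable names ours):

```python
import math

o1, c1 = [1.00, 1.00], [1.00, 0.00]
h1 = [o * math.tanh(c) for o, c in zip(o1, c1)]
print([round(v, 4) for v in h1])
```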

• Prediction: $y_1 = \mathrm{softmax}(h_1)$ has its largest entry at index 0, so the predicted class is 0
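
The slide reports only the winning index. A sketch computing the full softmax of $h_1 = [0.76, 0.00]$, assuming (as the notation $y_t = \mathrm{softmax}(h_t)$ suggests) that no extra output layer sits between $h_t$ and the softmax:

```python
import math

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]

h1 = [0.76, 0.00]
p1 = softmax(h1)
y1 = p1.index(max(p1))  # predicted class index
print([round(p, 2) for p in p1], y1)
```

The margin is modest (about 0.68 against 0.32) because $\tanh$ keeps every entry of $h$ inside $(-1, 1)$.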

Summary at t = 1

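[Figure: LSTM cell diagram with three σ gates, a tanh memory layer, element-wise × and + nodes, and a tanh on the cell output]

Inputs: c⟨t−1⟩ = [0.00, 0.00], h⟨t−1⟩ = [0.00, 0.00], x⟨t⟩ = [1.00, 0.00]

Outputs: c⟨t⟩ = [1.00, 0.00], h⟨t⟩ = [0.76, 0.00]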

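Input Gate at t = 2: $i^{(2)}$

$$W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}$$

$x^{(2)} = [1.00, 0.00]^\top$, $h^{(1)} = [0.76, 0.00]^\top$

$$\begin{aligned}
i^{(2)} &= \sigma(W_{ii} x^{(2)} + b_{ii} + W_{hi} h^{(1)} + b_{hi}) \\
&= \sigma([30.00, 15.70]^\top) \\
&= [1.00, 1.00]^\top
\end{aligned}$$

The second entry is $60.00 \times 0.7616 - 30.00 \approx 15.70$, computed from the unrounded first entry of $h^{(1)}$, $\tanh(1) \approx 0.7616$; the displayed 0.76 would give 15.60, and $\sigma$ saturates to 1.00 either way.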
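
This is the first step where context matters: the second row of $W_{hi}$ is $[60, 0]$ and $b_{hi}$ has $-30$ in its second entry, so the second input-gate unit sees $60\,h_{t-1}[0] - 30$ and opens once the first entry of the previous hidden state exceeds 0.5. A sketch (names ours) comparing the unrounded and rounded versions of $h^{(1)}$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Second unit of the input gate: 60 * h[0] - 30 (the x-side row of W_ii is zero)
h1_first = math.tanh(1.0)            # unrounded first entry of h(1), about 0.7616
pre_unrounded = 60.0 * h1_first - 30.0
pre_rounded = 60.0 * 0.76 - 30.0     # what the two-decimal value 0.76 would give
print(round(pre_unrounded, 2), round(pre_rounded, 2), round(sigmoid(pre_unrounded), 2))
```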

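Forget Gate at t = 2: $f^{(2)}$

$$W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}$$

$x^{(2)} = [1.00, 0.00]^\top$, $h^{(1)} = [0.76, 0.00]^\top$

$$\begin{aligned}
f^{(2)} &= \sigma(W_{if} x^{(2)} + b_{if} + W_{hf} h^{(1)} + b_{hf}) \\
&= \sigma([-30.00, 0.00]^\top) \\
&= [0.00, 0.50]^\top
\end{aligned}$$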

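Output Gate at t = 2: $o^{(2)}$

$$W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(2)} = [1.00, 0.00]^\top$, $h^{(1)} = [0.76, 0.00]^\top$

$$\begin{aligned}
o^{(2)} &= \sigma(W_{io} x^{(2)} + b_{io} + W_{ho} h^{(1)} + b_{ho}) \\
&= \sigma([30.00, 30.00]^\top) \\
&= [1.00, 1.00]^\top
\end{aligned}$$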

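Memory Contribution at t = 2: $\tilde{c}^{(2)}$

$$W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(2)} = [1.00, 0.00]^\top$, $h^{(1)} = [0.76, 0.00]^\top$

$$\begin{aligned}
\tilde{c}^{(2)} &= \tanh(W_{i\tilde{c}} x^{(2)} + b_{i\tilde{c}} + W_{h\tilde{c}} h^{(1)} + b_{h\tilde{c}}) \\
&= \tanh([30.00, 0.00]^\top) \\
&= [1.00, 0.00]^\top
\end{aligned}$$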

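Forward message at time step 2

$f_2 = [0.00, 0.50]^\top$, $c_1 = [1.00, 0.00]^\top$, $i_2 = [1.00, 1.00]^\top$, $\tilde{c}_2 = [1.00, 0.00]^\top$

• Message forward ($c_2$)

$$\begin{aligned}
c_2 &= f_2 \circ c_1 + i_2 \circ \tilde{c}_2 \\
&= [0.00, 0.50]^\top \circ [1.00, 0.00]^\top + [1.00, 1.00]^\top \circ [1.00, 0.00]^\top \\
&= [1.00, 0.00]^\top
\end{aligned}$$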

• New hidden ($h_2$)

$$\begin{aligned}
h_2 &= o_2 \circ \tanh(c_2) \\
&= [1.00, 1.00]^\top \circ \tanh([1.00, 0.00]^\top) \\
&= [0.76, 0.00]^\top
\end{aligned}$$

• Prediction: $y_2 = \mathrm{softmax}(h_2)$ has its largest entry at index 0, so the predicted class is 0

Summary at t = 2

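[Figure: LSTM cell diagram with three σ gates, a tanh memory layer, element-wise × and + nodes, and a tanh on the cell output]

Inputs: c⟨t−1⟩ = [1.00, 0.00], h⟨t−1⟩ = [0.76, 0.00], x⟨t⟩ = [1.00, 0.00]

Outputs: c⟨t⟩ = [1.00, 0.00], h⟨t⟩ = [0.76, 0.00]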

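Input Gate at t = 3: $i^{(3)}$

$$W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}$$

$x^{(3)} = [0.00, 1.00]^\top$, $h^{(2)} = [0.76, 0.00]^\top$

$$\begin{aligned}
i^{(3)} &= \sigma(W_{ii} x^{(3)} + b_{ii} + W_{hi} h^{(2)} + b_{hi}) \\
&= \sigma([0.00, 15.70]^\top) \\
&= [0.50, 1.00]^\top
\end{aligned}$$

As at t = 2, the 15.70 comes from the unrounded first entry of $h^{(2)}$, $\tanh(1) \approx 0.7616$.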

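Forget Gate at t = 3: $f^{(3)}$

$$W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}$$

$x^{(3)} = [0.00, 1.00]^\top$, $h^{(2)} = [0.76, 0.00]^\top$

$$\begin{aligned}
f^{(3)} &= \sigma(W_{if} x^{(3)} + b_{if} + W_{hf} h^{(2)} + b_{hf}) \\
&= \sigma([-30.00, 0.00]^\top) \\
&= [0.00, 0.50]^\top
\end{aligned}$$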

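Output Gate at t = 3: $o^{(3)}$

$$W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(3)} = [0.00, 1.00]^\top$, $h^{(2)} = [0.76, 0.00]^\top$

$$\begin{aligned}
o^{(3)} &= \sigma(W_{io} x^{(3)} + b_{io} + W_{ho} h^{(2)} + b_{ho}) \\
&= \sigma([30.00, 30.00]^\top) \\
&= [1.00, 1.00]^\top
\end{aligned}$$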

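Memory Contribution at t = 3: $\tilde{c}^{(3)}$

$$W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(3)} = [0.00, 1.00]^\top$, $h^{(2)} = [0.76, 0.00]^\top$

$$\begin{aligned}
\tilde{c}^{(3)} &= \tanh(W_{i\tilde{c}} x^{(3)} + b_{i\tilde{c}} + W_{h\tilde{c}} h^{(2)} + b_{h\tilde{c}}) \\
&= \tanh([0.00, 30.00]^\top) \\
&= [0.00, 1.00]^\top
\end{aligned}$$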

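Forward message at time step 3

$f_3 = [0.00, 0.50]^\top$, $c_2 = [1.00, 0.00]^\top$, $i_3 = [0.50, 1.00]^\top$, $\tilde{c}_3 = [0.00, 1.00]^\top$

• Message forward ($c_3$)

$$\begin{aligned}
c_3 &= f_3 \circ c_2 + i_3 \circ \tilde{c}_3 \\
&= [0.00, 0.50]^\top \circ [1.00, 0.00]^\top + [0.50, 1.00]^\top \circ [0.00, 1.00]^\top \\
&= [0.00, 1.00]^\top
\end{aligned}$$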
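
Splitting the update into what is kept from $c_2$ and what is newly written makes the switch from A to B visible: the zero in $f_3$ erases the A entry, and the open input gate writes the B entry. A minimal sketch with the rounded slide values (variable names ours):

```python
f3, c2 = [0.00, 0.50], [1.00, 0.00]
i3, c_tilde3 = [0.50, 1.00], [0.00, 1.00]

kept = [f * c for f, c in zip(f3, c2)]            # what survives from c2
written = [i * t for i, t in zip(i3, c_tilde3)]   # what the new symbol B writes
c3 = [k + w for k, w in zip(kept, written)]
print("kept:", kept, "written:", written, "c3:", c3)
```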

• New hidden ($h_3$)

$$\begin{aligned}
h_3 &= o_3 \circ \tanh(c_3) \\
&= [1.00, 1.00]^\top \circ \tanh([0.00, 1.00]^\top) \\
&= [0.00, 0.76]^\top
\end{aligned}$$

• Prediction: $y_3 = \mathrm{softmax}(h_3)$ has its largest entry at index 1, so the predicted class is 1

Summary at t = 3

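[Figure: LSTM cell diagram with three σ gates, a tanh memory layer, element-wise × and + nodes, and a tanh on the cell output]

Inputs: c⟨t−1⟩ = [1.00, 0.00], h⟨t−1⟩ = [0.76, 0.00], x⟨t⟩ = [0.00, 1.00]

Outputs: c⟨t⟩ = [0.00, 1.00], h⟨t⟩ = [0.00, 0.76]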

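Input Gate at t = 4: $i^{(4)}$

$$W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}$$

$x^{(4)} = [0.00, 1.00]^\top$, $h^{(3)} = [0.00, 0.76]^\top$

$$\begin{aligned}
i^{(4)} &= \sigma(W_{ii} x^{(4)} + b_{ii} + W_{hi} h^{(3)} + b_{hi}) \\
&= \sigma([0.00, -30.00]^\top) \\
&= [0.50, 0.00]^\top
\end{aligned}$$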

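Forget Gate at t = 4: $f^{(4)}$

$$W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}$$

$x^{(4)} = [0.00, 1.00]^\top$, $h^{(3)} = [0.00, 0.76]^\top$

$$\begin{aligned}
f^{(4)} &= \sigma(W_{if} x^{(4)} + b_{if} + W_{hf} h^{(3)} + b_{hf}) \\
&= \sigma([-30.00, -22.85]^\top) \\
&= [0.00, 0.00]^\top
\end{aligned}$$

The second entry is $-30.00 \times 0.7616 \approx -22.85$, using the unrounded second entry of $h^{(3)}$, $\tanh(1) \approx 0.7616$.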
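
At t = 4 the second forget unit closes because $h^{(3)}$ now carries the B trace in its second entry, which the second row of $W_{hf}$ multiplies by $-30$. A small sketch (names ours) reproducing the $-22.85$ from the unrounded $\tanh(1)$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h3_second = math.tanh(1.0)            # unrounded second entry of h(3), about 0.7616
# First unit: the first row of W_hf is zero, so only b_hf[0] = -30 remains.
# Second unit: -30 * h[1] + b_hf[1], with b_hf[1] = 0.
pre_f4 = [-30.0, -30.0 * h3_second]
f4 = [sigmoid(z) for z in pre_f4]
print([round(z, 2) for z in pre_f4], [round(v, 2) for v in f4])
```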

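Output Gate at t = 4: $o^{(4)}$

$$W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}$$

$x^{(4)} = [0.00, 1.00]^\top$, $h^{(3)} = [0.00, 0.76]^\top$

$$\begin{aligned}
o^{(4)} &= \sigma(W_{io} x^{(4)} + b_{io} + W_{ho} h^{(3)} + b_{ho}) \\
&= \sigma([30.00, 30.00]^\top) \\
&= [1.00, 1.00]^\top
\end{aligned}$$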

Fenfei Guo and Jordan Boyd-Graber | UMD Long Short Term Memory Networks | 27 / 49

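Since W_io and W_ho are all zeros, the output gate collapses to σ(b_io + b_ho) = σ(30) in both units, whatever x(t) or h(t−1) is. A small plain-Python check of that claim (our own sketch; the variable and helper names are hypothetical):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    # W v + b for a 2x2 W given as a list of rows
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]

W_io, b_io = [[0.0, 0.0], [0.0, 0.0]], [30.0, 30.0]
W_ho, b_ho = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]

gates = []
for x in ([1.0, 0.0], [0.0, 1.0]):                     # inputs A and B
    for h in ([0.0, 0.0], [0.76, 0.0], [0.0, 0.76]):  # hidden states seen on the slides
        pre = [a + b for a, b in zip(affine(W_io, b_io, x), affine(W_ho, b_ho, h))]
        gates.append([sigmoid(z) for z in pre])

# Every entry sits within 1e-13 of 1.0 (1 - sigmoid(30) is about 9.4e-14)
print(max(1.0 - v for o in gates for v in o))
```

In this example the output gate therefore passes tanh(c) through unchanged, so h(t) is determined entirely by the cell state.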

Memory Contribution at t = 4: c̃(4)

$$
W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad
b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(4)} = [0.00, 1.00]^\top, \quad h^{(3)} = [0.00, 0.76]^\top
$$

$$
\begin{aligned}
\tilde{c}^{(4)} &= \tanh\left(W_{i\tilde{c}}x^{(4)} + b_{i\tilde{c}} + W_{h\tilde{c}}h^{(3)} + b_{h\tilde{c}}\right) \\
&= \tanh\left([0.00, 30.00]^\top\right) \\
&= [0.00, 1.00]^\top
\end{aligned}
$$

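With W_{ic̃} = 30·I and every other candidate term zero, c̃(t) = tanh(30·x(t)), which for a one-hot input is, to two decimals, x(t) itself: the candidate memory copies the current symbol. A plain-Python sketch (the function name `candidate` is our own):

```python
import math

W_ic = [[30.0, 0.0], [0.0, 30.0]]  # W_{i c~}; b_{i c~}, W_{h c~}, b_{h c~} are all zero

def candidate(x):
    # tanh(W_ic x): the h(t-1) terms drop out because W_{h c~} = 0 and the biases are 0
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W_ic]

A, B = [1.0, 0.0], [0.0, 1.0]
print([round(v, 2) for v in candidate(B)])  # [0.0, 1.0], as in c~(4)
print([round(v, 2) for v in candidate(A)])  # [1.0, 0.0]
```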


Forward message at time step 4

$$
f_4 = [0.00, 0.00]^\top, \quad c_3 = [0.00, 1.00]^\top, \quad i_4 = [0.50, 0.00]^\top, \quad \tilde{c}_4 = [0.00, 1.00]^\top
$$

• Message forward (c4)

$$
\begin{aligned}
c_4 &= f_4 \circ c_3 + i_4 \circ \tilde{c}_4 \\
&= [0.00, 0.00]^\top \circ [0.00, 1.00]^\top + [0.50, 0.00]^\top \circ [0.00, 1.00]^\top \\
&= [0.00, 0.00]^\top
\end{aligned}
$$

• New hidden (h4)

$$
\begin{aligned}
h_4 &= o_4 \circ \tanh(c_4) \\
&= [1.00, 1.00]^\top \circ \tanh\left([0.00, 0.00]^\top\right) \\
&= [0.00, 0.00]^\top
\end{aligned}
$$

• Prediction: softmax(h4) = [0.50, 0.50] for the displayed h4, so the two classes tie at this precision; the slide reports y4 = 1.
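The two element-wise (Hadamard) products ∘ in the cell update, and the hidden-state update that follows, can be replayed directly from the rounded values on the slide. A minimal sketch; `hadamard` is our own helper name:

```python
import math

def hadamard(u, v):
    # Element-wise product, written ∘ on the slides
    return [a * b for a, b in zip(u, v)]

f4, c3 = [0.00, 0.00], [0.00, 1.00]
i4, c_tilde4 = [0.50, 0.00], [0.00, 1.00]
o4 = [1.00, 1.00]

kept = hadamard(f4, c3)           # the forget gate erases the stored B
written = hadamard(i4, c_tilde4)  # the input gate blocks the new B
c4 = [a + b for a, b in zip(kept, written)]
h4 = hadamard(o4, [math.tanh(v) for v in c4])
print(c4, h4)  # [0.0, 0.0] [0.0, 0.0]
```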

Summary at t = 4

[Figure: LSTM cell diagram with σ, σ, tanh and σ blocks, × and + nodes, and a tanh on the cell state]

• Inputs: c⟨t−1⟩ = [0.00, 1.00], h⟨t−1⟩ = [0.00, 0.76], x⟨t⟩ = [0.00, 1.00]

• Outputs: c⟨t⟩ = [0.00, 0.00], h⟨t⟩ = [0.00, 0.00]
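Putting the four gates together gives one LSTM step. Below is a plain-Python sketch that hard-codes the weights from the preceding slides; the dictionary `P` and the functions `affine` and `lstm_step` are our own names, not the slides'. Fed the summary's inputs, it reproduces c⟨t⟩ and h⟨t⟩ to two decimals.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    # W v + b, with W given as a list of rows
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]

# (W, b) pairs read off the slides; keys follow the W_ii, W_hi, ... naming
P = {
    "ii": ([[30.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    "hi": ([[0.0, 0.0], [60.0, 0.0]], [0.0, -30.0]),
    "if": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    "hf": ([[0.0, 0.0], [0.0, -30.0]], [-30.0, 0.0]),
    "io": ([[0.0, 0.0], [0.0, 0.0]], [30.0, 30.0]),
    "ho": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    "ic": ([[30.0, 0.0], [0.0, 30.0]], [0.0, 0.0]),
    "hc": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
}

def lstm_step(x, h_prev, c_prev):
    def pre(kx, kh):
        # W_x x + b_x + W_h h_prev + b_h
        return [a + b for a, b in zip(affine(*P[kx], x), affine(*P[kh], h_prev))]
    i = [sigmoid(z) for z in pre("ii", "hi")]
    f = [sigmoid(z) for z in pre("if", "hf")]
    o = [sigmoid(z) for z in pre("io", "ho")]
    c_tilde = [math.tanh(z) for z in pre("ic", "hc")]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Inputs from the t = 4 summary
h4, c4 = lstm_step(x=[0.0, 1.0], h_prev=[0.0, 0.76], c_prev=[0.0, 1.0])
print([round(v, 2) for v in c4], [round(v, 2) for v in h4])  # [0.0, 0.0] [0.0, 0.0]
```

The same function applied to the t = 5 summary inputs (x = [1, 0], zero state) should give c⟨t⟩ ≈ [1.00, 0.00] and h⟨t⟩ ≈ [0.76, 0.00].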

Input Gate at t = 5: i(5)

$$
W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad
b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}
$$

$$
x^{(5)} = [1.00, 0.00]^\top, \quad h^{(4)} = [0.00, 0.00]^\top
$$

$$
\begin{aligned}
i^{(5)} &= \sigma\left(W_{ii}x^{(5)} + b_{ii} + W_{hi}h^{(4)} + b_{hi}\right) \\
&= \sigma\left([30.00, -30.00]^\top\right) \\
&= [1.00, 0.00]^\top
\end{aligned}
$$

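Reading the input-gate weights row by row shows what each unit responds to. The first unit sees only x[0] (through the 30 in W_ii), so it is fully open for A and half-open for B. The second sees only h(t−1)[0] (through the 60 in W_hi and the −30 in b_hi): it is shut when h(t−1)[0] is near 0, exactly half-open at 0.5, and essentially open at 0.76. A plain-Python sketch; the function names are our own:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_unit_1(x):
    # Row 1: W_ii = [30, 0], W_hi = [0, 0], both biases 0
    return sigmoid(30.0 * x[0])

def input_unit_2(h_prev):
    # Row 2: W_ii = [0, 0], W_hi = [60, 0], b_hi = -30
    return sigmoid(60.0 * h_prev[0] - 30.0)

A, B = [1.0, 0.0], [0.0, 1.0]
print(round(input_unit_1(A), 2), round(input_unit_1(B), 2))  # 1.0 0.5
for h0 in (0.0, 0.5, 0.76):
    print(h0, round(input_unit_2([h0, 0.0]), 2))
```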

Forget Gate at t = 5: f(5)

$$
W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad
b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(5)} = [1.00, 0.00]^\top, \quad h^{(4)} = [0.00, 0.00]^\top
$$

$$
\begin{aligned}
f^{(5)} &= \sigma\left(W_{if}x^{(5)} + b_{if} + W_{hf}h^{(4)} + b_{hf}\right) \\
&= \sigma\left([-30.00, 0.00]^\top\right) \\
&= [0.00, 0.50]^\top
\end{aligned}
$$


Output Gate at t = 5: o(5)

$$
W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad
W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(5)} = [1.00, 0.00]^\top, \quad h^{(4)} = [0.00, 0.00]^\top
$$

$$
\begin{aligned}
o^{(5)} &= \sigma\left(W_{io}x^{(5)} + b_{io} + W_{ho}h^{(4)} + b_{ho}\right) \\
&= \sigma\left([30.00, 30.00]^\top\right) \\
&= [1.00, 1.00]^\top
\end{aligned}
$$


Memory Contribution at t = 5: c̃(5)

$$
W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad
b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(5)} = [1.00, 0.00]^\top, \quad h^{(4)} = [0.00, 0.00]^\top
$$

$$
\begin{aligned}
\tilde{c}^{(5)} &= \tanh\left(W_{i\tilde{c}}x^{(5)} + b_{i\tilde{c}} + W_{h\tilde{c}}h^{(4)} + b_{h\tilde{c}}\right) \\
&= \tanh\left([30.00, 0.00]^\top\right) \\
&= [1.00, 0.00]^\top
\end{aligned}
$$



Forward message at time step 5

$$
f_5 = [0.00, 0.50]^\top, \quad c_4 = [0.00, 0.00]^\top, \quad i_5 = [1.00, 0.00]^\top, \quad \tilde{c}_5 = [1.00, 0.00]^\top
$$

• Message forward (c5)

$$
\begin{aligned}
c_5 &= f_5 \circ c_4 + i_5 \circ \tilde{c}_5 \\
&= [0.00, 0.50]^\top \circ [0.00, 0.00]^\top + [1.00, 0.00]^\top \circ [1.00, 0.00]^\top \\
&= [1.00, 0.00]^\top
\end{aligned}
$$

• New hidden (h5)

$$
\begin{aligned}
h_5 &= o_5 \circ \tanh(c_5) \\
&= [1.00, 1.00]^\top \circ \tanh\left([1.00, 0.00]^\top\right) \\
&= [0.76, 0.00]^\top
\end{aligned}
$$

• Prediction: softmax(h5) ≈ [0.68, 0.32], so y5 = 0 (the class with the larger probability).
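The predicted class is the index of the largest softmax probability. A plain-Python sketch (the helper `softmax` is our own) makes the probabilities explicit for h4 and h5 as displayed:

```python
import math

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

p4 = softmax([0.00, 0.00])  # h4 as displayed: an exact tie
p5 = softmax([0.76, 0.00])  # h5 as displayed
y5 = max(range(len(p5)), key=lambda k: p5[k])  # index of the largest probability

print([round(v, 2) for v in p4])  # [0.5, 0.5]
print([round(v, 2) for v in p5])  # [0.68, 0.32]
print(y5)                         # 0
```

With the displayed h4 the two probabilities are equal, so the label at t = 4 cannot be read off the rounded values alone.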

Summary at t = 5

[Figure: LSTM cell diagram with σ, σ, tanh and σ blocks, × and + nodes, and a tanh on the cell state]

• Inputs: c⟨t−1⟩ = [0.00, 0.00], h⟨t−1⟩ = [0.00, 0.00], x⟨t⟩ = [1.00, 0.00]

• Outputs: c⟨t⟩ = [1.00, 0.00], h⟨t⟩ = [0.76, 0.00]

Input Gate at t = 6: i(6)

$$
W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad
b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}
$$

$$
x^{(6)} = [0.00, 1.00]^\top, \quad h^{(5)} = [0.76, 0.00]^\top
$$

$$
\begin{aligned}
i^{(6)} &= \sigma\left(W_{ii}x^{(6)} + b_{ii} + W_{hi}h^{(5)} + b_{hi}\right) \\
&= \sigma\left([0.00, 15.70]^\top\right) \\
&= [0.50, 1.00]^\top
\end{aligned}
$$

Here 15.70 = 60 × 0.7616 − 30: the computation uses the unrounded first entry of h(5), which is displayed as 0.76.

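The 15.70 in the second entry comes from 60·h(5)[0] − 30. With the displayed h(5)[0] = 0.76 that gives 15.60; the slide's 15.70 matches the unrounded value tanh(1) ≈ 0.7616, which is our assumption about how it was computed. Either way the gate saturates, as this plain-Python sketch shows (variable names are our own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h5_0_shown = 0.76
h5_0_unrounded = math.tanh(1.0)  # assumed: h(5)[0] = 1.00 * tanh(c(5)[0]) with c(5)[0] = 1

pre_shown = 60.0 * h5_0_shown - 30.0
pre_unrounded = 60.0 * h5_0_unrounded - 30.0
print(round(pre_shown, 2), round(pre_unrounded, 2))  # 15.6 15.7

# First entry: 30 * x(6)[0] = 0 for input B, so sigma(0) = 0.5
i6 = [sigmoid(30.0 * 0.0), sigmoid(pre_unrounded)]
print([round(v, 2) for v in i6])  # [0.5, 1.0]
```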

Forget Gate at t = 6: f(6)

$$
W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad
b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(6)} = [0.00, 1.00]^\top, \quad h^{(5)} = [0.76, 0.00]^\top
$$

$$
\begin{aligned}
f^{(6)} &= \sigma\left(W_{if}x^{(6)} + b_{if} + W_{hf}h^{(5)} + b_{hf}\right) \\
&= \sigma\left([-30.00, 0.00]^\top\right) \\
&= [0.00, 0.50]^\top
\end{aligned}
$$


Output Gate at t = 6: o(6)

$$
W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad
W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(6)} = [0.00, 1.00]^\top, \quad h^{(5)} = [0.76, 0.00]^\top
$$

$$
\begin{aligned}
o^{(6)} &= \sigma\left(W_{io}x^{(6)} + b_{io} + W_{ho}h^{(5)} + b_{ho}\right) \\
&= \sigma\left([30.00, 30.00]^\top\right) \\
&= [1.00, 1.00]^\top
\end{aligned}
$$


Memory Contribution at t = 6: c̃(6)

$$
W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad
b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(6)} = [0.00, 1.00]^\top, \quad h^{(5)} = [0.76, 0.00]^\top
$$

$$
\begin{aligned}
\tilde{c}^{(6)} &= \tanh\left(W_{i\tilde{c}}x^{(6)} + b_{i\tilde{c}} + W_{h\tilde{c}}h^{(5)} + b_{h\tilde{c}}\right) \\
&= \tanh\left([0.00, 30.00]^\top\right) \\
&= [0.00, 1.00]^\top
\end{aligned}
$$



Forward message at time step 6

$$
f_6 = [0.00, 0.50]^\top, \quad c_5 = [1.00, 0.00]^\top, \quad i_6 = [0.50, 1.00]^\top, \quad \tilde{c}_6 = [0.00, 1.00]^\top
$$

• Message forward (c6)

$$
\begin{aligned}
c_6 &= f_6 \circ c_5 + i_6 \circ \tilde{c}_6 \\
&= [0.00, 0.50]^\top \circ [1.00, 0.00]^\top + [0.50, 1.00]^\top \circ [0.00, 1.00]^\top \\
&= [0.00, 1.00]^\top
\end{aligned}
$$

• New hidden (h6)

$$
\begin{aligned}
h_6 &= o_6 \circ \tanh(c_6) \\
&= [1.00, 1.00]^\top \circ \tanh\left([0.00, 1.00]^\top\right) \\
&= [0.00, 0.76]^\top
\end{aligned}
$$

• Prediction: softmax(h6) ≈ [0.32, 0.68], so y6 = 1 (the class with the larger probability).
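It can help to split the cell update into what is kept from c5 and what is written from the new input. In this step the two half-open gates (f6[1] = 0.50 and i6[0] = 0.50) have no effect, because the values they multiply are zero. A plain-Python sketch using the rounded values from the slide (variable names are our own):

```python
# Rounded values from the slide
f6, c5 = [0.00, 0.50], [1.00, 0.00]
i6, c_tilde6 = [0.50, 1.00], [0.00, 1.00]

kept = [f * c for f, c in zip(f6, c5)]           # f6 ∘ c5: what survives from c5
written = [i * g for i, g in zip(i6, c_tilde6)]  # i6 ∘ c~6: what the new input writes
c6 = [k + w for k, w in zip(kept, written)]
print(kept, written, c6)  # [0.0, 0.0] [0.0, 1.0] [0.0, 1.0]
```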

Summary at t = 6

[Figure: LSTM cell diagram with σ, σ, tanh and σ blocks, × and + nodes, and a tanh on the cell state]

• Inputs: c⟨t−1⟩ = [1.00, 0.00], h⟨t−1⟩ = [0.76, 0.00], x⟨t⟩ = [0.00, 1.00]

• Outputs: c⟨t⟩ = [0.00, 1.00], h⟨t⟩ = [0.00, 0.76]

Input Gate at t = 7: i(7)

$$
W_{ii} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{ii} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{hi} = \begin{bmatrix} 0.00 & 0.00 \\ 60.00 & 0.00 \end{bmatrix}, \quad
b_{hi} = \begin{bmatrix} 0.00 \\ -30.00 \end{bmatrix}
$$

$$
x^{(7)} = [1.00, 0.00]^\top, \quad h^{(6)} = [0.00, 0.76]^\top
$$

$$
\begin{aligned}
i^{(7)} &= \sigma\left(W_{ii}x^{(7)} + b_{ii} + W_{hi}h^{(6)} + b_{hi}\right) \\
&= \sigma\left([30.00, -30.00]^\top\right) \\
&= [1.00, 0.00]^\top
\end{aligned}
$$


Forget Gate at t = 7: f(7)

$$
W_{if} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{if} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{hf} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & -30.00 \end{bmatrix}, \quad
b_{hf} = \begin{bmatrix} -30.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(7)} = [1.00, 0.00]^\top, \quad h^{(6)} = [0.00, 0.76]^\top
$$

$$
\begin{aligned}
f^{(7)} &= \sigma\left(W_{if}x^{(7)} + b_{if} + W_{hf}h^{(6)} + b_{hf}\right) \\
&= \sigma\left([-30.00, -22.85]^\top\right) \\
&= [0.00, 0.00]^\top
\end{aligned}
$$

As at t = 4, −22.85 = −30 × 0.7616 uses the unrounded second entry of h(6), which is displayed as 0.76.


Output Gate at t = 7: o(7)

$$
W_{io} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{io} = \begin{bmatrix} 30.00 \\ 30.00 \end{bmatrix}, \quad
W_{ho} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{ho} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(7)} = [1.00, 0.00]^\top, \quad h^{(6)} = [0.00, 0.76]^\top
$$

$$
\begin{aligned}
o^{(7)} &= \sigma\left(W_{io}x^{(7)} + b_{io} + W_{ho}h^{(6)} + b_{ho}\right) \\
&= \sigma\left([30.00, 30.00]^\top\right) \\
&= [1.00, 1.00]^\top
\end{aligned}
$$


Memory Contribution at t = 7: c̃(7)

$$
W_{i\tilde{c}} = \begin{bmatrix} 30.00 & 0.00 \\ 0.00 & 30.00 \end{bmatrix}, \quad
b_{i\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}, \quad
W_{h\tilde{c}} = \begin{bmatrix} 0.00 & 0.00 \\ 0.00 & 0.00 \end{bmatrix}, \quad
b_{h\tilde{c}} = \begin{bmatrix} 0.00 \\ 0.00 \end{bmatrix}
$$

$$
x^{(7)} = [1.00, 0.00]^\top, \quad h^{(6)} = [0.00, 0.76]^\top
$$

$$
\begin{aligned}
\tilde{c}^{(7)} &= \tanh\left(W_{i\tilde{c}}x^{(7)} + b_{i\tilde{c}} + W_{h\tilde{c}}h^{(6)} + b_{h\tilde{c}}\right) \\
&= \tanh\left([30.00, 0.00]^\top\right) \\
&= [1.00, 0.00]^\top
\end{aligned}
$$



Forward message at time step 7

$$
f_7 = [0.00, 0.00]^\top, \quad c_6 = [0.00, 1.00]^\top, \quad i_7 = [1.00, 0.00]^\top, \quad \tilde{c}_7 = [1.00, 0.00]^\top
$$

• Message forward (c7)

$$
\begin{aligned}
c_7 &= f_7 \circ c_6 + i_7 \circ \tilde{c}_7 \\
&= [0.00, 0.00]^\top \circ [0.00, 1.00]^\top + [1.00, 0.00]^\top \circ [1.00, 0.00]^\top \\
&= [1.00, 0.00]^\top
\end{aligned}
$$

• New hidden (h7)

$$
\begin{aligned}
h_7 &= o_7 \circ \tanh(c_7) \\
&= [1.00, 1.00]^\top \circ \tanh\left([1.00, 0.00]^\top\right) \\
&= [0.76, 0.00]^\top
\end{aligned}
$$

• Prediction: softmax(h7) ≈ [0.68, 0.32], so y7 = 0 (the class with the larger probability).

Summary at t = 7

[Figure: LSTM cell diagram with σ, σ, tanh and σ blocks, × and + nodes, and a tanh on the cell state]

• Inputs: c⟨t−1⟩ = [0.00, 1.00], h⟨t−1⟩ = [0.00, 0.76], x⟨t⟩ = [1.00, 0.00]

• Outputs: c⟨t⟩ = [1.00, 0.00], h⟨t⟩ = [0.76, 0.00]
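To check the whole walkthrough end to end, the sketch below runs the slide weights over the input sequence A, A, B, B, A, B, A. It assumes h(0) = c(0) = [0, 0], which these slides do not state but which reproduces the h(3) = [0.00, 0.76] and c(3) = [0.00, 1.00] used at t = 4. All names are our own. Under this assumption the unrounded h4 is not exactly zero: its second entry is on the order of 1e-10 and larger than the first, which is consistent with the slide's label y4 = 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    # W v + b, with W given as a list of rows
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]

# (W, b) pairs read off the slides
P = {
    "ii": ([[30.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    "hi": ([[0.0, 0.0], [60.0, 0.0]], [0.0, -30.0]),
    "if": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    "hf": ([[0.0, 0.0], [0.0, -30.0]], [-30.0, 0.0]),
    "io": ([[0.0, 0.0], [0.0, 0.0]], [30.0, 30.0]),
    "ho": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
    "ic": ([[30.0, 0.0], [0.0, 30.0]], [0.0, 0.0]),
    "hc": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),
}

def lstm_step(x, h_prev, c_prev):
    def pre(kx, kh):
        return [a + b for a, b in zip(affine(*P[kx], x), affine(*P[kh], h_prev))]
    i = [sigmoid(z) for z in pre("ii", "hi")]
    f = [sigmoid(z) for z in pre("if", "hf")]
    o = [sigmoid(z) for z in pre("io", "ho")]
    c_tilde = [math.tanh(z) for z in pre("ic", "hc")]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

ONE_HOT = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

h, c = [0.0, 0.0], [0.0, 0.0]  # assumed initial state h(0) = c(0) = 0
trace = []  # (t, symbol, rounded c, rounded h, predicted class)
for t, symbol in enumerate("AABBABA", start=1):
    h, c = lstm_step(ONE_HOT[symbol], h, c)
    p = softmax(h)
    y = max(range(len(p)), key=lambda k: p[k])
    trace.append((t, symbol, [round(v, 2) for v in c], [round(v, 2) for v in h], y))
    print(trace[-1])
```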

What’s going on?

• What’s the classification?

• What inputs are important?

• When can things be forgotten?

• How would other sequences be classified?
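One way to explore the last question is to run the same weights over other A/B strings and look at both the predicted class and how decisive the softmax is. The sketch below assumes a zero initial state; `classify` and its (class, margin) output are our own invention, not part of the slides. A margin near 0 flags a near-tie, as at t = 4 of the slides' sequence, where the displayed h4 = [0.00, 0.00] gives equal probabilities.

```python
import math
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    # W v + b, with W given as a list of rows
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]

Z = [[0.0, 0.0], [0.0, 0.0]]  # zero matrix
P = {  # (W, b) pairs read off the slides
    "ii": ([[30.0, 0.0], [0.0, 0.0]], [0.0, 0.0]), "hi": ([[0.0, 0.0], [60.0, 0.0]], [0.0, -30.0]),
    "if": (Z, [0.0, 0.0]), "hf": ([[0.0, 0.0], [0.0, -30.0]], [-30.0, 0.0]),
    "io": (Z, [30.0, 30.0]), "ho": (Z, [0.0, 0.0]),
    "ic": ([[30.0, 0.0], [0.0, 30.0]], [0.0, 0.0]), "hc": (Z, [0.0, 0.0]),
}
ONE_HOT = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

def lstm_step(x, h, c):
    def pre(kx, kh):
        return [a + b for a, b in zip(affine(*P[kx], x), affine(*P[kh], h))]
    i = [sigmoid(z) for z in pre("ii", "hi")]
    f = [sigmoid(z) for z in pre("if", "hf")]
    o = [sigmoid(z) for z in pre("io", "ho")]
    g = [math.tanh(z) for z in pre("ic", "hc")]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    return [ok * math.tanh(ck) for ok, ck in zip(o, c_new)], c_new

def classify(seq):
    """Return (predicted class, softmax margin) after each symbol of an A/B string."""
    h, c = [0.0, 0.0], [0.0, 0.0]  # assumed zero initial state
    out = []
    for s in seq:
        h, c = lstm_step(ONE_HOT[s], h, c)  # KeyError for symbols other than A/B
        m = max(h)
        e = [math.exp(v - m) for v in h]
        p = [v / sum(e) for v in e]
        out.append((p.index(max(p)), round(abs(p[0] - p[1]), 3)))
    return out

for seq in ("".join(s) for s in product("AB", repeat=3)):
    print(seq, classify(seq))
```

Because several gates sit at σ(0) = 0.5 and the saturated ones are only within about 1e-13 of 0 or 1, a label reported with a margin of 0.0 depends on those tiny residues and is worth reading with care.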