Page 1: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Convolutional Neural Networks (CNNs) and

Recurrent Neural Networks (RNNs)

CMSC 678, UMBC

Page 2: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Recap from last time…

Page 3: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Feed-Forward Neural Network: Multilayer Perceptron

π‘₯

β„Ž# = 𝐹(𝐰𝐒)π‘₯ + 𝑏,)

β„Ž 𝑦

𝑦/

𝑦0

F: (non-linear) activation function

Classification: softmaxRegression: identity

G: (non-linear) activation function

𝑦1 = G(𝛃𝐣)β„Ž + 𝑏/)

𝜷

𝐰𝟏 𝐰𝟐 π°πŸ‘ π°πŸ’

information/computation flow

no self-loops (recurrence/reuse of weights)

Page 4: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Flavors of Gradient Descent

"Batch":
set t = 0; pick a starting value ΞΈ_t
until converged:
    set g_t = 0
    for example(s) i in full data:
        1. compute loss l on x_i
        2. accumulate gradient: g_t += l'(x_i)
    get scaling factor ρ_t
    set ΞΈ_{t+1} = ΞΈ_t βˆ’ ρ_t * g_t
    set t += 1

"Minibatch":
set t = 0; pick a starting value ΞΈ_t
until converged:
    get batch B βŠ‚ full data
    set g_t = 0
    for example(s) i in B:
        1. compute loss l on x_i
        2. accumulate gradient: g_t += l'(x_i)
    get scaling factor ρ_t
    set ΞΈ_{t+1} = ΞΈ_t βˆ’ ρ_t * g_t
    set t += 1

"Online":
set t = 0; pick a starting value ΞΈ_t
until converged:
    for example i in full data:
        1. compute loss l on x_i
        2. get gradient: g_t = l'(x_i)
        3. get scaling factor ρ_t
        4. set ΞΈ_{t+1} = ΞΈ_t βˆ’ ρ_t * g_t
        5. set t += 1
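The three variants differ only in how much data feeds each update. A minimal numpy sketch of the minibatch variant, with a hypothetical least-squares loss and a fixed scaling factor ρ (both stand-ins, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                         # full data (synthetic)
y = X @ np.array([1., -2., 0.5, 3., -1.]) + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)                                    # pick a starting value
rho = 0.1                                              # scaling factor (fixed here)

for t in range(200):                                   # "until converged" (fixed budget here)
    idx = rng.choice(len(X), size=32, replace=False)   # batch B βŠ‚ full data
    Xb, yb = X[idx], y[idx]
    g = Xb.T @ (Xb @ theta - yb) / len(idx)            # gradient accumulated over B
    theta = theta - rho * g                            # theta_{t+1} = theta_t - rho_t * g_t
```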

Page 5: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Dropout: Regularization in Neural Networks

π‘₯ β„Ž 𝑦

𝑦/

𝑦0

𝜷

𝐰𝟏 𝐰𝟐 π°πŸ‘ π°πŸ’

randomly ignore β€œneurons” (hi) during

training
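A minimal sketch of one common variant (inverted dropout; keep_prob is an assumption, not from the slide):

```python
import numpy as np

def dropout(h, keep_prob=0.5, train=True):
    if not train:
        return h                                     # use all neurons at test time
    mask = np.random.random(h.shape) < keep_prob     # randomly ignore "neurons" h_i
    return (h * mask) / keep_prob                    # rescale so E[h] matches test time

h = np.array([0.3, -1.2, 0.8, 2.0])
print(dropout(h))
```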

Page 6: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

tanh Activation

tanh? π‘₯ =2

1 + exp(βˆ’2 βˆ— 𝑠 βˆ— π‘₯)βˆ’ 1

= 2𝜎I π‘₯ βˆ’ 1

s=10

s=0.5

s=1

Page 7: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Rectifier Activations

relu π‘₯ = max(0, π‘₯)

softplus π‘₯ = log(1 + exp π‘₯ )

leaky_relu π‘₯ = W0.01π‘₯, π‘₯ < 0π‘₯, π‘₯ β‰₯ 0

Page 8: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Page 9: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Dot Product

βˆ‘π‘₯)𝑦 =\

]

π‘₯]𝑦]

Page 10: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Convolution: Modified Dot Product Around a Point

βˆ‘π‘₯)𝑦 # = \

]^_

π‘₯]`#𝑦]

Convolution/cross-correlation

Page 15: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Convolution: Modified Dot Product Around a Point

βˆ‘π‘₯⋆𝑦 =

π‘₯)𝑦 # =\]

π‘₯]`#𝑦]

Convolution/cross-correlation

feature map

kernel

input (β€œimage”)

1-D convolution
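A minimal sketch of this 1-D cross-correlation in numpy, ignoring boundary handling (padding comes later); the example input and kernel are arbitrary:

```python
import numpy as np

def conv1d(x, y):
    """Slide the kernel y over input x; dot product around each position i."""
    k = len(y)
    return np.array([x[i:i + k] @ y for i in range(len(x) - k + 1)])  # feature map

x = np.array([1., 2., 3., 4., 5.])   # input ("image")
y = np.array([1., 0., -1.])          # kernel
print(conv1d(x, y))                  # [-2. -2. -2.]
```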

Page 16: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Page 17: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

kernel

input ("image")

width: shape of the kernel (often square)

Page 18: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

stride(s): how many spaces to move the kernel

width: shape of the kernel (often square)

Page 19: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

stride(s): how many spaces to move the kernel

stride=1

width: shape of the kernel (often square)

Page 22: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

stride(s): how many spaces to move the kernel

stride=2

width: shape of the kernel (often square)

Page 23: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

stride(s): how many spaces to move the kernel

stride=2

width: shape of the kernel (often square)

skip starting here

Page 26: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

width: shape of the kernel (often square)

stride(s): how many spaces to move the kernel

padding: how to handle input/kernel shape

mismatches

"same": input.shape == output.shape

"different": input.shape β‰  output.shape

pad with 0s (one option)

Page 27: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

width: shape of the kernel (often square)

stride(s): how many spaces to move the kernel

padding: how to handle input/kernel shape

mismatches

"same": input.shape == output.shape

"different": input.shape β‰  output.shape

pad with 0s (another option)

Page 28: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

2-D Convolution

input ("image")

width: shape of the kernel (often square)

stride(s): how many spaces to move the kernel

padding: how to handle input/kernel shape

mismatches

"same": input.shape == output.shape

"different": input.shape β‰  output.shape
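The resulting output size follows a standard relation (not stated explicitly on the slide): with input size n, kernel width k, stride s, and p zeros of padding per side, output = ⌊(n + 2p βˆ’ k)/sβŒ‹ + 1. A quick check in Python (the function name is ours):

```python
def conv_output_size(n, k, s=1, p=0):
    """Output length per spatial dimension of a strided, padded convolution."""
    return (n + 2 * p - k) // s + 1

print(conv_output_size(5, 3, s=1, p=0))  # 3  ("different": output shrinks)
print(conv_output_size(5, 3, s=1, p=1))  # 5  ("same": output matches input)
print(conv_output_size(5, 3, s=2, p=0))  # 2  (stride=2 skips positions)
```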

Page 29: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

From fully connected to convolutional networks

image β†’ fully connected layer

Slide credit: Svetlana Lazebnik

Page 30: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

image

feature map

learned weights

From fully connected to convolutional networks

Convolutional layer

Slide credit: Svetlana Lazebnik

Page 32: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Convolution as feature extraction

[Figure: an input feature map convolved with several filters/kernels, producing one output feature map per filter]

Slide credit: Svetlana Lazebnik

Page 34: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

image β†’ next layer

Convolutional layer

From fully connected to convolutional networks

non-linearity and/or pooling

Slide adapted: Svetlana Lazebnik

Page 35: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Solving vanishing gradients problem

Page 36: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Input Image

Convolution (Learned)

Non-linearity

Spatial pooling

Feature maps

[Figure: input feature map and filters, as in the earlier feature-extraction slide]

Key operations in a CNN

Slide credit: Svetlana Lazebnik, R. Fergus, Y. LeCun

Page 37: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Input Image

Convolution (Learned)

Non-linearity

Spatial pooling

Feature maps

Key operations

Example: Rectified Linear Unit (ReLU)

Slide credit: Svetlana Lazebnik, R. Fergus, Y. LeCun

Page 38: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Input Image

Convolution (Learned)

Non-linearity

Spatial pooling

Feature maps

Max (spatial pooling: take the maximum over each local window)

Key operations

Slide credit: Svetlana Lazebnik, R. Fergus, Y. LeCun
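A minimal sketch of the "Max" operation above as 2Γ—2 max pooling with stride 2 (assumes even height/width; the example values are arbitrary):

```python
import numpy as np

def max_pool_2x2(a):
    """Split a (h, w) map into 2x2 blocks and keep each block's maximum."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16.).reshape(4, 4)
print(max_pool_2x2(a))   # [[ 5.  7.]
                         #  [13. 15.]]
```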

Page 39: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Design principles

Reduce filter sizes (except possibly at the lowest layer), factorize filters aggressively

Use 1x1 convolutions to reduce and expand the number of feature maps judiciously

Use skip connections and/or create multiple paths through the network

Slide credit: Svetlana Lazebnik

Page 40: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

LeNet-5

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.

Slide credit: Svetlana Lazebnik

Page 41: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

ImageNet

[Figure: validation classification examples]

~14 million labeled images, 20k classes

Images gathered from Internet

Human labels via Amazon MTurk

ImageNet Large-Scale Visual Recognition Challenge (ILSVRC): 1.2 million training images, 1000 classes

www.image-net.org/challenges/LSVRC/

Slide credit: Svetlana Lazebnik

Page 42: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

http://www.inference.vc/deep-learning-is-easy/

Slide credit: Svetlana Lazebnik

Page 43: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Solving vanishing gradients problem

Page 44: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

AlexNet: ILSVRC 2012 winner

Similar framework to LeNet, but:
- Max pooling, ReLU nonlinearity
- More data and a bigger model (7 hidden layers, 650K units, 60M params)
- GPU implementation (50x speedup over CPU): two GPUs for a week
- Dropout regularization

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Slide credit: Svetlana Lazebnik

Page 45: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

GoogLeNet

Slide credit: Svetlana Lazebnik. Szegedy et al., 2015

Page 46: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

GoogLeNet

Slide credit: Svetlana Lazebnik. Szegedy et al., 2015

Page 47: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

GoogLeNet: Auxiliary Classifier at Sub-levels

Idea: try to make each sub-layer good (in its own way) at the prediction task

Slide credit: Svetlana Lazebnik. Szegedy et al., 2015

Page 48: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

GoogLeNet

β€’ An alternative view:

Slide credit: Svetlana Lazebnik. Szegedy et al., 2015

Page 49: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

ResNet (Residual Network)

He et al. β€œDeep Residual Learning for Image Recognition” (2016)

Make it easy for network layers to represent the identity mapping

Skipping 2+ layers is intentional & needed

Slide credit: Svetlana Lazebnik

Page 50: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Summary: ILSVRC 2012-2015

Team                   Year  Place  Error (top-5)  External data
SuperVision            2012  -      16.4%          no
SuperVision            2012  1st    15.3%          ImageNet 22k
Clarifai (7 layers)    2013  -      11.7%          no
Clarifai               2013  1st    11.2%          ImageNet 22k
VGG (16 layers)        2014  2nd    7.32%          no
GoogLeNet (19 layers)  2014  1st    6.67%          no
ResNet (152 layers)    2015  1st    3.57%
Human expert*                       5.1%

* http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

Slide credit: Svetlana Lazebnik

Page 51: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Rapid Progress due to CNNs

Classification: ImageNet Challenge top-5 error

Figure source: Kaiming He. Slide credit: Svetlana Lazebnik

Page 52: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Page 53: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Network Types

x

h

y

Feed forward

Linearizable feature input
Bag-of-items classification/regression

Basic non-linear model

Page 54: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Network Types

x

h0

y0

Recursive: One input, Sequence output

Automated caption generation

h1

y1

h2

y2

Page 55: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Network Types

x0

h0

Recursive: Sequence input, one output

Document classification
Action recognition in video (high-level)

h1 h2

y

x1 x2

Page 56: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Network Types

x0

h0

Recursive: Sequence input, Sequence output (time delay)

Machine translation
Sequential description
Summarization

h1 h2

x1 x2

o0

y0

o1

y1

o2

y2

o3

y3

Page 57: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Network Types

x0

h0

Recursive: Sequence input, Sequence output

Part-of-speech tagging
Action recognition (fine-grained)

h1 h2

x1 x2

y0 y1 y2

Page 58: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

RNN Outputs: Image Captions

Show and Tell: A Neural Image Caption Generator, CVPR 2015

Slide credit: Arun Mallya

Page 59: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

RNN Output:Visual Storytelling

[Figure: five photos; each is encoded by a CNN feeding a GRU encoder, and a GRU decoder generates the story word by word (Encode β†’ Decode)]

Generated story: "The family got together for a cookout. They had a lot of delicious food. The dog was happy to be there. They had a great time on the beach. They even had a swim in the water."

Huang et al. (2016)

Human Reference

The family has gathered around the dinner table to share a meal together. They all pitched in to help cook the seafood to perfection.

Afterwards they took the family dog to the beach to get some exercise. The waves were cool and refreshing! The dog had so much fun in the water. One family member decided to get a better view of the waves!

Page 60: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Recurrent Networks

x_{i-3}  x_{i-2}  x_{i-1}  x_i

h_{i-3}  h_{i-2}  h_{i-1}  h_i

y_{i-3}  y_{i-2}  y_{i-1}  y_i

observe these inputs one at a time

Page 61: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Recurrent Networks

x_{i-3}  x_{i-2}  x_{i-1}  x_i

h_{i-3}  h_{i-2}  h_{i-1}  h_i

y_{i-3}  y_{i-2}  y_{i-1}  y_i

observe these inputs one at a time

predict the corresponding label

Page 62: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

from these hidden states

Recurrent Networks

x_{i-3}  x_{i-2}  x_{i-1}  x_i

h_{i-3}  h_{i-2}  h_{i-1}  h_i

y_{i-3}  y_{i-2}  y_{i-1}  y_i

observe these inputs one at a time

predict the corresponding label

Page 63: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

from these hidden states

x_{i-3}  x_{i-2}  x_{i-1}  x_i

h_{i-3}  h_{i-2}  h_{i-1}  h_i

y_{i-3}  y_{i-2}  y_{i-1}  y_i

observe these inputs one at a time

predict the corresponding label

"cell": the repeated x_i β†’ h_i β†’ y_i unit boxed in the figure

Recurrent Networks

Page 64: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Page 65: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

x_{i-1}  x_i

h_{i-1}  h_i

y_{i-1}  y_i

A Simple Recurrent Neural Network Cell

W W

U U

S S

Page 66: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

encoding

x_{i-1}  x_i

h_{i-1}  h_i

y_{i-1}  y_i

A Simple Recurrent Neural Network Cell

W W

U U

S S

h_i = tanh(W h_{i-1} + U x_i)

Page 67: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

decoding

encoding

x_{i-1}  x_i

h_{i-1}  h_i

y_{i-1}  y_i

A Simple Recurrent Neural Network Cell

W W

U U

S S

h_i = tanh(W h_{i-1} + U x_i)    y_i = softmax(S h_i)

Page 68: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

decoding

encoding

x_{i-1}  x_i

h_{i-1}  h_i

y_{i-1}  y_i

A Simple Recurrent Neural Network Cell

W W

U U

S S

h_i = tanh(W h_{i-1} + U x_i)    y_i = softmax(S h_i)

Weights are shared over time.

unrolling/unfolding: copy the RNN cell across time (inputs)
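A minimal numpy sketch of this cell unrolled over a short sequence (sizes and initialization are arbitrary assumptions; the same W, U, S are reused at every step):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))   # hidden-to-hidden
U = rng.normal(scale=0.1, size=(4, 3))   # input-to-hidden
S = rng.normal(scale=0.1, size=(2, 4))   # hidden-to-output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):        # observe inputs one at a time
    h = np.tanh(W @ h + U @ x)           # encoding: h_i = tanh(W h_{i-1} + U x_i)
    y = softmax(S @ h)                   # decoding: y_i = softmax(S h_i)
    print(y)
```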

Page 69: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Outline

Convolutional Neural Networks

What is a convolution?

Multidimensional Convolutions

Typical Convnet Operations

Deep convnets

Recurrent Neural Networks

Types of recurrence

A basic recurrent cell

BPTT: Backpropagation through time

Page 70: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BackPropagation Through Time (BPTT)

"Unfold" the network to create a single, large, feed-forward network:

1. Weights are copied (W β†’ W^(t)),
2. gradients are computed for each copy (βˆ‚W^(t)), and
3. summed (Ξ£_t βˆ‚W^(t))

Page 71: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BPTT

h_i = tanh(W h_{i-1} + U x_i)

y_i = softmax(S h_i)

per-step loss: cross entropy, E_i = y_i* log p(y_i)

βˆ‚E_i/βˆ‚W = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚W) = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚h_i)(βˆ‚h_i/βˆ‚W)

[Figure: unrolled RNN (x_{i-3}…x_i β†’ h_{i-3}…h_i β†’ y_{i-3}…y_i) with a per-step loss E_t = y_t* log p(y_t) attached to each output]

Page 72: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BPTT

h_i = tanh(W h_{i-1} + U x_i)

y_i = softmax(S h_i)

per-step loss: cross entropy

βˆ‚E_i/βˆ‚W = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚W) = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚h_i)(βˆ‚h_i/βˆ‚W)

βˆ‚h_i/βˆ‚W = tanh'(W h_{i-1} + U x_i) Β· βˆ‚(W h_{i-1})/βˆ‚W

[Figure: unrolled RNN with per-step losses, as before]

Page 73: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BPTT

h_i = tanh(W h_{i-1} + U x_i)

y_i = softmax(S h_i)

per-step loss: cross entropy

βˆ‚E_i/βˆ‚W = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚W) = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚h_i)(βˆ‚h_i/βˆ‚W)

βˆ‚h_i/βˆ‚W = tanh'(W h_{i-1} + U x_i) Β· βˆ‚(W h_{i-1})/βˆ‚W
        = tanh'(W h_{i-1} + U x_i) Β· (h_{i-1} + W βˆ‚h_{i-1}/βˆ‚W)

[Figure: unrolled RNN with per-step losses, as before]

Page 74: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BPTT

[Figure: unrolled RNN with per-step losses, as before]

h_i = tanh(W h_{i-1} + U x_i)

y_i = softmax(S h_i)

per-step loss: cross entropy

βˆ‚h_i/βˆ‚W = tanh'(W h_{i-1} + U x_i) (h_{i-1} + W βˆ‚h_{i-1}/βˆ‚W)
        = Ξ΄_i h_{i-1} + Ξ΄_i W Ξ΄_{i-1} (h_{i-2} + W βˆ‚h_{i-2}/βˆ‚W)    (writing Ξ΄_t for tanh'(W h_{t-1} + U x_t))

βˆ‚E_i/βˆ‚W = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚h_i)(βˆ‚h_i/βˆ‚W); unrolling βˆ‚h_i/βˆ‚W step by step gives terms of the form

Ξ΄_k^(i) = (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚h_i)(βˆ‚h_i/βˆ‚h_k), one per earlier step k

Page 75: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BPTT

πœ•β„Ž#πœ•π‘Š

= tanhm π‘Šβ„Ž#d/ + π‘ˆπ‘₯# β„Ž#d/ + π‘Šπœ•β„Ž#d/πœ•π‘Š

= tanhm π‘Šβ„Ž#d/ + π‘ˆπ‘₯# β„Ž#d/ + tanhm π‘Šβ„Ž#d/ + π‘ˆπ‘₯# π‘Štanhm π‘Šβ„Ž#d0 + π‘ˆπ‘₯#d/ β„Ž#d0 + π‘Šπœ•β„Ž#d0πœ•π‘Š

=\1

πœ•πΈ#πœ•π‘¦#

πœ•π‘¦#πœ•β„Ž#

πœ•β„Ž#πœ•β„Žo

πœ•β„Žoπœ•π‘Š(o)

= \1

𝛿1(#) πœ•β„Žoπœ•π‘Š(o)

𝛿o(#) =

πœ•πΈ#πœ•π‘¦#

πœ•π‘¦#πœ•β„Ž#

πœ•β„Ž#πœ•β„Žo

per-loss, per-step backpropagation error

Page 76: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

BPTT

[Figure: unrolled RNN with per-step losses, as before]

h_i = tanh(W h_{i-1} + U x_i)

y_i = softmax(S h_i)

per-step loss: cross entropy

βˆ‚E_i/βˆ‚W = Ξ£_k (βˆ‚E_i/βˆ‚y_i)(βˆ‚y_i/βˆ‚h_i)(βˆ‚h_i/βˆ‚W^(k))

hidden chain rule, compact form

Page 77: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Why Is Training RNNs Hard?

Vanishing gradients

Multiply the same matrices at each timestep β‡’ multiply many matrices in the gradients:

βˆ‚C_t/βˆ‚h_1 = (βˆ‚C_t/βˆ‚y_t)(βˆ‚y_t/βˆ‚h_1)
          = (βˆ‚C_t/βˆ‚y_t)(βˆ‚y_t/βˆ‚h_t)(βˆ‚h_t/βˆ‚h_{t-1}) β‹― (βˆ‚h_2/βˆ‚h_1)
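A small numeric illustration of this argument (hypothetical sizes; for a tanh cell, βˆ‚h_t/βˆ‚h_{t-1} = diag(1 βˆ’ h_t^2) W, so with small W the product's norm shrinks, while large W can make it explode instead):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # small recurrent weights
h, J = np.zeros(8), np.eye(8)            # J accumulates the Jacobian product

for x in rng.normal(size=(30, 8)):
    h = np.tanh(W @ h + x)
    J = (np.diag(1 - h**2) @ W) @ J      # multiply the same matrix each timestep
    print(np.linalg.norm(J))             # decays toward 0: vanishing gradient
```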

Page 78: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

The Vanilla RNN Backward

[Figure: three unrolled cells; cell t takes (x_t, h_{t-1}) and produces h_t, output y_t, and loss C_t]

h_t = tanh(W [x_t; h_{t-1}])

y_t = F(h_t)
C_t = Loss(y_t, GT_t)

βˆ‚C_t/βˆ‚h_1 = (βˆ‚C_t/βˆ‚y_t)(βˆ‚y_t/βˆ‚h_1)
          = (βˆ‚C_t/βˆ‚y_t)(βˆ‚y_t/βˆ‚h_t)(βˆ‚h_t/βˆ‚h_{t-1}) β‹― (βˆ‚h_2/βˆ‚h_1)

Slide credit: Arun Mallya

Page 79: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Vanishing Gradient Solution: Motivation

h_t = h_{t-1} + F(x_t)   (instead of h_t = tanh(W [x_t; h_{t-1}]))

βˆ‚C_t/βˆ‚h_1 = (βˆ‚C_t/βˆ‚y_t)(βˆ‚y_t/βˆ‚h_1)
          = (βˆ‚C_t/βˆ‚y_t)(βˆ‚y_t/βˆ‚h_t)(βˆ‚h_t/βˆ‚h_{t-1}) β‹― (βˆ‚h_2/βˆ‚h_1)

Identity recurrence β‡’ βˆ‚h_t/βˆ‚h_{t-1} = 1

The gradient does not decay as the error is propagated all the way back, aka "Constant Error Flow"

(y_t = F(h_t) and C_t = Loss(y_t, GT_t) as before)

Slide credit: Arun Mallya

Page 80: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Vanishing Gradient Solution: Model Implementations

LSTM: Long Short-Term Memory (Hochreiter & Schmidhuber, 1997)

GRU: Gated Recurrent Unit (Cho et al., 2014)

Basic idea: learn to forget

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

forget line

representation line

Page 81: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Long Short-Term Memory (LSTM): Hochreiter & Schmidhuber (1997)

Create a "Constant Error Carousel" (CEC), which ensures that gradients don't decay: a memory cell that acts like an accumulator (contains the identity relationship) over time.

[Figure: LSTM cell; input gate i_t, output gate o_t, and forget gate f_t (weights W_i, W_o, W_f) each read (x_t, h_{t-1}); the memory cell c_t (weights W) feeds the new hidden state h_t]

c_t = f_t βŠ— c_{t-1} + i_t βŠ— tanh(W [x_t; h_{t-1}])

f_t = Οƒ(W_f [x_t; h_{t-1}] + b_f)

Slide credit: Arun Mallya
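Filling in the remaining gates by analogy with f_t, a one-step sketch in numpy (the gate biases and the standard output step h_t = o_t βŠ— tanh(c_t) are conventional LSTM choices, not spelled out on the slide; sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wi, Wo, Wf, W, bi, bo, bf, b):
    z = np.concatenate([x, h_prev])          # [x_t; h_{t-1}]
    i = sigmoid(Wi @ z + bi)                 # input gate
    o = sigmoid(Wo @ z + bo)                 # output gate
    f = sigmoid(Wf @ z + bf)                 # forget gate
    c = f * c_prev + i * np.tanh(W @ z + b)  # memory cell: the "accumulator"
    h = o * np.tanh(c)                       # new hidden state
    return h, c
```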

Page 82: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

I want to use CNNs/RNNs/Deep Learning in my project. I don't want to do this all by hand.

Page 83: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Defining A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
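The code screenshots are not reproduced in this transcript. A module in the spirit of the linked tutorial looks roughly like this (the layer names i2h/i2o follow the tutorial; details may differ from the slides' lightly modified version):

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)  # encode
        self.i2o = nn.Linear(input_size + hidden_size, output_size)  # decode
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)   # concatenate x_i and h_{i-1}
        hidden = self.i2h(combined)                # next hidden state
        output = self.softmax(self.i2o(combined))  # log-probabilities over labels
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
```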

Page 84: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Defining A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Page 85: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Defining A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

encode

Page 86: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Defining A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

decode

Page 87: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Training A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Page 88: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Training A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Negative log-likelihood

Page 89: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Training A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Negative log-likelihood

get predictions

Page 90: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Training A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Negative log-likelihood

get predictions

eval predictions

Page 91: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Training A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Negative log-likelihood

get predictions

eval predictions
compute gradient

Page 92: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Training A Simple RNN in Python (Modified Very Slightly)

http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

Negative log-likelihood

get predictions

eval predictions
compute gradient

perform SGD
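Likewise, a sketch of the tutorial-style training step matching the annotations on these pages (NLL loss, get/eval predictions, compute gradient, manual SGD); exact details may differ from the slides:

```python
import torch.nn as nn

criterion = nn.NLLLoss()          # negative log-likelihood
learning_rate = 0.005

def train(category_tensor, line_tensor, rnn):
    hidden = rnn.initHidden()
    rnn.zero_grad()
    for i in range(line_tensor.size(0)):               # observe inputs one at a time
        output, hidden = rnn(line_tensor[i], hidden)   # get predictions
    loss = criterion(output, category_tensor)          # eval predictions
    loss.backward()                                    # compute gradient (BPTT)
    for p in rnn.parameters():                         # perform SGD by hand
        p.data.add_(p.grad.data, alpha=-learning_rate)
    return output, loss.item()
```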

Page 93: Convolutional Neural Networks (CNNs) and Recurrent Neural ...

Slide Credit

http://slazebni.cs.illinois.edu/spring17/lec01_cnn_architectures.pdf

http://slazebni.cs.illinois.edu/spring17/lec02_rnn.pdf