Lecture 4: Backpropagation and Neural Networks

Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 4 - April 13, 2017

Slides: cs231n.stanford.edu/slides/2017/cs231n_2017_lecture4.pdf

Administrative

Assignment 1 due Thursday April 20, 11:59pm on Canvas

Project: TA specialities and some project ideas are posted on Piazza

Google Cloud: All registered students will receive an email this week with instructions on how to redeem $100 in credits

Where we are...

scores function

SVM loss

data loss + regularization

want: the gradient of the total loss with respect to the weights W
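The formulas on this slide did not survive extraction. A plausible reconstruction, assuming the standard linear classifier and multiclass SVM loss (the regularizer R(W) and its weight lambda are written generically here; a sum of squared weights is one common choice):

```latex
\begin{aligned}
s &= f(x; W) = Wx && \text{scores function} \\
L_i &= \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1) && \text{SVM loss} \\
L &= \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda R(W) && \text{data loss + regularization} \\
&\text{want } \nabla_W L
\end{aligned}
```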

Gradient descent

Numerical gradient: slow :(, approximate :(, easy to write :)
Analytic gradient: fast :), exact :), error-prone :(

In practice: Derive analytic gradient, check your implementation with numerical gradient
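A minimal sketch of such a gradient check. The loss `f` (a sum of squares), the step size `h`, and all function names are assumptions chosen for illustration, not taken from the slides.

```python
def f(w):
    # Hypothetical loss for illustration: sum of squares of the weights.
    return sum(v * v for v in w)

def analytic_grad(w):
    # Hand-derived gradient of f: d/dw_i sum(w^2) = 2 * w_i.
    return [2.0 * v for v in w]

def numerical_grad(f, w, h=1e-5):
    # Centered finite difference, one coordinate at a time (slow, approximate).
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * h))
    return grad

def max_relative_error(a, b):
    # Largest relative difference between two gradients.
    return max(abs(p - q) / max(1e-8, abs(p) + abs(q)) for p, q in zip(a, b))

w = [0.5, -1.2, 3.0]
err = max_relative_error(analytic_grad(w), numerical_grad(f, w))
```

A small `err` suggests the analytic gradient is implemented correctly; a large one usually points to a bug in the derivation or the code.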

Computational graphs

[Figure: computational graph with inputs x and W, a multiply node (*) producing the scores s, a hinge loss node, a regularization node R, and an add node (+) producing the total loss L]

Convolutional network (AlexNet)

[Figure: computational graph from input image and weights to loss. Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.]

Neural Turing Machine

[Figure: computational graph from input image to loss. Figure reproduced with permission from a Twitter post by Andrej Karpathy.]

Backpropagation: a simple example

e.g. x = -2, y = 5, z = -4

Want: the gradients of f with respect to x, y, and z

Chain rule: the gradient of f with respect to an input is the upstream gradient multiplied by the local gradient
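The function for this example was lost in extraction. The sketch below assumes f(x, y, z) = (x + y) * z, the function used for this example in the CS231n course notes, with the intermediate q = x + y; the variable names are illustrative.

```python
# Assumption: f(x, y, z) = (x + y) * z (the formula is missing from the transcript).
x, y, z = -2.0, 5.0, -4.0

# Forward pass: compute and keep the intermediate q.
q = x + y            # 3.0
f = q * z            # -12.0

# Backward pass: start from df/df = 1 and apply the chain rule node by node.
df_df = 1.0
df_dq = z * df_df    # local gradient of q*z w.r.t. q is z
df_dz = q * df_df    # local gradient of q*z w.r.t. z is q
df_dx = 1.0 * df_dq  # local gradient of x+y w.r.t. x is 1
df_dy = 1.0 * df_dq  # local gradient of x+y w.r.t. y is 1
```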

[Figure: a node f in a computational graph. During the backward pass, the node multiplies the gradient arriving from its output by its “local gradient” and passes the resulting gradients on to its inputs.]

Another example:

(-1) * (-0.20) = 0.20

[local gradient] x [upstream gradient]
[1] x [0.2] = 0.2
[1] x [0.2] = 0.2 (both inputs!)

[local gradient] x [upstream gradient]
x0: [2] x [0.2] = 0.4
w0: [-1] x [0.2] = -0.2
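The function and most input values for this example are missing from the transcript. The sketch below assumes f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))). The values w0 = 2 and x0 = -1 are consistent with the local gradients shown above; w1 = -3, x1 = -2, w2 = -3 are assumptions taken from the CS231n course notes.

```python
import math

w0, x0 = 2.0, -1.0              # consistent with the local gradients on the slide
w1, x1, w2 = -3.0, -2.0, -3.0   # assumed values, not present in the transcript

# Forward pass, one gate at a time.
a = w0 * x0          # -2
b = w1 * x1          #  6
c = a + b            #  4
d = c + w2           #  1
e = -1.0 * d         # -1
g = math.exp(e)      # about 0.37
h = g + 1.0          # about 1.37
f = 1.0 / h          # about 0.73

# Backward pass: [local gradient] x [upstream gradient] at every gate.
df = 1.0
dh = (-1.0 / h ** 2) * df   # 1/x gate, about -0.53
dg = 1.0 * dh               # +1 gate
de = math.exp(e) * dg       # exp gate, about -0.20
dd = -1.0 * de              # *-1 gate: (-1) * (-0.20) = 0.20
dc = 1.0 * dd               # add gate passes the gradient to both inputs
dw2 = 1.0 * dd
da = 1.0 * dc
db = 1.0 * dc
dx0 = w0 * da               # about 0.4
dw0 = x0 * da               # about -0.2
```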

sigmoid function

sigmoid gate

(0.73) * (1 - 0.73) = 0.2
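The derivation behind this shortcut is not preserved in the transcript. It follows from the standard definition of the sigmoid, whose derivative can be written in terms of its own output, so a whole chain of gates collapses into one sigmoid gate:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}
\qquad
\frac{d\sigma(x)}{dx}
  = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
  = \left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)\left(\frac{1}{1 + e^{-x}}\right)
  = \bigl(1 - \sigma(x)\bigr)\,\sigma(x)
```

With an output of 0.73, the local gradient is (0.73)(1 - 0.73), which is approximately 0.2.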

Patterns in backward flow

add gate: gradient distributor

Q: What is a max gate?

max gate: gradient router

Q: What is a mul gate?

mul gate: gradient switcher
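These three patterns can be sketched as backward functions for two-input scalar gates. The function names and the example values are hypothetical; sending the gradient to the first input on a tie in the max gate is an arbitrary choice made here.

```python
def add_backward(x, y, upstream):
    # Distributor: both inputs receive the upstream gradient unchanged.
    return upstream, upstream

def max_backward(x, y, upstream):
    # Router: only the larger input receives the gradient (ties go to x here).
    return (upstream, 0.0) if x >= y else (0.0, upstream)

def mul_backward(x, y, upstream):
    # Switcher: each input receives the upstream gradient scaled by the other input.
    return y * upstream, x * upstream

x, y, upstream = 3.0, -4.0, 2.0
add_grads = add_backward(x, y, upstream)
max_grads = max_backward(x, y, upstream)
mul_grads = mul_backward(x, y, upstream)
```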

Gradients add at branches

[Figure: a node whose output branches to several downstream nodes; the gradients flowing back along the branches are summed (+)]
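A small sketch of this rule, assuming the hypothetical function f(x, y) = x * y + x, in which x feeds two branches. The values are chosen only for illustration.

```python
# Hypothetical function where x is used twice: f(x, y) = x * y + x.
x, y = 3.0, -4.0

# Forward pass.
a = x * y        # first branch uses x
f = a + x        # second branch uses x again

# Backward pass.
df = 1.0
da = 1.0 * df             # add gate, first input
dx_from_add = 1.0 * df    # add gate, second input (the direct x branch)
dx_from_mul = y * da      # mul gate, gradient for x
dy = x * da               # mul gate, gradient for y

# x branched into two paths, so its gradients are summed.
dx = dx_from_mul + dx_from_add
```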

Gradients for vectorized code

(x,y,z are now vectors)

“local gradient”: This is now the Jacobian matrix (derivative of each element of z w.r.t. each element of x)

Vectorized operations

f(x) = max(0,x) (elementwise)

4096-d input vector, 4096-d output vector

Q: what is the size of the Jacobian matrix? [4096 x 4096!]

in practice we process an entire minibatch (e.g. 100) of examples at one time, i.e. the Jacobian would technically be a [409,600 x 409,600] matrix :\

Q2: what does it look like?
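A sketch of what the Jacobian looks like for the elementwise max(0, x), using a 5-d vector as a stand-in for the 4096-d one (the values and names are assumptions). Each output depends only on the matching input, so the Jacobian is diagonal, and the backward pass can apply a mask instead of forming the full matrix.

```python
import numpy as np

# Small stand-in for the 4096-d input vector.
x = np.array([1.5, -0.3, 0.0, 2.0, -4.0])
out = np.maximum(0, x)

# Explicit Jacobian: diagonal, with 1 where x > 0 and 0 elsewhere.
J = np.diag((x > 0).astype(float))

# Hypothetical upstream gradient dL/dout.
upstream = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# Full chain rule with the Jacobian versus the cheap masked version.
dx_full = J.T @ upstream
dx_mask = upstream * (x > 0)
```

Both give the same result, which is why implementations never build the full Jacobian for elementwise operations.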

A vectorized example:

Always check: The gradient with respect to a variable should have the same shape as the variable
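The function and numbers for this example are missing from the transcript. The sketch below assumes f(x, W) = ||W x||^2 (the sum of squared entries of q = W x) with a small 2x2 W and 2-d x; these values are assumptions for illustration.

```python
import numpy as np

# Assumed inputs for illustration.
W = np.array([[0.1, 0.5],
              [-0.3, 0.8]])
x = np.array([0.2, 0.4])

# Forward pass.
q = W @ x              # intermediate vector
f = np.sum(q ** 2)     # scalar output

# Backward pass.
dq = 2.0 * q           # df/dq_i = 2 * q_i
dW = np.outer(dq, x)   # df/dW[i, j] = dq[i] * x[j]
dx = W.T @ dq          # df/dx[j] = sum_i W[i, j] * dq[i]

# Always check: each gradient has the same shape as its variable.
assert dW.shape == W.shape
assert dx.shape == x.shape
```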

Modularized implementation: forward / backward API

Graph (or Net) object (rough pseudo code)
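The pseudo code itself did not survive extraction. A minimal runnable sketch of the idea, restricted to a linear chain of gates so that list order is already a topological order. The classes `Scale`, `Square`, and `ChainGraph` are hypothetical names, not from the slides.

```python
class Scale:
    """Gate computing k * x."""
    def __init__(self, k):
        self.k = k

    def forward(self, x):
        return self.k * x

    def backward(self, dout):
        return self.k * dout

class Square:
    """Gate computing x * x; caches x for the backward pass."""
    def forward(self, x):
        self.x = x
        return x * x

    def backward(self, dout):
        return 2.0 * self.x * dout

class ChainGraph:
    """Hypothetical graph limited to a chain of single-input gates."""
    def __init__(self, gates):
        self.gates = gates

    def forward(self, x):
        # Visit gates in topological order.
        for gate in self.gates:
            x = gate.forward(x)
        return x

    def backward(self):
        # Visit gates in reverse order, chaining the gradients.
        grad = 1.0
        for gate in reversed(self.gates):
            grad = gate.backward(grad)
        return grad

graph = ChainGraph([Scale(3.0), Square()])
loss = graph.forward(2.0)   # (3 * 2)^2
dx = graph.backward()       # d/dx (3x)^2 = 18x
```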

[Figure: multiply gate (*) with inputs x and y and output z]

(x,y,z are scalars)
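A sketch of a multiply gate with this forward / backward API for scalar x, y, z. The class and argument names are assumptions, since the slide code is not in the transcript.

```python
class MultiplyGate:
    def forward(self, x, y):
        # Cache the inputs: each is the local gradient for the other input.
        self.x = x
        self.y = y
        return x * y

    def backward(self, dz):
        # dz is the upstream gradient dL/dz.
        dx = self.y * dz   # dz/dx = y
        dy = self.x * dz   # dz/dy = x
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dx, dy = gate.backward(2.0)
```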

Example: Caffe layers

Caffe Sigmoid Layer

[Code screenshot: the backward pass multiplies the local sigmoid gradient by top_diff (chain rule)]

Caffe is licensed under BSD 2-Clause
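The Caffe source shown on the slide is not in the transcript. The following is a Python sketch of the same idea, not Caffe's actual C++ code; the names `bottom`, `top`, and `top_diff` only mirror Caffe's naming convention.

```python
import math

class SigmoidLayer:
    def forward(self, bottom):
        # Cache the outputs; the backward pass only needs them, not the inputs.
        self.top = [1.0 / (1.0 + math.exp(-v)) for v in bottom]
        return self.top

    def backward(self, top_diff):
        # Local gradient sigmoid * (1 - sigmoid), times top_diff (chain rule).
        return [t * (1.0 - t) * d for t, d in zip(self.top, top_diff)]

layer = SigmoidLayer()
top = layer.forward([0.0, 1.0])
bottom_diff = layer.backward([1.0, 1.0])
```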

In Assignment 1: Writing SVM / Softmax

Stage your forward/backward computation!
E.g. for the SVM: margins
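A sketch of what staging might look like for the multiclass SVM loss, with `scores` and `margins` as explicit intermediate stages. The shape convention (X is N x D, W is D x C), the margin `delta`, and the function name are assumptions, and regularization is left out.

```python
import numpy as np

def svm_loss_staged(W, X, y, delta=1.0):
    N = X.shape[0]
    rows = np.arange(N)

    # Forward pass in stages.
    scores = X @ W                                   # (N, C)
    correct = scores[rows, y][:, None]               # (N, 1)
    margins = np.maximum(0, scores - correct + delta)
    margins[rows, y] = 0                             # no loss for the correct class
    loss = margins.sum() / N

    # Backward pass, one stage at a time in reverse.
    dmargins = np.ones_like(margins) / N
    dscores = dmargins * (margins > 0)               # max gate routes the gradient
    dscores[rows, y] = -dscores.sum(axis=1)          # correct class appears in every margin
    dW = X.T @ dscores                               # same shape as W
    return loss, dW
```

Each backward line undoes one forward line, which makes the code easier to check against a numerical gradient.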

Summary so far...

● neural nets will be very large: impractical to write down gradient formula by hand for all parameters

● backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates

● implementations maintain a graph structure, where the nodes implement the forward() / backward() API

● forward: compute result of an operation and save any intermediates needed for gradient computation in memory

● backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs


Next: Neural Networks

Neural networks: without the brain stuff

(Before) Linear score function:

(Now) 2-layer Neural Network or 3-layer Neural Network

[Figure: x (3072) -> W1 -> h (100) -> W2 -> s (10)]
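The formulas on these slides did not survive extraction. A plausible reconstruction, assuming the max(0, ·) nonlinearity that appears elsewhere in this lecture and the layer sizes shown in the diagram (3072, 100, 10):

```latex
\begin{aligned}
\text{(Before) linear score function:}\quad & f = Wx \\
\text{(Now) 2-layer Neural Network:}\quad & f = W_2 \max(0,\; W_1 x) \\
\text{or 3-layer Neural Network:}\quad & f = W_3 \max(0,\; W_2 \max(0,\; W_1 x))
\end{aligned}
\qquad
x \in \mathbb{R}^{3072},\;
W_1 \in \mathbb{R}^{100 \times 3072},\;
W_2 \in \mathbb{R}^{10 \times 100}
```

Without the nonlinearity between them, the stacked matrices would collapse into a single linear map.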


Full implementation of training a 2-layer Neural Network needs ~20 lines:
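The code from this slide is not in the transcript. The sketch below is in the same spirit, assuming random toy data, a sigmoid hidden layer, a sum-of-squares loss, and plain gradient descent; the sizes, learning rate, and function name are all assumptions.

```python
import numpy as np

def train_two_layer(steps=500, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    N, D_in, H, D_out = 16, 8, 16, 2                 # assumed toy sizes
    x = rng.standard_normal((N, D_in))
    y = rng.standard_normal((N, D_out))
    w1 = 0.1 * rng.standard_normal((D_in, H))
    w2 = 0.1 * rng.standard_normal((H, D_out))
    losses = []
    for _ in range(steps):
        # Forward pass: sigmoid hidden layer, linear output, squared error.
        h = 1.0 / (1.0 + np.exp(-(x @ w1)))
        y_pred = h @ w2
        losses.append(np.square(y_pred - y).sum())
        # Backward pass: chain rule through each stage in reverse.
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = h.T @ grad_y_pred
        grad_h = grad_y_pred @ w2.T
        grad_w1 = x.T @ (grad_h * h * (1.0 - h))
        # Gradient descent update.
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return losses

losses = train_two_layer()
```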


In Assignment 2: Writing a 2-layer net

[Image by Fotis Bobolas, licensed under CC-BY 2.0]

[Figure: biological neuron with dendrite, cell body, axon, and presynaptic terminal labeled. Impulses are carried toward the cell body by the dendrites and away from the cell body by the axon. This image by Felipe Perucho is licensed under CC-BY 3.0]

sigmoid activation function
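A sketch of the loose computational analogy: a weighted sum of the inputs at the "cell body" followed by a sigmoid activation as the "firing rate". The class and method names are hypothetical, and the example weights are chosen only for illustration.

```python
import math

class Neuron:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def forward(self, inputs):
        # Weighted sum of inputs, loosely analogous to signals summed at the cell body.
        cell_body_sum = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        # Sigmoid activation function, loosely analogous to a firing rate.
        return 1.0 / (1.0 + math.exp(-cell_body_sum))

neuron = Neuron(weights=[2.0, -3.0], bias=-3.0)
firing_rate = neuron.forward([-1.0, -2.0])
```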

Be very careful with your brain analogies!

Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical system
● Rate code may not be adequate

[Dendritic Computation. London and Hausser]

Activation functions

Sigmoid

tanh

ReLU

Leaky ReLU

Maxout

ELU
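The formulas for these functions are not in the transcript. A sketch of their standard scalar definitions follows; the leaky slope of 0.1, the ELU alpha of 1.0, and the two-piece form of maxout are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    # Small positive slope for negative inputs instead of a hard zero.
    return max(slope * x, x)

def elu(x, alpha=1.0):
    # Identity for x >= 0, smooth exponential saturation below zero.
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def maxout(x, w1, b1, w2, b2):
    # Maximum of two linear functions of the input vector x.
    dot = lambda w: sum(wi * xi for wi, xi in zip(w, x))
    return max(dot(w1) + b1, dot(w2) + b2)
```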

Neural networks: Architectures

“Fully-connected” layers

“2-layer Neural Net”, or “1-hidden-layer Neural Net”

“3-layer Neural Net”, or “2-hidden-layer Neural Net”

Example feed-forward computation of a neural network

We can efficiently evaluate an entire layer of neurons.
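The code from these slides is not in the transcript. A sketch of a forward pass through a 3-layer net, assuming a sigmoid activation, column-vector inputs, and layer sizes 3, 4, 4, 1; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Activation function (sigmoid assumed here).
f = lambda z: 1.0 / (1.0 + np.exp(-z))

# Random input column vector and randomly initialized parameters.
x = rng.standard_normal((3, 1))
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((4, 4)), np.zeros((4, 1))
W3, b3 = rng.standard_normal((1, 4)), np.zeros((1, 1))

# Each line evaluates an entire layer with one matrix multiply.
h1 = f(W1 @ x + b1)    # first hidden layer, shape (4, 1)
h2 = f(W2 @ h1 + b2)   # second hidden layer, shape (4, 1)
out = W3 @ h2 + b3     # output score, shape (1, 1)
```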

Summary

- We arrange neurons into fully-connected layers
- The abstraction of a layer has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies)
- Neural networks are not really neural
- Next time: Convolutional Neural Networks