Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 8 - April 26, 2018

Slides: cs231n.stanford.edu/slides/2018/cs231n_2018_lecture08.pdf

Sep 08, 2018

Last time

Optimization: SGD+Momentum, Nesterov, RMSProp, Adam

Regularization: Dropout

Regularization: Add noise, then marginalize out (figure labels: Train, Test)

Transfer Learning

[Figure: network stack Image, Conv-64, Conv-64, MaxPool, Conv-128, Conv-128, MaxPool, Conv-256, Conv-256, MaxPool, Conv-512, Conv-512, MaxPool, Conv-512, Conv-512, MaxPool, FC-4096, FC-4096, FC-C. Labels: "Freeze these"; "Reinitialize this and train"]

Today

- Deep learning hardware
  - CPU, GPU, TPU

- Deep learning software
  - PyTorch and TensorFlow
  - Static vs Dynamic computation graphs

Deep Learning Hardware

My computer

Spot the CPU! (central processing unit)

This image is licensed under CC-BY 2.0

Spot the GPUs! (graphics processing unit)

This image is in the public domain

NVIDIA vs AMD

CPU vs GPU

| | Cores | Clock Speed | Memory | Price | Speed |
|---|---|---|---|---|---|
| CPU (Intel Core i7-7700k) | 4 (8 threads with hyperthreading) | 4.2 GHz | System RAM | $339 | ~540 GFLOPs FP32 |
| GPU (NVIDIA GTX 1080 Ti) | 3584 | 1.6 GHz | 11 GB GDDR5X | $699 | ~11.4 TFLOPs FP32 |

CPU: Fewer cores, but each core is much faster and much more capable; great at sequential tasks

GPU: More cores, but each core is much slower and “dumber”; great for parallel tasks
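
To put the CPU and GPU figures side by side, here is a small arithmetic sketch. It uses only the peak FP32 throughput and prices quoted on the slide; the dictionary layout and the helper name `gflops_per_dollar` are illustrative choices, and peak numbers are not the same thing as measured training speed.

```python
# Peak FP32 throughput (in GFLOPs) and prices copied from the slide.
devices = {
    "CPU (Intel Core i7-7700k)": {"gflops": 540.0, "price_usd": 339.0},
    "GPU (NVIDIA GTX 1080 Ti)": {"gflops": 11400.0, "price_usd": 699.0},
}


def gflops_per_dollar(spec):
    """Peak throughput bought per dollar of hardware (illustrative metric)."""
    return spec["gflops"] / spec["price_usd"]


cpu = devices["CPU (Intel Core i7-7700k)"]
gpu = devices["GPU (NVIDIA GTX 1080 Ti)"]

# Ratio of peak throughput: roughly 21x in favor of the GPU.
speed_ratio = gpu["gflops"] / cpu["gflops"]

for name, spec in devices.items():
    print(f"{name}: {gflops_per_dollar(spec):.1f} GFLOPs per dollar")
print(f"GPU / CPU peak throughput: {speed_ratio:.1f}x")
```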

CPU vs GPU in practice

Data from https://github.com/jcjohnson/cnn-benchmarks

(CPU performance not well-optimized, a little unfair)

[Figure: benchmark bar chart; speedup labels 66x, 67x, 71x, 64x, 76x]

CPU vs GPU in practice

Data from https://github.com/jcjohnson/cnn-benchmarks

cuDNN much faster than “unoptimized” CUDA

[Figure: benchmark bar chart; speedup labels 2.8x, 3.0x, 3.1x, 3.4x, 2.8x]

CPU vs GPU

| | Cores | Clock Speed | Memory | Price | Speed |
|---|---|---|---|---|---|
| CPU (Intel Core i7-7700k) | 4 (8 threads with hyperthreading) | 4.2 GHz | System RAM | $339 | ~540 GFLOPs FP32 |
| GPU (NVIDIA GTX 1080 Ti) | 3584 | 1.6 GHz | 11 GB GDDR5X | $699 | ~11.4 TFLOPs FP32 |
| TPU (NVIDIA TITAN V) | 5120 CUDA, 640 Tensor | 1.5 GHz | 12GB HBM2 | $2999 | ~14 TFLOPs FP32, ~112 TFLOP FP16 |
| TPU (Google Cloud TPU) | ? | ? | 64 GB HBM | $6.50 per hour | ~180 TFLOP |

TPU: Specialized hardware for deep learning

NOTE: TITAN V isn’t technically a “TPU” since that’s a Google term, but both have hardware specialized for deep learning

Example: Matrix Multiplication

(A x B matrix) times (B x C matrix) = (A x C matrix)
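
As a rough illustration of why matrix multiplication suits many-core hardware, the sketch below computes the product in plain Python. The function name `matmul` and the sample matrices are made up for this example; the point is only that each output entry is its own dot product and does not depend on any other entry.

```python
def matmul(X, Y):
    """Multiply an A x B matrix by a B x C matrix, giving an A x C matrix."""
    A, B, C = len(X), len(Y), len(Y[0])
    assert all(len(row) == B for row in X), "inner dimensions must match"
    out = [[0.0] * C for _ in range(A)]
    for i in range(A):
        for j in range(C):
            # Entry (i, j) is an independent dot product of row i and column j,
            # so in principle all A * C entries could be computed in parallel.
            out[i][j] = sum(X[i][k] * Y[k][j] for k in range(B))
    return out


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # 2 x 3
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2
Z = matmul(X, Y)                          # 2 x 2
print(Z)
```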

Programming GPUs

● CUDA (NVIDIA only)
  ○ Write C-like code that runs directly on the GPU
  ○ Optimized APIs: cuBLAS, cuFFT, cuDNN, etc
● OpenCL
  ○ Similar to CUDA, but runs on anything
  ○ Usually slower on NVIDIA hardware
● HIP https://github.com/ROCm-Developer-Tools/HIP
  ○ New project that automatically converts CUDA code to something that can run on AMD GPUs
● Udacity: Intro to Parallel Programming https://www.udacity.com/course/cs344
  ○ For deep learning just use existing libraries

CPU / GPU Communication

Model is here

Data is here

If you aren’t careful, training can bottleneck on reading data and transferring to GPU!

Solutions:
- Read all data into RAM
- Use SSD instead of HDD
- Use multiple CPU threads to prefetch data
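
The last of these solutions can be sketched with the Python standard library. This is a minimal sketch, not how any particular framework does it: `load_batch` is a hypothetical stand-in for slow disk reads and preprocessing, and `num_workers` and `max_prefetch` are assumed tuning knobs.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def load_batch(index):
    """Hypothetical stand-in for reading and preprocessing one batch."""
    time.sleep(0.01)  # pretend this is slow disk I/O
    return [index] * 4


def prefetching_batches(num_batches, num_workers=2, max_prefetch=4):
    """Yield batches in order while worker threads load upcoming ones."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        next_index = 0
        while next_index < num_batches or pending:
            # Keep up to max_prefetch loads in flight ahead of the consumer.
            while next_index < num_batches and len(pending) < max_prefetch:
                pending.append(pool.submit(load_batch, next_index))
                next_index += 1
            # Block only if the oldest requested batch is not ready yet.
            yield pending.popleft().result()


# The consumer (for example a training loop) just iterates.
batches = list(prefetching_batches(6))
print(len(batches), "batches loaded")
```

While the consumer works on one batch, the worker threads are already loading the next few, so time spent waiting on storage overlaps with compute.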

Deep Learning Software

A zoo of frameworks!

Caffe (UC Berkeley)

Torch (NYU / Facebook)

Theano (U Montreal)

TensorFlow (Google)

Caffe2 (Facebook)

PyTorch (Facebook)

CNTK (Microsoft)

PaddlePaddle (Baidu)

MXNet (Amazon): Developed by U Washington, CMU, MIT, Hong Kong U, etc but main framework of choice at AWS

Chainer

Deeplearning4j

And others...

We’ll focus on these (highlighted on the slide)

I’ve mostly used these (highlighted on the slide)

Recall: Computational Graphs

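[Figure: computational graph. x and W feed a * node that outputs s (scores); s feeds the hinge loss; W also feeds R (the regularization term); a + node adds the hinge loss and R to give the loss L]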

Recall: Computational Graphs

[Figure: a deep network drawn as a computational graph; labels: input image, weights, loss]

Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.

Recall: Computational Graphs

Figure reproduced with permission from a Twitter post by Andrej Karpathy.

[Figure labels: input image, loss]

The point of deep learning frameworks

(1) Quick to develop and test new ideas
(2) Automatically compute gradients
(3) Run it all efficiently on GPU (wrap cuDNN, cuBLAS, etc)

Computational Graphs

[Figure: computational graph with inputs x, y, z: a = x * y, b = a + z, c = Σ b]

Numpy

Good:
- Clean API, easy to write numeric code

Bad:
- Have to compute our own gradients
- Can’t run on GPU
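
The code shown on the slide did not survive in this transcript, so the following is only a sketch of what the Numpy version might look like, assuming the graph is a = x * y, b = a + z, c = Σ b. The sizes `N`, `D` and the seed are arbitrary. Note how every gradient has to be written out by hand.

```python
import numpy as np

np.random.seed(0)
N, D = 3, 4  # assumed sizes, chosen only for illustration
x = np.random.randn(N, D)
y = np.random.randn(N, D)
z = np.random.randn(N, D)

# Forward pass through the graph.
a = x * y
b = a + z
c = np.sum(b)

# Backward pass: apply the chain rule manually, node by node.
grad_c = 1.0
grad_b = grad_c * np.ones((N, D))  # sum passes the gradient to every element
grad_a = grad_b.copy()             # addition passes the gradient through
grad_z = grad_b.copy()
grad_x = grad_a * y                # d(x * y)/dx = y
grad_y = grad_a * x                # d(x * y)/dy = x
```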

PyTorch

Looks exactly like numpy!

PyTorch handles gradients for us!

Trivial to run on GPU - just construct arrays on a different device!
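
A PyTorch counterpart might look like the sketch below, again assuming the graph a = x * y, b = a + z, c = Σ b. The sizes and seed are arbitrary, and the device selection falls back to the CPU when no CUDA device is available so the example runs anywhere.

```python
import torch

# Pick the GPU if one is available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
N, D = 3, 4  # assumed sizes, chosen only for illustration
x = torch.randn(N, D, device=device, requires_grad=True)
y = torch.randn(N, D, device=device, requires_grad=True)
z = torch.randn(N, D, device=device, requires_grad=True)

# The forward pass reads almost exactly like the numpy version.
a = x * y
b = a + z
c = torch.sum(b)

# Autograd fills in x.grad, y.grad and z.grad for us.
c.backward()
print(x.grad.shape, y.grad.shape, z.grad.shape)
```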

PyTorch (More detail)

PyTorch: Fundamental Concepts

Tensor: Like a numpy array, but can run on GPU

Autograd: Package for building computational graphs out of Tensors, and automatically computing gradients

Module: A neural network layer; may store state or learnable weights

PyTorch: Versions

For this class we are using PyTorch version 0.4 which was released Tuesday 4/24

This version makes a lot of changes to some of the core APIs around autograd, Tensor construction, Tensor datatypes / devices, etc

Be careful if you are looking at older PyTorch code!

PyTorch: Tensors

Running example: Train a two-layer ReLU network on random data with L2 loss

PyTorch Tensors are just like numpy arrays, but they can run on GPU.

PyTorch Tensor API looks almost exactly like numpy!

Here we fit a two-layer net using PyTorch Tensors:

Create random tensors for data and weights

Forward pass: compute predictions and loss

Backward pass: manually compute gradients

Gradient descent step on weights

To run on GPU, just use a different device!
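
The slide code for the running example is missing from this transcript. The sketch below shows one way the steps listed above (random tensors, forward pass, manual backward pass, gradient step, device choice) could fit together. The function name `train_two_layer_net`, the layer sizes, seed, step count and learning rate are all assumptions.

```python
import torch


def train_two_layer_net(device, steps=100, learning_rate=1e-6):
    """Fit a two-layer ReLU net to random data with hand-written gradients."""
    torch.manual_seed(0)
    N, D_in, H, D_out = 64, 1000, 100, 10  # assumed sizes

    # Create random tensors for data and weights.
    x = torch.randn(N, D_in, device=device)
    y = torch.randn(N, D_out, device=device)
    w1 = torch.randn(D_in, H, device=device)
    w2 = torch.randn(H, D_out, device=device)

    losses = []
    for _ in range(steps):
        # Forward pass: compute predictions and L2 loss.
        h = x.mm(w1)
        h_relu = h.clamp(min=0)
        y_pred = h_relu.mm(w2)
        loss = (y_pred - y).pow(2).sum()
        losses.append(loss.item())

        # Backward pass: manually compute gradients.
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = h_relu.t().mm(grad_y_pred)
        grad_h_relu = grad_y_pred.mm(w2.t())
        grad_h = grad_h_relu.clone()
        grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
        grad_w1 = x.t().mm(grad_h)

        # Gradient descent step on weights.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return losses


# To run on GPU, only the device changes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
losses = train_two_layer_net(device)
print(f"first loss {losses[0]:.1f}, last loss {losses[-1]:.1f}")
```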

PyTorch: Autograd

Creating Tensors with requires_grad=True enables autograd

Operations on Tensors with requires_grad=True cause PyTorch to build a computational graph

We will not want gradients (of loss) with respect to data

Do want gradients with respect to weights

Forward pass looks exactly the same as before, but we don’t need to track intermediate values - PyTorch keeps track of them for us in the graph

Compute gradient of loss with respect to w1 and w2

Make gradient step on weights, then zero the gradients. torch.no_grad means “don’t build a computational graph for this part”
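
Put together, the autograd version of the running example might look like this sketch. As before, the function name `train_with_autograd`, the sizes, seed, step count and learning rate are assumptions, since the slide code is not part of this transcript.

```python
import torch


def train_with_autograd(steps=100, learning_rate=1e-6):
    """Two-layer ReLU net on random data, with gradients from autograd."""
    torch.manual_seed(0)
    N, D_in, H, D_out = 64, 1000, 100, 10  # assumed sizes

    # No gradients needed with respect to the data.
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)
    # Gradients are wanted with respect to the weights.
    w1 = torch.randn(D_in, H, requires_grad=True)
    w2 = torch.randn(H, D_out, requires_grad=True)

    losses = []
    for _ in range(steps):
        # Forward pass: no need to keep intermediates by hand.
        y_pred = x.mm(w1).clamp(min=0).mm(w2)
        loss = (y_pred - y).pow(2).sum()
        losses.append(loss.item())

        # Compute gradient of loss with respect to w1 and w2.
        loss.backward()

        # Update without recording the update itself in the graph.
        with torch.no_grad():
            w1 -= learning_rate * w1.grad
            w2 -= learning_rate * w2.grad
            # Gradients accumulate across backward() calls, so reset them.
            w1.grad.zero_()
            w2.grad.zero_()
    return losses


losses = train_with_autograd()
print(f"first loss {losses[0]:.1f}, last loss {losses[-1]:.1f}")
```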

PyTorch methods that end in underscore modify the Tensor in-place; methods without the underscore return a new Tensor
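
A quick way to see the difference is to compare `add` with `add_`. The tensor values here are arbitrary.

```python
import torch

t = torch.ones(3)

u = t.add(1)          # no underscore: returns a new Tensor
t_before = t.clone()  # t itself is still all ones at this point

t.add_(1)             # trailing underscore: modifies t in-place

print(t_before, u, t)
```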

PyTorch: New Autograd Functions

Define your own autograd functions by writing forward and backward functions for Tensors

Very similar to modular layers in A2! Use ctx object to “cache” values for the backward pass, just like cache objects from A2

Define a helper function to make it easy to use the new function

Can use our new autograd function in the forward pass

In practice you almost never need to define new autograd functions! Only do it when you need custom backward. In this case we can just use a normal Python function
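
As an illustration, here is a sketch of a custom autograd function for ReLU next to the plain Python function that autograd can already differentiate. The names `ReLU`, `my_relu` and `plain_relu` are chosen for this example and are not taken from the slides.

```python
import torch


class ReLU(torch.autograd.Function):
    """Illustrative custom autograd function with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        # Use ctx to cache what the backward pass will need.
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_y):
        (x,) = ctx.saved_tensors
        grad_input = grad_y.clone()
        grad_input[x < 0] = 0
        return grad_input


def my_relu(x):
    """Helper so the custom function can be called like any other op."""
    return ReLU.apply(x)


def plain_relu(x):
    """Ordinary Python function; autograd derives the backward pass."""
    return x.clamp(min=0)


x1 = torch.tensor([-1.0, 2.0, 3.0], requires_grad=True)
x2 = torch.tensor([-1.0, 2.0, 3.0], requires_grad=True)
my_relu(x1).sum().backward()
plain_relu(x2).sum().backward()
print(x1.grad, x2.grad)
```

For these inputs both versions produce the same gradients, which is why the plain function is usually enough.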

PyTorch: nn

Higher-level wrapper for working with neural nets

Use this! It will make your life easier

Define our model as a sequence of layers; each layer is an object that holds learnable weights

Forward pass: feed data to model, and compute loss

torch.nn.functional has useful helpers like loss functions

Backward pass: compute gradient with respect to all model weights (they have requires_grad=True)

Make gradient step on each model parameter (with gradients disabled)
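
The steps above might be combined as in the following sketch. The function name `train_sequential`, the layer sizes, seed, step count and learning rate are assumptions made for illustration.

```python
import torch


def train_sequential(steps=50, learning_rate=1e-2):
    """Train a small nn.Sequential model with a manual parameter update."""
    torch.manual_seed(0)
    N, D_in, H, D_out = 64, 1000, 100, 10  # assumed sizes
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    # Model as a sequence of layers; each Linear holds learnable weights.
    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out),
    )

    losses = []
    for _ in range(steps):
        # Forward pass: feed data to model, and compute loss.
        y_pred = model(x)
        loss = torch.nn.functional.mse_loss(y_pred, y)
        losses.append(loss.item())

        # Backward pass: gradients for all model weights.
        loss.backward()

        # Gradient step on each parameter, with gradients disabled.
        with torch.no_grad():
            for param in model.parameters():
                param -= learning_rate * param.grad
        model.zero_grad()
    return losses


losses = train_sequential()
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```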

PyTorch: optim

Use an optimizer for different update rules

After computing gradients, use optimizer to update params and zero gradients
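
With an optimizer, the manual update loop collapses to two calls. The sketch below uses Adam as one example of an update rule; the function name `train_with_adam`, the sizes, seed, step count and learning rate are assumptions.

```python
import torch


def train_with_adam(steps=50, learning_rate=1e-3):
    """Same two-layer model, but an optimizer object applies the update rule."""
    torch.manual_seed(0)
    N, D_in, H, D_out = 64, 1000, 100, 10  # assumed sizes
    x = torch.randn(N, D_in)
    y = torch.randn(N, D_out)

    model = torch.nn.Sequential(
        torch.nn.Linear(D_in, H),
        torch.nn.ReLU(),
        torch.nn.Linear(H, D_out),
    )
    # Swapping update rules means swapping this one line.
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    losses = []
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(x), y)
        losses.append(loss.item())
        loss.backward()

        # Update all parameters, then clear their gradients.
        optimizer.step()
        optimizer.zero_grad()
    return losses


losses = train_with_adam()
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```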

Aside: Lua Torch

Direct ancestor of PyTorch (they used to share a lot of C backend)

Written in Lua, not Python

Torch has Tensors and Modules like PyTorch, but no full-featured autograd; much more painful to work with

More details: Check 2016 slides

PyTorch: nn
Define new Modules

A PyTorch Module is a neural net layer; it inputs and outputs Tensors

Modules can contain weights or other modules

You can define your own Modules using autograd!

Define our whole model as a single Module

Initializer sets up two children (Modules can contain modules)

Define forward pass using child modules

No need to define backward - autograd will handle it

Construct and train an instance of our model
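
A sketch of such a Module follows. The class name `TwoLayerNet`, the attribute names, sizes, seed and learning rate are illustrative assumptions; only `forward` is written by hand, and autograd supplies the backward pass.

```python
import torch


class TwoLayerNet(torch.nn.Module):
    """Whole model as a single Module (illustrative sketch)."""

    def __init__(self, D_in, H, D_out):
        super().__init__()
        # The initializer sets up two child modules.
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Forward pass built from the child modules and tensor ops.
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)


torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10  # assumed sizes
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct and train an instance of the model.
model = TwoLayerNet(D_in, H, D_out)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

losses = []
for _ in range(50):
    loss = torch.nn.functional.mse_loss(model(x), y)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```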

Very common to mix and match custom Module subclasses and Sequential containers

Define network component as a Module subclass

Stack multiple instances of the component in a Sequential container

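[Figure: x feeds two parallel FC layers that output h1,1 and h1,2; their product (✕) passes through relu to give h1. h1 feeds two more parallel FC layers that output h2,1 and h2,2; their product passes through relu; the network ends in y]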
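
One way the structure in the figure could be written is sketched below: a small Module subclass for the two-branch component, stacked inside a Sequential container. The class name `ParallelBlock`, the final `Linear` layer and the sizes are assumptions for illustration.

```python
import torch


class ParallelBlock(torch.nn.Module):
    """Two parallel FC layers whose outputs are multiplied, then passed to ReLU."""

    def __init__(self, D_in, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, D_out)
        self.linear2 = torch.nn.Linear(D_in, D_out)

    def forward(self, x):
        h1 = self.linear1(x)
        h2 = self.linear2(x)
        return (h1 * h2).clamp(min=0)


torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10  # assumed sizes
x = torch.randn(N, D_in)

# Stack two instances of the component, then a final layer (assumed) to get y.
model = torch.nn.Sequential(
    ParallelBlock(D_in, H),
    ParallelBlock(H, H),
    torch.nn.Linear(H, D_out),
)
y_pred = model(x)
print(y_pred.shape)
```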

PyTorch: DataLoaders

A DataLoader wraps a Dataset and provides minibatching, shuffling, and multithreading for you

When you need to load custom data, just write your own Dataset class

Iterate over loader to form minibatches
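
A minimal sketch of a custom Dataset wrapped in a DataLoader is shown below. The class name `RandomPairs`, the data sizes and the batch size are made up for this example. `num_workers` is left at its default of 0 here; raising it asks the loader to use background workers.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomPairs(Dataset):
    """Hypothetical custom Dataset holding random (input, target) pairs."""

    def __init__(self, num_items, D_in, D_out):
        generator = torch.Generator().manual_seed(0)
        self.x = torch.randn(num_items, D_in, generator=generator)
        self.y = torch.randn(num_items, D_out, generator=generator)

    def __len__(self):
        return self.x.shape[0]

    def __getitem__(self, index):
        return self.x[index], self.y[index]


# The loader handles minibatching and shuffling.
loader = DataLoader(RandomPairs(64, 1000, 10), batch_size=8, shuffle=True)

# Iterate over the loader to form minibatches.
batch_shapes = []
for x_batch, y_batch in loader:
    batch_shapes.append((x_batch.shape, y_batch.shape))
print(len(batch_shapes), "minibatches")
```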

PyTorch: Pretrained Models

Super easy to use pretrained models with torchvision https://github.com/pytorch/vision

Page 76:

PyTorch: Visdom

This image is licensed under CC-BY 4.0; no changes were made to the image

Visualization tool: add logging to your code, then visualize in a browser

Can’t visualize computational graph structure (yet?)

https://github.com/facebookresearch/visdom

Page 77:

PyTorch: Dynamic Computation Graphs

Page 78:

PyTorch: Dynamic Computation Graphs

[Figure: graph nodes x, w1, w2, y]

Create Tensor objects

Page 79:

PyTorch: Dynamic Computation Graphs

[Figure: graph nodes x, w1, w2, y, followed by the operations mm → clamp → mm producing y_pred]

Build graph data structure AND perform computation

Page 80:

PyTorch: Dynamic Computation Graphs

[Figure: the same graph (mm → clamp → mm → y_pred), extended with - → pow → sum → loss]

Build graph data structure AND perform computation

Page 81:

PyTorch: Dynamic Computation Graphs

[Figure: the full graph from x, w1, w2, y through mm → clamp → mm → y_pred → - → pow → sum → loss]

Search for path between loss and w1, w2 (for backprop) AND perform computation

Page 82:

PyTorch: Dynamic Computation Graphs

[Figure: only the nodes x, w1, w2, y remain]

Throw away the graph and the backprop path, and rebuild them from scratch on every iteration

Pages 83-85:

PyTorch: Dynamic Computation Graphs

[Figures repeat the build-up of pages 79-81 for the next iteration: build the graph data structure AND perform computation (mm → clamp → mm → y_pred, then - → pow → sum → loss), then search for the path between loss and w1, w2 (for backprop) AND perform computation.]

Page 86:

PyTorch: Dynamic Computation Graphs

Building the graph and computing the graph happen at the same time.

Seems inefficient, especially if we are building the same graph over and over again...
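The build-and-compute-together behaviour walked through on the preceding slides can be sketched as a short training loop. This is an illustrative reconstruction based on the operations named in the figures (mm, clamp, mm, -, pow, sum); the tensor sizes, seed, learning rate, and iteration count are assumptions.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions).
N, D_in, H, D_out = 16, 10, 8, 3
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# requires_grad=True asks autograd to track operations on the weights.
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-5
losses = []
for t in range(50):
    # Forward pass: each call both computes a value and records a graph node.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Backward pass: walk the graph just built, from loss back to w1 and w2.
    loss.backward()

    # Gradient step, done outside of graph recording.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
    # The graph from this iteration is now discarded; the next iteration
    # builds a fresh one.
```

Nothing in the loop declares the graph ahead of time; the graph is a by-product of running ordinary tensor code.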

Page 87:

Static Computation Graphs

Alternative: Static graphs

Step 1: Build computational graph describing our computation (including finding paths for backprop)

Step 2: Reuse the same graph on every iteration
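To make the two steps concrete, here is a toy, framework-free sketch of a static graph: the graph is described once as plain Python objects, and then evaluated repeatedly with different inputs. The `Placeholder` and `Op` classes are hypothetical and are not any library's API; the sketch also omits the backprop paths that a real framework would add in Step 1.

```python
class Placeholder:
    """Graph input whose value is supplied at run time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Op:
    """Graph node that applies fn to the values of its input nodes."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def evaluate(self, feed):
        return self.fn(*(node.evaluate(feed) for node in self.inputs))


# Step 1: build the graph once. No arithmetic happens here.
x = Placeholder("x")
w = Placeholder("w")
y = Placeholder("y")
y_pred = Op(lambda a, b: a * b, x, w)
loss = Op(lambda p, t: (p - t) ** 2, y_pred, y)

# Step 2: reuse the same graph on every iteration with new data.
results = [
    loss.evaluate({"x": xv, "w": 2.0, "y": 4.0})
    for xv in (1.0, 2.0, 3.0)
]
```

The key property is that `loss` is constructed before any data exists, which is what lets a framework inspect or optimize the whole graph ahead of execution.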

Page 88:

TensorFlow

Page 89:

TensorFlow: Neural Net

(Assume imports at the top of each snippet)

Page 90:

TensorFlow: Neural Net

First define computational graph

Then run the graph many times

Page 91:

TensorFlow: Neural Net

Create placeholders for input x, weights w1 and w2, and targets y

Page 92:

TensorFlow: Neural Net

Forward pass: compute prediction for y and loss. No computation - just building graph

Page 93:

TensorFlow: Neural Net

Tell TensorFlow to compute the gradient of loss with respect to w1 and w2. No computation here, just building the graph

Page 94:

TensorFlow: Neural Net

Find paths between loss and w1, w2

Page 95:

TensorFlow: Neural Net

Add new operators to the graph which compute grad_w1 and grad_w2

Page 96:

TensorFlow: Neural Net

Now done building our graph, so we enter a session so we can actually run the graph

Page 97:

TensorFlow: Neural Net

Create numpy arrays that will fill in the placeholders above

Page 98:

TensorFlow: Neural Net

Run the graph: feed in the numpy arrays for x, y, w1, and w2; get numpy arrays for loss, grad_w1, and grad_w2

Page 99:

TensorFlow: Neural Net

Train the network: Run the graph over and over, use gradient to update weights

Page 100:

TensorFlow: Neural Net

Train the network: Run the graph over and over, use gradient to update weights

Problem: copying weights between CPU / GPU each step

Page 101:

TensorFlow: Neural Net

Change w1 and w2 from placeholder (fed on each call) to Variable (persists in the graph between calls)

Page 102:

TensorFlow: Neural Net

Add assign operations to update w1 and w2 as part of the graph!

Page 103:

TensorFlow: Neural Net

Run graph once to initialize w1 and w2

Run many times to train

Page 104:

TensorFlow: Neural Net

Problem: loss not going down! Assign calls not actually being executed!

Page 105:

TensorFlow: Neural Net

Add dummy graph node that depends on updates

Tell TensorFlow to compute dummy node
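The reason the assign operations were skipped, and why a dummy node fixes it, is that a static-graph runtime only executes nodes that the requested outputs depend on. The toy sketch below imitates that behaviour in plain Python; `Node`, `run`, and the scalar "weight" are hypothetical stand-ins and not TensorFlow's API.

```python
class Node:
    """Hypothetical graph node: a function plus the nodes it depends on."""

    def __init__(self, name, fn, deps=()):
        self.name = name
        self.fn = fn
        self.deps = tuple(deps)


def run(fetches, executed):
    """Evaluate only the fetched nodes and whatever they depend on."""
    cache = {}

    def visit(node):
        if node.name not in cache:
            args = [visit(dep) for dep in node.deps]
            cache[node.name] = node.fn(*args)
            executed.append(node.name)
        return cache[node.name]

    return [visit(node) for node in fetches]


# A single scalar weight stored outside the graph, like a Variable.
state = {"w": 1.0}

loss = Node("loss", lambda: (state["w"] - 3.0) ** 2)
grad = Node("grad", lambda: 2.0 * (state["w"] - 3.0))


def assign_w(g):
    # Side effect: gradient step on the stored weight.
    state["w"] -= 0.1 * g
    return state["w"]


new_w = Node("new_w", assign_w, deps=[grad])
# Dummy node that depends on the update and returns nothing useful.
updates = Node("updates", lambda _w: None, deps=[new_w])

# Fetching only loss: the assign node is never reached, so w is unchanged.
log_a = []
run([loss], log_a)
w_after_a = state["w"]

# Fetching loss and the dummy node: the assign now runs as a dependency.
log_b = []
run([loss, updates], log_b)
w_after_b = state["w"]
```

In the first call only `loss` executes; in the second, requesting the dummy node pulls in `grad` and `new_w`, so the weight actually changes.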

Page 106:

TensorFlow: Optimizer

Can use an optimizer to compute gradients and update weights

Remember to execute the output of the optimizer!

Page 107:

TensorFlow: Loss

Use predefined common losses

Page 108:

TensorFlow: Layers

Use He initializer

tf.layers automatically sets up weight (and bias) for us!

Page 109:

Keras: High-Level Wrapper

Keras is a layer on top of TensorFlow that makes common things easy to do

(Used to be third-party, now merged into TensorFlow)

Page 110:

Keras: High-Level Wrapper

Define model as a sequence of layers

Get output by calling the model

Page 111:

Keras: High-Level Wrapper

Keras can handle the training loop for you! No sessions or feed_dict

Pages 112-118:

TensorFlow: High-Level Wrappers

- Keras (https://keras.io/)

Ships with TensorFlow:
- tf.keras (https://www.tensorflow.org/api_docs/python/tf/keras)
- tf.layers (https://www.tensorflow.org/api_docs/python/tf/layers)
- tf.estimator (https://www.tensorflow.org/api_docs/python/tf/estimator)
- tf.contrib.estimator (https://www.tensorflow.org/api_docs/python/tf/contrib/estimator)
- tf.contrib.layers (https://www.tensorflow.org/api_docs/python/tf/contrib/layers)
- tf.contrib.slim (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim)
- tf.contrib.learn (https://www.tensorflow.org/api_docs/python/tf/contrib/learn) DEPRECATED

By DeepMind:
- Sonnet (https://github.com/deepmind/sonnet)

Third-Party:
- TFLearn (http://tflearn.org/)
- TensorLayer (http://tensorlayer.readthedocs.io/en/latest/)

Page 119:

TensorFlow: Pretrained Models

tf.keras: (https://www.tensorflow.org/api_docs/python/tf/keras/applications)

TF-Slim: (https://github.com/tensorflow/models/tree/master/slim/nets)

[Figure: Transfer Learning. Network stack: Image → Conv-64, Conv-64, MaxPool → Conv-128, Conv-128, MaxPool → Conv-256, Conv-256, MaxPool → Conv-512, Conv-512, MaxPool → Conv-512, Conv-512, MaxPool → FC-4096, FC-4096 → FC-C. Labels: "Freeze these" and "Reinitialize this and train".]

Page 120:

TensorFlow: Tensorboard

Add logging to code to record loss, stats, etc.

Run server and get pretty graphs!

Page 121:

TensorFlow: Distributed Version

https://www.tensorflow.org/deploy/distributed

Split one graph over multiple machines!

Page 122:

TensorFlow: Tensor Processing Units

Google Cloud TPU = 180 TFLOPs of compute!

Page 123:

TensorFlow: Tensor Processing Units

Google Cloud TPU = 180 TFLOPs of compute!

NVIDIA Tesla V100 = 125 TFLOPs of compute

Page 124:

TensorFlow: Tensor Processing Units

Google Cloud TPU = 180 TFLOPs of compute!

NVIDIA Tesla V100 = 125 TFLOPs of compute

NVIDIA Tesla P100 = 11 TFLOPs of compute

GTX 580 = 0.2 TFLOPs

Page 125:

TensorFlow: Tensor Processing Units

Google Cloud TPU Pod = 64 Cloud TPUs = 11.5 PFLOPs of compute!

Google Cloud TPU = 180 TFLOPs of compute!

https://www.tensorflow.org/versions/master/programmers_guide/using_tpu
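The pod figure is consistent with multiplying the per-device number by the device count. A quick check, using only the numbers quoted on the slides (the variable names are just for illustration):

```python
# Figures quoted on the slides.
tflops_per_cloud_tpu = 180
cloud_tpus_per_pod = 64

# 1 PFLOP = 1000 TFLOPs.
pod_tflops = tflops_per_cloud_tpu * cloud_tpus_per_pod
pod_pflops = pod_tflops / 1000

print(pod_tflops, "TFLOPs =", pod_pflops, "PFLOPs")
```

So 64 × 180 TFLOPs = 11,520 TFLOPs, which the slide rounds to 11.5 PFLOPs.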

Page 126:

Static vs Dynamic Graphs

TensorFlow: Build graph once, then run many times (static)

PyTorch: Each forward pass defines a new graph (dynamic)

Build graph

Run each iteration

New graph each iteration

Page 127:

Static vs Dynamic: Optimization

With static graphs, framework can optimize the graph for you before it runs!

The graph you wrote: Conv → ReLU → Conv → ReLU → Conv → ReLU

Equivalent graph with fused operations: Conv+ReLU → Conv+ReLU → Conv+ReLU
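As a rough picture of what such an optimization pass does, the sketch below rewrites a list of operation names, merging each Conv that is immediately followed by a ReLU into one fused op. It is a toy under stated assumptions: real frameworks rewrite graph data structures and emit fused kernels, and the function name `fuse_conv_relu` is hypothetical.

```python
def fuse_conv_relu(ops):
    """Return a new op list with each adjacent Conv, ReLU pair fused."""
    fused = []
    i = 0
    while i < len(ops):
        is_pair = (
            ops[i] == "Conv"
            and i + 1 < len(ops)
            and ops[i + 1] == "ReLU"
        )
        if is_pair:
            fused.append("Conv+ReLU")
            i += 2  # consume both ops
        else:
            fused.append(ops[i])
            i += 1
    return fused


# The graph as written by the user.
graph = ["Conv", "ReLU", "Conv", "ReLU", "Conv", "ReLU"]

# The equivalent graph after the rewrite, computed before anything runs.
optimized = fuse_conv_relu(graph)
```

This kind of whole-graph rewrite is only possible when the full graph is known before execution, which is the advantage the slide attributes to static graphs.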

Page 128:

Static vs Dynamic: Serialization

Static: Once graph is built, can serialize it and run it without the code that built the graph!

Dynamic: Graph building and execution are intertwined, so always need to keep code around

Page 129:

Static vs Dynamic: Conditional

y = w1 * x   if z > 0
y = w2 * x   otherwise

Page 130:

Static vs Dynamic: Conditional

y = w1 * x   if z > 0
y = w2 * x   otherwise

PyTorch: Normal Python

Page 131:

Static vs Dynamic: Conditional

y = w1 * x   if z > 0
y = w2 * x   otherwise

PyTorch: Normal Python

TensorFlow: Special TF control flow operator!
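A minimal sketch of the PyTorch side of this comparison, where the branch is an ordinary Python `if`. Tensor shapes and the use of a matrix product for "w * x" are assumptions made for the example. In graph-mode TensorFlow the same branch would need a graph-level operator such as `tf.cond`.

```python
import torch

# Illustrative shapes (assumptions).
x = torch.randn(3, 4)
w1 = torch.randn(4, 5, requires_grad=True)
w2 = torch.randn(4, 5, requires_grad=True)


def forward(z):
    # Plain Python control flow decides which ops go into this run's graph.
    if z > 0:
        return x.mm(w1)
    return x.mm(w2)


y = forward(z=10)
y.sum().backward()
```

Because only the taken branch is recorded, `w1` receives a gradient here while `w2.grad` stays `None`.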

Page 132:

Static vs Dynamic: Loops

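y_t = (y_{t-1} + x_t) * w

[Figure: unrolled graph. Starting from y0, the inputs x1, x2, x3 are added in turn (+), and each sum is multiplied (*) by the shared weight w.]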

Page 133:

Static vs Dynamic: Loops

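y_t = (y_{t-1} + x_t) * w

[Figure: unrolled graph. Starting from y0, the inputs x1, x2, x3 are added in turn (+), and each sum is multiplied (*) by the shared weight w.]

PyTorch: Normal Python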

Page 134:

Static vs Dynamic: Loops

y_t = (y_{t-1} + x_t) * w

PyTorch: Normal Python

TensorFlow: Special TF control flow
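The recurrence y_t = (y_{t-1} + x_t) * w can be written in PyTorch as an ordinary Python `for` loop, sketched below. The sequence length, vector size, and elementwise multiplication are assumptions for illustration. In graph-mode TensorFlow the loop would instead be expressed with a graph-level construct such as `tf.while_loop` or `tf.foldl`.

```python
import torch

# Illustrative sizes (assumptions): T time steps, vectors of length D.
T, D = 3, 4
y0 = torch.randn(D)
x = torch.randn(T, D)
w = torch.randn(D, requires_grad=True)

# Unroll the recurrence with a normal Python loop; the graph simply grows
# by one add and one multiply per step.
y = [y0]
for t in range(T):
    prev_y = y[-1]
    next_y = (prev_y + x[t]) * w
    y.append(next_y)

# Backprop through the whole unrolled sequence to the shared weight w.
y[-1].sum().backward()
```

The loop length can differ from one forward pass to the next, since the graph is rebuilt each time.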

Page 135:

Dynamic Graph Applications

- Recurrent networks

Karpathy and Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions”, CVPR 2015
Figure copyright IEEE, 2015. Reproduced for educational purposes.

Page 136:

Dynamic Graph Applications

- Recurrent networks
- Recursive networks

[Figure: tree structure over the sentence “The cat ate a big rat”]

Page 137:

Dynamic Graph Applications

- Recurrent networks
- Recursive networks
- Modular Networks

Andreas et al, “Neural Module Networks”, CVPR 2016
Andreas et al, “Learning to Compose Neural Networks for Question Answering”, NAACL 2016
Johnson et al, “Inferring and Executing Programs for Visual Reasoning”, ICCV 2017

Figure copyright Justin Johnson, 2017. Reproduced with permission.

Page 138:

Dynamic Graph Applications

- Recurrent networks
- Recursive networks
- Modular Networks
- (Your creative idea here)

Page 139:

PyTorch vs TensorFlow, Static vs Dynamic

PyTorch: Dynamic Graphs

TensorFlow: Static Graphs

Page 140:

PyTorch vs TensorFlow, Static vs Dynamic

PyTorch: Dynamic Graphs

TensorFlow: Static Graphs

Lines are blurring! PyTorch is adding static features, and TensorFlow is adding dynamic features.

Page 141:

Dynamic TensorFlow: Dynamic Batching

Looks et al, “Deep Learning with Dynamic Computation Graphs”, ICLR 2017
https://github.com/tensorflow/fold

TensorFlow Fold makes dynamic graphs easier in TensorFlow through dynamic batching

Page 142:

Dynamic TensorFlow: Eager Execution

TensorFlow 1.7 added eager execution which allows dynamic graphs!

Page 143:

Dynamic TensorFlow: Eager Execution

Enable eager mode at the start of the program: it’s a global switch

Page 144:

Dynamic TensorFlow: Eager Execution

These calls to tf.random_normal produce concrete values! No need for placeholders / sessions

Wrap values in a tfe.Variable if we might want to compute grads for them

Page 145:

Dynamic TensorFlow: Eager Execution

Operations scoped under a GradientTape will build a dynamic graph, similar to PyTorch


Dynamic TensorFlow: Eager Execution

Use the tape to compute gradients, like .backward() in PyTorch. The print statement works!
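The tape mechanism can be sketched without TensorFlow. The toy below is an illustration under stated assumptions, not TensorFlow's implementation: `Tape` and `Var` are hypothetical names, only scalar `+` and `*` are supported, and gradients are accumulated with the chain rule by walking the recorded operations in reverse. It shows the two properties the slides highlight: values are concrete as soon as they are computed (so `print` works), and ordinary Python control flow decides which operations end up on the tape.

```python
class Tape:
    """Records each operation as it executes, so the graph is rebuilt on every run."""

    def __init__(self):
        self.ops = []  # entries: (output Var, [(input Var, local gradient), ...])

    def gradient(self, target, sources):
        grads = {id(target): 1.0}
        # Ops were recorded in execution order, so reversing them visits every
        # output before the inputs it depends on.
        for out, parents in reversed(self.ops):
            g = grads.get(id(out), 0.0)
            for parent, local in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + g * local
        return [grads.get(id(s), 0.0) for s in sources]


class Var:
    """A scalar whose operations are recorded on a tape (toy stand-in for a variable)."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def _record(self, value, parents):
        out = Var(value, self.tape)
        self.tape.ops.append((out, parents))
        return out

    def __mul__(self, other):
        return self._record(self.value * other.value,
                            [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return self._record(self.value + other.value,
                            [(self, 1.0), (other, 1.0)])


tape = Tape()
x = Var(3.0, tape)
w = Var(2.0, tape)

y = w * x
# Plain Python control flow: only the branch that runs is recorded.
if y.value > 5:
    loss = y * y
else:
    loss = y + x

print(loss.value)                          # 36.0, a concrete number
grad_w, grad_x = tape.gradient(loss, [w, x])
print(grad_w, grad_x)                      # 36.0 24.0
```

Here `loss = (w * x) ** 2`, so the gradient with respect to `w` is `2 * w * x * x = 36` and with respect to `x` is `2 * w * x * w = 24`, matching the printed values.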


Dynamic TensorFlow: Eager Execution

Eager execution is still pretty new and is not fully supported in all TensorFlow APIs

Try it out!


Static PyTorch: Caffe2

https://caffe2.ai/

● Deep learning framework developed by Facebook
● Static graphs, somewhat similar to TensorFlow
● Core written in C++
● Nice Python interface
● Can train model in Python, then serialize and deploy without Python
● Works on iOS / Android, etc


Static PyTorch: ONNX Support

ONNX is an open-source standard for neural network models

Goal: Make it easy to train a network in one framework, then run it in another framework

Supported by PyTorch, Caffe2, Microsoft CNTK, Apache MXNet

https://github.com/onnx/onnx


Static PyTorch: ONNX Support

You can export a PyTorch model to ONNX

Run the graph on a dummy input, and save the graph to a file

This will only work if your model doesn’t actually make use of the dynamic graph: it must build the same graph on every forward pass, with no data-dependent loops / conditionals
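Why control flow breaks this kind of export can be shown with a toy tracer. This is a pure-Python sketch, not `torch.onnx`: the names `Traced`, `trace`, `run_graph` and `model` are hypothetical. The tracer runs the model once on a dummy input and records the operations that actually executed; the Python `if` itself is never recorded, so the exported graph contains only the branch taken for that dummy input.

```python
class Traced:
    """A value that appends every operation applied to it to a shared op list."""

    def __init__(self, value, graph):
        self.value = value
        self.graph = graph

    def apply(self, name, fn):
        self.graph.append((name, fn))
        return Traced(fn(self.value), self.graph)


def model(x):
    # Data-dependent control flow: which op runs depends on the input value.
    if x.value > 0:
        return x.apply("double", lambda v: v * 2)
    return x.apply("negate", lambda v: -v)


def trace(fn, dummy_input):
    """Run fn once on a dummy input and return the recorded op list."""
    graph = []
    fn(Traced(dummy_input, graph))
    return graph


def run_graph(graph, value):
    """Replay the recorded ops; no Python control flow is left at this point."""
    for _, fn in graph:
        value = fn(value)
    return value


graph = trace(model, dummy_input=1.0)
print([name for name, _ in graph])         # ['double']

exported = run_graph(graph, -3.0)          # replays "double"
dynamic = model(Traced(-3.0, [])).value    # takes the "negate" branch
print(exported, dynamic)                   # -6.0 3.0
```

The exported graph and the original model disagree on the input `-3.0`, which is the failure mode the slide warns about.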


Static PyTorch: ONNX Support

graph(%0 : Float(64, 1000)
      %1 : Float(100, 1000)
      %2 : Float(100)
      %3 : Float(10, 100)
      %4 : Float(10)) {
  %5 : Float(64, 100) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%0, %1, %2), scope: Sequential/Linear[0]
  %6 : Float(64, 100) = onnx::Relu(%5), scope: Sequential/ReLU[1]
  %7 : Float(64, 10) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%6, %3, %4), scope: Sequential/Linear[2]
  return (%7);
}

After exporting to ONNX, can run the PyTorch model in Caffe2
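To read the exported graph, it can help to evaluate it by hand. The sketch below is a plain-Python interpretation of the three nodes, not an ONNX runtime: `gemm_trans_b` and `relu` are hypothetical helper names, the inputs are random placeholders with the shapes listed in the graph, and `Gemm` with `transB=1` is taken to mean `alpha * A @ B^T + beta * C` with `C` broadcast over rows.

```python
import random

def gemm_trans_b(a, b, c, alpha=1.0, beta=1.0):
    """Gemm with transB=1: b has shape (out, in), c has shape (out,)."""
    return [[alpha * sum(x * w for x, w in zip(row, b_row)) + beta * c_j
             for b_row, c_j in zip(b, c)]
            for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
x  = rand_matrix(rng, 64, 1000)    # %0 : Float(64, 1000), the input batch
w1 = rand_matrix(rng, 100, 1000)   # %1 : Float(100, 1000), first Linear weight
b1 = rand_matrix(rng, 1, 100)[0]   # %2 : Float(100), first Linear bias
w2 = rand_matrix(rng, 10, 100)     # %3 : Float(10, 100), second Linear weight
b2 = rand_matrix(rng, 1, 10)[0]    # %4 : Float(10), second Linear bias

h = gemm_trans_b(x, w1, b1)        # %5 : Float(64, 100)
h_relu = relu(h)                   # %6 : Float(64, 100)
scores = gemm_trans_b(h_relu, w2, b2)  # %7 : Float(64, 10)

print(len(scores), len(scores[0]))  # 64 10
```

The graph is therefore a two-layer fully connected network (Linear, ReLU, Linear) applied to a batch of 64 inputs of dimension 1000, producing 10 scores per input.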


Static PyTorch: Future???

https://github.com/pytorch/pytorch/commit/90afedb6e222d430d5c9333ff27adb42aa4bb900


PyTorch vs TensorFlow, Static vs Dynamic

PyTorch
Dynamic Graphs
Static: ONNX, Caffe2

TensorFlow
Static Graphs
Dynamic: Eager


My Advice:

PyTorch is my personal favorite. Clean API, dynamic graphs make it very easy to develop and debug. Can build model in PyTorch then export to Caffe2 with ONNX for production / mobile.

TensorFlow is a safe bet for most projects. Not perfect but has huge community, wide usage. Can use same framework for research and production. Probably use a high-level framework. Only choice if you want to run on TPUs.


Next Time: CNN Architecture Case Studies
