TensorFlow and Deep Learning without a PhD Lucio Floretta CODEMOTION MILAN - SPECIAL EDITION 10 – 11 NOVEMBER 2017

Lucio Floretta - TensorFlow and Deep Learning without a PhD - Codemotion Milan 2017

Nov 24, 2017

Transcript
Page 1

TensorFlow and Deep Learning without a PhD

Lucio Floretta

CODEMOTION MILAN - SPECIAL EDITION 10 – 11 NOVEMBER 2017

Page 2

Hello World: handwritten digits classification - MNIST

MNIST = Modified National Institute of Standards and Technology - Download the dataset at http://yann.lecun.com/exdb/mnist/

Page 3

Very simple model: softmax classification

Input: 28x28 pixels, flattened to 784 pixels

10 neurons, one per digit: 0 1 2 ... 9

Logit := weighted sum of all pixels + bias

Neuron outputs: softmax of the logits
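
A minimal plain-Python sketch of the softmax step on this slide, independent of TensorFlow. The function name `softmax` and the sample logits are illustrative assumptions, not values from the talk.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow in exp()
    # and does not change the result.
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 10 digit classes 0..9.
logits = [0.5, -1.0, 0.0, 2.0, 0.3, -0.5, 4.0, 0.1, -2.0, 0.2]
probs = softmax(logits)

# The predicted digit is the class with the highest probability.
predicted_digit = probs.index(max(probs))
```

The largest logit always gets the largest probability, so softmax keeps the ranking of the classes and only rescales the scores.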

Page 4

In matrix notation, 100 images at a time

L[100, 10] = X[100, 784] · W[784, 10] + b[10]

X : 100 images, one per line, flattened (784 pixels per line)

W : weights w0,0 … w783,9 (784 lines, 10 columns)

b : biases b0 … b9, same 10 biases on all lines (broadcast)

L : logits L0,0 … L99,9 (100 lines, 10 columns)
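
The matrix form can be sketched in plain Python with toy sizes. The helper name `batch_logits` and all the numbers are assumptions for illustration; 2 images of 3 pixels and 2 classes stand in for 100 images of 784 pixels and 10 classes.

```python
def batch_logits(X, W, b):
    """Compute L = X . W + b, with b broadcast on every line."""
    n_in, n_out = len(W), len(W[0])
    logits = []
    for x in X:  # one flattened image per line
        row = [sum(x[k] * W[k][j] for k in range(n_in)) + b[j]
               for j in range(n_out)]
        logits.append(row)
    return logits

# Toy data (hypothetical values).
X = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
b = [1.0, -1.0]

L = batch_logits(X, W, b)  # one line of logits per image
```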

Page 5

Softmax, on a batch of images

Y = softmax(X · W + b)

Y[100, 10] : predictions

X[100, 784] : images

W[784, 10] : weights

b[10] : biases

X · W is a matrix multiply, b is broadcast on all lines, softmax is applied line by line

tensor shapes in [ ]

Page 6

Now in TensorFlow (Python)

Y = tf.nn.softmax(tf.matmul(X, W) + b)

tensor shapes: X[100, 784] W[784, 10] b[10]

tf.matmul: matrix multiply; + b: broadcast on all lines

Page 7

Success?

Cross entropy: - sum over i of Y_[i] * log(Y[i])

Y := computed probabilities

Y_ := actual probabilities, “one-hot” encoded

digit : 0 1 2 3 4 5 6 7 8 9

Y_ : 0 0 0 0 0 0 1 0 0 0 (this is a “6”)

Y : 0.02 0.01 0.01 0.11 0.02 0.01 0.78 0.01 0.01 0.01
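
A plain-Python sketch of the cross entropy for the example on this slide. The function name `cross_entropy` is an assumption; the two vectors are the ones shown above.

```python
import math

def cross_entropy(actual, computed):
    """-sum(actual[i] * log(computed[i])) over all classes."""
    return -sum(a * math.log(c) for a, c in zip(actual, computed))

# Actual probabilities, one-hot encoded: this is a "6".
Y_ = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
# Computed probabilities from the slide.
Y = [0.02, 0.01, 0.01, 0.11, 0.02, 0.01, 0.78, 0.01, 0.01, 0.01]

# With a one-hot target only the true class contributes,
# so the loss is -log(0.78), about 0.248.
loss = cross_entropy(Y_, Y)
```

The closer the computed probability of the true class gets to 1, the closer the loss gets to 0.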

Page 9

92%

Page 10

TensorFlow - initialisation

import tensorflow as tf

# 28 x 28 grayscale images; None will become the batch size
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
# training = computing variables W and b
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# replaces the deprecated tf.initialize_all_variables()
init = tf.global_variables_initializer()

Page 11

TensorFlow - success metrics

# model (tf.reshape is flattening the images)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)
# placeholder for correct answers, “one-hot” encoded
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

Page 12

TensorFlow - training

# 0.005 is the learning rate
optimizer = tf.train.GradientDescentOptimizer(0.005)
# cross_entropy is the loss function
train_step = optimizer.minimize(cross_entropy)
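
What a gradient descent optimizer does at each training step can be sketched in plain Python on a one-variable loss. The helper `gradient_descent` and the toy loss (w - 3)^2 are assumptions for illustration; only the learning rate 0.005 comes from the slide.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly move w a small step against the gradient of the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy loss (hypothetical): loss(w) = (w - 3)^2, so gradient = 2 * (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0),
                           start=0.0, learning_rate=0.005, steps=2000)
# w_final ends up very close to 3, the minimum of the toy loss.
```

A larger learning rate would need fewer steps here but can overshoot on less well-behaved losses.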

Page 13

TensorFlow - run !

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    # (mnist is assumed to be an MNIST dataset object loaded beforehand)
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train: running a TensorFlow computation, feeding placeholders
    sess.run(train_step, feed_dict=train_data)

Page 14

TensorFlow - full python code

import tensorflow as tf
# assumption: MNIST loaded with the TF 1.x tutorial helper (not shown on the slide)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data", one_hot=True, reshape=False)

# initialisation
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.global_variables_initializer()

# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# success metrics
# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])
# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# training step
optimizer = tf.train.GradientDescentOptimizer(0.005)
train_step = optimizer.minimize(cross_entropy)

# run
sess = tf.Session()
sess.run(init)

for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

Page 15

92%

Page 16

Go deep !

Page 17

Let’s try 5 fully-connected layers ! (overkill ;-)

784 inputs -> 200 -> 100 -> 60 -> 30 -> 10 neurons

Hidden layers: ReLU (REctified Linear Unit)

Output layer: softmax, outputs 0 1 2 ... 9

Page 18

TensorFlow - initialisation

K = 200
L = 100
M = 60
N = 30

# weights initialised with random values
W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))
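
As a rough check of how much bigger this network is than the single softmax layer, the trainable values can be counted from the layer sizes above. The helper `count_parameters` is an assumption for illustration.

```python
def count_parameters(sizes):
    """Weights plus biases for a stack of fully-connected layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

# Layer sizes from the slides: 784 inputs, then K, L, M, N and 10 outputs.
deep_total = count_parameters([28 * 28, 200, 100, 60, 30, 10])  # 185300

# The single softmax layer from the earlier slides, for comparison.
simple_total = count_parameters([28 * 28, 10])  # 7850
```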

Page 19

TensorFlow - the model

# flatten the images; a separate name keeps the placeholder X available for feeding
XX = tf.reshape(X, [-1, 28*28])

# each layer: weights and biases
Y1 = tf.nn.relu(tf.matmul(XX, W1) + B1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)
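
One hidden layer of this model can be sketched in plain Python on a single tiny input. The helpers `relu` and `dense` and all the numbers are assumptions for illustration; 2 inputs and 3 neurons stand in for 784 inputs and 200 neurons.

```python
def relu(values):
    """Rectified linear unit: negative values become 0."""
    return [max(0.0, v) for v in values]

def dense(x, W, b):
    """Weighted sum of the inputs plus bias, one value per neuron."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) + b[j]
            for j in range(len(b))]

# Toy data (hypothetical values).
x = [1.0, 2.0]
W1 = [[0.5, -1.0, 0.0],
      [0.25, 0.25, 1.0]]
B1 = [0.0, 0.0, 0.5]

pre_activation = dense(x, W1, B1)  # the second neuron gets a negative sum
y1 = relu(pre_activation)          # ... which relu turns into 0
```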

Page 20

98%

Page 21

Noisy accuracy curve ?

yuck!

Page 22

Slow down . . .

Learning rate decay
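
One possible decay schedule, as a plain-Python sketch. The end points 0.005 and 0.0001 are taken from the "All the party tricks" slide; the exponential shape, the function name and the `decay_steps` value are assumptions, since the transcript does not say which schedule the talk used.

```python
import math

def decayed_learning_rate(step, max_lr=0.005, min_lr=0.0001, decay_steps=2000.0):
    """Start at max_lr and decay exponentially towards min_lr."""
    return min_lr + (max_lr - min_lr) * math.exp(-step / decay_steps)

# Learning rate at a few training steps (hypothetical step numbers).
schedule = [decayed_learning_rate(s) for s in (0, 1000, 5000, 10000)]
```

In a TensorFlow 1.x graph the learning rate would typically be fed through a placeholder at each training step, so the optimizer sees the current value.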

Page 23

98%

Overfitting

Page 24

Dropout

Page 25

Dropout

TRAINING pkeep=0.75

EVALUATION pkeep=1

pkeep = tf.placeholder(tf.float32)

Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
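
What dropout does to a layer's outputs can be sketched in plain Python. The function below is an illustrative assumption, not TensorFlow's implementation; like `tf.nn.dropout`, it scales the kept values by 1/pkeep so the expected sum stays the same.

```python
import random

def dropout(values, pkeep, rng):
    """Keep each value with probability pkeep (scaled by 1/pkeep), else output 0."""
    return [v / pkeep if rng.random() < pkeep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]  # hypothetical layer outputs

train_out = dropout(activations, 0.75, rng)  # TRAINING: some values dropped
eval_out = dropout(activations, 1.0, rng)    # EVALUATION: everything kept
```

With pkeep=1 the function returns the activations unchanged, which is why evaluation uses that value.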

Page 26

98%

Page 27

All the party tricks

Learning rate 0.005

Decaying learning rate 0.005 -> 0.0001

Decaying learning rate 0.005 -> 0.0001 and dropout 0.75

Page 28

Overfitting ?!?

Too many neurons

Not enough DATA

BAD Network

Page 29

Convolutional layer

W1[4, 4, 3]

W2[4, 4, 3]

+padding

W[4, 4, 3, 2] : filter size 4 x 4, 3 input channels, 2 output channels

stride

convolutional subsampling
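
A single-channel convolution with padding and stride, as a plain-Python sketch. The function `conv2d_same` is an illustrative assumption (one input channel, one output channel); the padding rule follows the 'SAME' convention, where the output size is the input size divided by the stride, rounded up.

```python
def conv2d_same(image, kernel, stride=1):
    """Slide the kernel over a zero-padded image, moving `stride` pixels at a time."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = -(-h // stride)  # ceil(h / stride)
    out_w = -(-w // stride)
    # Total zero padding needed so every output position is defined.
    pad_h = max((out_h - 1) * stride + kh - h, 0)
    pad_w = max((out_w - 1) * stride + kw - w, 0)
    top, left = pad_h // 2, pad_w // 2
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y = i * stride + a - top
                    x = j * stride + b - left
                    # Positions outside the image count as zero padding.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# Toy data (hypothetical): a 4x4 image of ones and a 3x3 filter of ones.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]

same_size = conv2d_same(image, kernel, stride=1)   # 4x4 output
subsampled = conv2d_same(image, kernel, stride=2)  # 2x2 output
```

With stride 2 the output has half the width and height, which is the subsampling effect named on the slide.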

Page 30

Hacker’s tip

ALL Convolutional

Page 31

Convolutional neural network

28x28x1 input

convolutional layer, 4 channels, W1[5, 5, 1, 4], stride 1 -> 28x28x4

convolutional layer, 8 channels, W2[4, 4, 4, 8], stride 2 -> 14x14x8

convolutional layer, 12 channels, W3[4, 4, 8, 12], stride 2 -> 7x7x12

fully connected layer, W4[7x7x12, 200] -> 200

softmax readout layer, W5[200, 10] -> 10

+ biases on all layers
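
The layer sizes above follow from the strides, assuming 'SAME' padding as used in the model code: each output side is the input side divided by the stride, rounded up. The helper name below is an assumption for illustration.

```python
import math

def same_padding_output_size(size, stride):
    """Output width (or height) of a convolution with 'SAME' padding."""
    return math.ceil(size / stride)

size = 28  # input images are 28x28
shapes = []
# (output channels, stride) of the three convolutional layers on this slide
for channels, stride in [(4, 1), (8, 2), (12, 2)]:
    size = same_padding_output_size(size, stride)
    shapes.append((size, size, channels))

# Number of values fed to the fully connected layer: 7 * 7 * 12 = 588
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```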

Page 32

Tensorflow - initialisation

# weights initialised with random values
# convolutional weights: [filter size, filter size, input channels, output channels]
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 4], stddev=0.1))
B1 = tf.Variable(tf.ones([4])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, 4, 8], stddev=0.1))
B2 = tf.Variable(tf.ones([8])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, 8, 12], stddev=0.1))
B3 = tf.Variable(tf.ones([12])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*12, 200], stddev=0.1))
B4 = tf.Variable(tf.ones([200])/10)
W5 = tf.Variable(tf.truncated_normal([200, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

Page 33

Tensorflow - the model

# input image batch: X[100, 28, 28, 1]
# tf.nn.conv2d(input, weights, strides, padding) + biases
Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

# flatten all values for fully connected layer: Y3 [100, 7, 7, 12] -> YY [100, 7x7x12]
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * 12])

Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)

Page 34

99.1%

Page 35

WTFH ???

???

Page 36

Bigger convolutional network + dropout

28x28x1 input

convolutional layer, 6 channels, W1[6, 6, 1, 6], stride 1 -> 28x28x6

convolutional layer, 12 channels, W2[5, 5, 6, 12], stride 2 -> 14x14x12

convolutional layer, 24 channels, W3[4, 4, 12, 24], stride 2 -> 7x7x24

fully connected layer, W4[7x7x24, 200] -> 200

softmax readout layer, W5[200, 10] -> 10

+ biases on all layers

+ DROPOUT p=0.75

Page 37

99.3%

Page 38

YEAH !

with dropout

Page 39

Where to go next

Martin Görner

Google Developer relations

@martin_gorner

plus.google.com/+MartinGorner

goo.gl/pHeXe7

youtu.be/qyvlt7kiQoI

goo.gl/mVZloU

github.com/martin-gorner/tensorflow-mnist-tutorial

goo.gl/UuN41S

youtu.be/vq2nnJ4g6N0

github.com/martin-gorner/tensorflow-rnn-shakespeare

Cartoon images copyright: alexpokusay / 123RF stock photos

Page 40

TensorFlow : ML for Everyone

It scales from research to production

Internal launch

Search, Gmail, Translate, Maps, Android, Photos, Speech, YouTube, Play and many others

+100s of research projects and papers

It supports many platforms

iOS, Android

TPU

GPU

CPU

Compute Engine

Cloud Machine Learning Engine

Cloud Vision API

Cloud Speech API

Natural Language API

Google Translate API

Video Intelligence API

Cloud Jobs API

PRIVATE BETA

PRIVATE ALPHA

Page 41

Thank You.