TensorFlow Deep Learning Lecture

By Mark Chang

Overview
•  What is deep learning?
•  How deep learning works
•  What is TensorFlow?

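What is deep learning?

Human brain vs. computer

$$\begin{cases} 3x + 2y + 5z = 7 \\ 5x + 1y + 8z = 9 \\ 9x + 4y + 3z = 14 \end{cases}$$

[Figure: example images labeled "container ship" and "motorcycle".]

Human brain vs. computer
•  Strengths of the human brain:
   –  Images and sound
   –  Language
   –  Self-awareness (self-determination)
   –  …
•  Strengths of the computer:
   –  Mathematical computation
   –  Memory (storage) capacity
   –  …

Deep learning
•  A machine learning method
•  Uses a computer to simulate the structure of the human nervous system
•  Lets a computer learn to do things the human brain can do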

Image recognition

http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

Artistic creation

http://arxiv.org/abs/1508.06576

Semantic understanding

https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

Poetry generation

http://emnlp2014.org/papers/pdf/EMNLP2014074.pdf

Playing video games

http://arxiv.org/pdf/1312.5602v1.pdf

What deep learning can do
•  Painting
•  Writing poetry
•  Driving
•  Playing board games
•  …

Machine learning
•  Supervised learning
•  Unsupervised learning
•  Reinforcement learning

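Supervised learning

[Diagram: human-labeled data (inputs paired with answers, e.g. an image labeled "ship") is given to a machine learning model, which outputs "ship" for such an image.]

Unsupervised learning

[Diagram: unlabeled data is given to a machine learning model, which produces a result. Example data:]

Beijing is the capital of China. As China's capital, Beijing is a large and vibrant city. Tokyo is the capital of Japan. As Japan's capital, Tokyo is a large and vibrant city. …

Reinforcement learning

[Diagram: a machine learning model interacts with an environment. The environment sends information to the model, and the model sends actions to the environment.]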

Machine learning
•  Supervised learning
•  Unsupervised learning
•  Reinforcement learning
•  Deep learning

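How deep learning works

Supervised machine learning

[Diagram: during training, training data is fed to the machine learning model and the model's output is checked against the correct answer; if the output is wrong, the model is corrected. Once training is complete, test data is fed to the model to produce an output.]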

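Notation
•  Training data: the whole set is $X, Y$; a single example is $x^{(i)}, y^{(i)}$
•  Machine learning model: $h$, with model parameters $w$
•  Output: $h(X)$
•  Correct answer: $Y$
•  Checking the answer: the error $E(h(X), Y)$; if the output is wrong, the model is corrected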

Logistic regression
•  Fit a sigmoid curve to the distribution of the data

[Figure: the data points plotted on x and y axes, before training and after training is complete.]

Training data

X             Y
-0.47241379   0
-0.35344828   0
-0.30148276   0
0.33448276    1
0.35344828    1
0.37241379    1
0.39137931    1
0.41034483    1
0.44931034    1
0.49827586    1
0.51724138    1
…             …

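Machine learning model

Sigmoid function:

$$h(x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}}$$

•  $w_0 + w_1 x < 0 \Rightarrow h(x) \approx 0$ (more precisely, $h(x) < 0.5$, approaching 0 as $w_0 + w_1 x$ decreases)
•  $w_0 + w_1 x > 0 \Rightarrow h(x) \approx 1$ (more precisely, $h(x) > 0.5$, approaching 1 as $w_0 + w_1 x$ increases)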
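The sigmoid model can be tried out in a few lines of plain Python. This is a minimal sketch: the function names `sigmoid` and `h` and the parameter values `w0 = 0`, `w1 = 20` are assumptions chosen only for illustration and do not come from the slides.

```python
import math

def sigmoid(t):
    """Logistic function 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def h(x, w0, w1):
    """Logistic-regression model h(x) = sigmoid(w0 + w1 * x)."""
    return sigmoid(w0 + w1 * x)

# Hypothetical parameters, chosen only for illustration.
w0, w1 = 0.0, 20.0
for x in (-0.47241379, 0.0, 0.51724138):
    print(x, round(h(x, w0, w1), 4))
```

With a large positive `w1`, inputs well below zero map close to 0 and inputs well above zero map close to 1, while `x = 0` sits exactly on the boundary.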

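Correcting the model
•  Error function: cross entropy

$$E(h(X), Y) = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right)$$

$$h(x^{(i)}) \approx 0 \text{ and } y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx 0$$
$$h(x^{(i)}) \approx 1 \text{ and } y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx 0$$
$$h(x^{(i)}) \approx 0 \text{ and } y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx \infty$$
$$h(x^{(i)}) \approx 1 \text{ and } y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx \infty$$

[Figure: the error surface plotted over the parameters $w_0$ and $w_1$.]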
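To make the behaviour of the cross entropy concrete, here is a small sketch in plain Python. The function name `cross_entropy` and the sample outputs are assumptions for illustration; only the formula itself comes from the slide.

```python
import math

def cross_entropy(outputs, targets):
    """Mean binary cross entropy between model outputs h(x) and labels y."""
    m = len(outputs)
    total = 0.0
    for h_x, y in zip(outputs, targets):
        total += y * math.log(h_x) + (1 - y) * math.log(1 - h_x)
    return -total / m

# Hypothetical model outputs for two examples with labels 0 and 1.
good = cross_entropy([0.01, 0.99], [0, 1])   # confident and correct
bad = cross_entropy([0.99, 0.01], [0, 1])    # confident and wrong
print(good, bad)
```

The error stays near zero when the outputs agree with the labels and grows without bound as a wrong output approaches 0 or 1.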

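Correcting the model
•  Gradient descent, with learning rate $\eta$:

$$w_0 \leftarrow w_0 - \eta \frac{\partial E(h(X), Y)}{\partial w_0}$$
$$w_1 \leftarrow w_1 - \eta \frac{\partial E(h(X), Y)}{\partial w_1}$$

[Figure: on the error surface, each update moves in the direction $\left( -\frac{\partial E(h(X), Y)}{\partial w_0}, -\frac{\partial E(h(X), Y)}{\partial w_1} \right)$.]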
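The update rule can be sketched in plain Python on the eleven training pairs listed earlier. The starting point, the learning rate `eta = 0.5`, the step count, and the closed-form gradient of the cross entropy for a sigmoid model (the mean of $(h(x) - y)$ and of $(h(x) - y)x$) are assumptions of this sketch rather than something stated on the slides.

```python
import math

# The eleven (x, y) pairs listed on the training-data slide.
X = [-0.47241379, -0.35344828, -0.30148276, 0.33448276, 0.35344828,
     0.37241379, 0.39137931, 0.41034483, 0.44931034, 0.49827586, 0.51724138]
Y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

def h(x, w0, w1):
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))

def error(w0, w1):
    """Mean cross entropy E(h(X), Y)."""
    m = len(X)
    return -sum(y * math.log(h(x, w0, w1)) + (1 - y) * math.log(1 - h(x, w0, w1))
                for x, y in zip(X, Y)) / m

def gradient(w0, w1):
    """Partial derivatives of E with respect to w0 and w1."""
    m = len(X)
    g0 = sum(h(x, w0, w1) - y for x, y in zip(X, Y)) / m
    g1 = sum((h(x, w0, w1) - y) * x for x, y in zip(X, Y)) / m
    return g0, g1

w0, w1 = 0.0, 0.0   # starting point (an assumption)
eta = 0.5           # learning rate (an assumption)
initial_error = error(w0, w1)
for step in range(2000):
    g0, g1 = gradient(w0, w1)
    w0, w1 = w0 - eta * g0, w1 - eta * g1
final_error = error(w0, w1)
print(initial_error, final_error, w0, w1)
```

Starting from zero weights every output is 0.5, so the initial error is $\log 2$; the repeated updates then drive the error down.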

Neurons and action potentials

http://humanphisiology.wikispaces.com/file/view/neuron.png/216460814/neuron.png

http://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/1037px-Action_potential.svg.png

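Simulating a neuron

[Diagram: a neuron n with inputs $x_1$ and $x_2$ (weights $w_1$ and $w_2$), a bias input b (weight $w_b$), and output y.]

$$n_{in} = w_1 x_1 + w_2 x_2 + w_b$$
$$n_{out} = \frac{1}{1 + e^{-n_{in}}}$$
$$y = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_b)}}$$

[Figure: $n_{out}$ plotted against $n_{in}$, and over the $(x_1, x_2)$ plane with the origin (0,0) and the levels $n_{out} = 1$, $n_{out} = 0.5$, and $n_{out} = 0$ marked.]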

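Simulating a neuron

$$n_{in} = w_1 x_1 + w_2 x_2 + w_b$$
$$n_{out} = \frac{1}{1 + e^{-n_{in}}}$$

•  $w_1 x_1 + w_2 x_2 + w_b = 0$: the decision boundary
•  $w_1 x_1 + w_2 x_2 + w_b > 0$: the side where the output approaches 1
•  $w_1 x_1 + w_2 x_2 + w_b < 0$: the side where the output approaches 0

Binary classification: AND gate

x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1

[Figure: the inputs (0,0), (0,1), (1,0), and (1,1) plotted in the plane; only (1,1) is labeled 1, the others are labeled 0.]

[Diagram: a neuron n with weight 20 on $x_1$, weight 20 on $x_2$, weight −30 on the bias input b, and output y.]

$$y = \frac{1}{1 + e^{-(20 x_1 + 20 x_2 - 30)}}$$

Decision boundary: $20 x_1 + 20 x_2 - 30 = 0$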
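The AND neuron can be checked directly. The sketch below uses the weights from the slide (20, 20, and −30); the helper name `neuron` is an assumption for illustration.

```python
import math

def neuron(x1, x2, w1, w2, wb):
    """Sigmoid neuron: y = 1 / (1 + e^-(w1*x1 + w2*x2 + wb))."""
    return 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + wb)))

# Weights from the slide: 20 and 20 for the inputs, -30 for the bias.
and_table = {}
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = neuron(x1, x2, 20, 20, -30)
    and_table[(x1, x2)] = round(y)
    print(x1, x2, round(y))
```

Rounding the output to the nearest integer reproduces the AND truth table, because only the input (1,1) lies on the positive side of the line $20 x_1 + 20 x_2 - 30 = 0$.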

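XOR gate?

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: the inputs (0,0), (0,1), (1,0), and (1,1) plotted in the plane; (0,1) and (1,0) are labeled 1, while (0,0) and (1,1) are labeled 0. No single straight line separates the two classes, so one neuron is not enough.]

Binary classification: XOR gate

[Diagram: two hidden neurons feed one output neuron.]
•  n1: weight 20 on $x_1$, weight 20 on $x_2$, bias weight −30
•  n2: weight 20 on $x_1$, weight 20 on $x_2$, bias weight −10
•  Output neuron n: weight −20 on n1, weight 20 on n2, bias weight −10, output y

[Figure: three plots of the four inputs, showing the decision boundary of n1, the decision boundary of n2, and the combined result.]

x1  x2  n1  n2  y
0   0   0   0   0
0   1   0   1   1
1   0   0   1   1
1   1   1   1   0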
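The two-layer XOR construction can be verified the same way. This is a minimal sketch using the weights from the diagram; the function name `xor_net` is hypothetical, and the assignment of −20 to n1 and 20 to n2 is the reading that reproduces the truth table above.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def xor_net(x1, x2):
    n1 = sigmoid(20 * x1 + 20 * x2 - 30)   # behaves like AND
    n2 = sigmoid(20 * x1 + 20 * x2 - 10)   # behaves like OR
    y = sigmoid(-20 * n1 + 20 * n2 - 10)   # fires when n2 is on and n1 is off
    return n1, n2, y

xor_rows = []
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    n1, n2, y = xor_net(x1, x2)
    xor_rows.append((x1, x2, round(n1), round(n2), round(y)))
    print(xor_rows[-1])
```

Each hidden neuron draws one straight line; the output neuron combines the two half-planes, which is what a single neuron could not do.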

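Neural network

[Diagram: a fully connected network with three layers.]
•  Input layer: inputs x and y, plus a bias input b
•  Hidden layer: neurons n11 and n12, with weights W11,x, W11,y, W11,b and W12,x, W12,y, W12,b, plus a bias input b
•  Output layer: neurons n21 and n22, with weights W21,11, W21,12, W21,b and W22,11, W22,12, W22,b, whose outputs correspond to z1 and z2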

Visual perception

http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg

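Training a neural network
•  Initialize the model parameters w with random values
•  Forward propagation
   –  Compute the answer with the current model parameters
•  Compute the error (with the error function)
•  Backward propagation
   –  Use the error to correct the model

[Diagram: the supervised learning loop annotated with these steps. Initialization sets up the model, forward propagation maps the training data to the output, the error function checks the output against the correct answer, and backward propagation corrects the model if the output is wrong.]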

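Initialization
•  Set every W to a random number between −N and N
•  The W values in a layer must not all be the same; otherwise the neurons in that layer compute identical outputs and receive identical updates

[Diagram: the same network as on the "Neural network" slide, with inputs x and y, hidden neurons n11 and n12, output neurons n21 and n22, and outputs z1 and z2.]

Forward propagation

[Figure: the activations are computed layer by layer, from the inputs to the outputs.]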
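Initialization and forward propagation for the small network in the diagram could look roughly like this. It is a sketch in plain Python: the helper names, the choice N = 1, the fixed random seed, and the input values are assumptions for illustration.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def init_weights(n, rng):
    """Draw the three weights of one neuron (two inputs and a bias) uniformly from [-n, n]."""
    return [rng.uniform(-n, n) for _ in range(3)]

def neuron_out(weights, a, b):
    """n_out = sigmoid(w_a * a + w_b * b + w_bias)."""
    return sigmoid(weights[0] * a + weights[1] * b + weights[2])

def forward(x, y, w11, w12, w21, w22):
    """Compute the hidden outputs first, then feed them to the output layer."""
    n11 = neuron_out(w11, x, y)
    n12 = neuron_out(w12, x, y)
    n21 = neuron_out(w21, n11, n12)
    n22 = neuron_out(w22, n11, n12)
    return n11, n12, n21, n22

rng = random.Random(0)   # fixed seed so the run is repeatable
w11, w12, w21, w22 = (init_weights(1.0, rng) for _ in range(4))
outputs = forward(0.5, -0.2, w11, w12, w21, w22)
print(outputs)
```

Because each neuron gets its own random weights, n11 and n12 start out computing different functions of the input.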

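Error function

$$J = -\left( z_1 \log(n_{21(out)}) + (1 - z_1) \log(1 - n_{21(out)}) \right) - \left( z_2 \log(n_{22(out)}) + (1 - z_2) \log(1 - n_{22(out)}) \right)$$

[Diagram: the outputs of n21 and n22 are compared with z1 and z2.]

$$n_{out} \approx 0 \text{ and } z = 0 \Rightarrow J \approx 0$$
$$n_{out} \approx 1 \text{ and } z = 1 \Rightarrow J \approx 0$$
$$n_{out} \approx 0 \text{ and } z = 1 \Rightarrow J \approx \infty$$
$$n_{out} \approx 1 \text{ and } z = 0 \Rightarrow J \approx \infty$$

[Figure: the error surface plotted over two parameter axes, labeled $w_0$ and $w_1$.]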

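Gradient descent

Output-layer weights:

$$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}} \qquad w_{21,12} \leftarrow w_{21,12} - \eta \frac{\partial J}{\partial w_{21,12}} \qquad w_{21,b} \leftarrow w_{21,b} - \eta \frac{\partial J}{\partial w_{21,b}}$$

$$w_{22,11} \leftarrow w_{22,11} - \eta \frac{\partial J}{\partial w_{22,11}} \qquad w_{22,12} \leftarrow w_{22,12} - \eta \frac{\partial J}{\partial w_{22,12}} \qquad w_{22,b} \leftarrow w_{22,b} - \eta \frac{\partial J}{\partial w_{22,b}}$$

Hidden-layer weights:

$$w_{11,x} \leftarrow w_{11,x} - \eta \frac{\partial J}{\partial w_{11,x}} \qquad w_{11,y} \leftarrow w_{11,y} - \eta \frac{\partial J}{\partial w_{11,y}} \qquad w_{11,b} \leftarrow w_{11,b} - \eta \frac{\partial J}{\partial w_{11,b}}$$

$$w_{12,x} \leftarrow w_{12,x} - \eta \frac{\partial J}{\partial w_{12,x}} \qquad w_{12,y} \leftarrow w_{12,y} - \eta \frac{\partial J}{\partial w_{12,y}} \qquad w_{12,b} \leftarrow w_{12,b} - \eta \frac{\partial J}{\partial w_{12,b}}$$

[Figure: on the error surface, each update moves in the direction $\left( -\frac{\partial J}{\partial w_0}, -\frac{\partial J}{\partial w_1} \right)$.]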


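Write $\delta_{21(out)} = \frac{\partial J}{\partial n_{21(out)}}$ and $\delta_{21(in)} = \frac{\partial J}{\partial n_{21(in)}}$. By the chain rule:

$$\delta_{21(in)} = \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} = \delta_{21(out)} \frac{\partial n_{21(out)}}{\partial n_{21(in)}}$$

$$\frac{\partial J}{\partial w_{21,11}} = \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial w_{21,11}} = \delta_{21(in)} \frac{\partial n_{21(in)}}{\partial w_{21,11}} = n_{11(out)} \delta_{21(in)}$$

$$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}} = w_{21,11} - \eta \, n_{11(out)} \delta_{21(in)}$$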

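Backward propagation

$$\delta_{11(in)} = \frac{\partial J}{\partial n_{11(in)}} = \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{11(in)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{11(in)}}$$

$$= \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial n_{11(out)}} \frac{\partial n_{11(out)}}{\partial n_{11(in)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{22(in)}} \frac{\partial n_{22(in)}}{\partial n_{11(out)}} \frac{\partial n_{11(out)}}{\partial n_{11(in)}}$$

$$= \left( \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial n_{11(out)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{22(in)}} \frac{\partial n_{22(in)}}{\partial n_{11(out)}} \right) \frac{\partial n_{11(out)}}{\partial n_{11(in)}}$$

$$= \left( \delta_{21(in)} w_{21,11} + \delta_{22(in)} w_{22,11} \right) \frac{\partial n_{11(out)}}{\partial n_{11(in)}}$$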


http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation
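One way to gain confidence in the backward-propagation formulas is to compare them with finite differences. The sketch below does this in plain Python for the 2-2-2 network. The weights, input, and targets are hypothetical values chosen for illustration, the helper names are assumptions, and the last step (the gradient of $w_{11,x}$ as $x \cdot \delta_{11(in)}$) extends the pattern of the slides rather than quoting them.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(x, y, w):
    """w maps a neuron name to [weight on first input, weight on second input, bias weight]."""
    n11 = sigmoid(w["n11"][0] * x + w["n11"][1] * y + w["n11"][2])
    n12 = sigmoid(w["n12"][0] * x + w["n12"][1] * y + w["n12"][2])
    n21 = sigmoid(w["n21"][0] * n11 + w["n21"][1] * n12 + w["n21"][2])
    n22 = sigmoid(w["n22"][0] * n11 + w["n22"][1] * n12 + w["n22"][2])
    return n11, n12, n21, n22

def loss(x, y, z1, z2, w):
    _, _, n21, n22 = forward(x, y, w)
    return (-(z1 * math.log(n21) + (1 - z1) * math.log(1 - n21))
            - (z2 * math.log(n22) + (1 - z2) * math.log(1 - n22)))

# Hypothetical weights, input, and targets, chosen only for illustration.
w = {"n11": [0.3, -0.2, 0.1], "n12": [-0.4, 0.5, 0.2],
     "n21": [0.7, -0.6, 0.05], "n22": [-0.1, 0.8, -0.3]}
x, y, z1, z2 = 0.9, 0.4, 1.0, 0.0

n11, n12, n21, n22 = forward(x, y, w)

def delta_in(n_out, z):
    d_out = -z / n_out + (1 - z) / (1 - n_out)   # dJ/dn_out
    return d_out * n_out * (1 - n_out)           # times dn_out/dn_in for a sigmoid

d21 = delta_in(n21, z1)
d22 = delta_in(n22, z2)
grad_w21_11 = n11 * d21                                           # n11(out) * delta21(in)
d11 = (d21 * w["n21"][0] + d22 * w["n22"][0]) * n11 * (1 - n11)   # delta11(in)
grad_w11_x = x * d11

def numeric_grad(name, index, eps=1e-6):
    """Central finite difference of the loss with respect to one weight."""
    original = w[name][index]
    w[name][index] = original + eps
    up = loss(x, y, z1, z2, w)
    w[name][index] = original - eps
    down = loss(x, y, z1, z2, w)
    w[name][index] = original
    return (up - down) / (2 * eps)

print(grad_w21_11, numeric_grad("n21", 0))
print(grad_w11_x, numeric_grad("n11", 0))
```

The analytic and numeric values should agree to several decimal places; a mismatch usually points to a missing term in the chain rule.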

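What is TensorFlow?

TensorFlow
•  https://www.tensorflow.org/
•  TensorFlow is an open-source machine learning tool developed by Google.
•  It performs numerical computation by means of a computational graph.
•  Supported programming languages: Python, C++
•  System requirements (at the time of this lecture):
   –  The operating system must be Mac or Linux
   –  Python 2.7, or 3.3 and above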

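Computational graph

[Figure: an example computational graph.]

TensorFlow

Three ways to build a model, ordered from low to high flexibility and from low to high technical barrier:
•  Machine learning library (e.g. scikit-learn): once the data is prepared, just call the API. Flexibility: low. Technical barrier: low.
•  TensorFlow: define the computational graph yourself and hand it to TensorFlow to compute. Sits between the other two.
•  Writing from scratch: derive the derivative formulas yourself and write the whole pipeline yourself. Flexibility: high. Technical barrier: high.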

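TensorFlow
•  Flexibility
   –  Any computation that can be expressed as a computational graph can be handled with TensorFlow.
•  Automatic differentiation
   –  The derivatives of the computational graph are computed automatically.
•  Platform compatibility
   –  The same code can run on a CPU or on a GPU.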
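To illustrate what automatic differentiation over a computational graph means, here is a toy reverse-mode sketch in plain Python. It is not TensorFlow's implementation: the `Node` class and the `add`, `mul`, `sigmoid`, and `backward` helpers are hypothetical names invented for this example.

```python
import math

class Node:
    """A value in a computational graph, with links to the nodes it was computed from."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent_node, local_derivative)
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def sigmoid(a):
    s = 1.0 / (1.0 + math.exp(-a.value))
    return Node(s, ((a, s * (1.0 - s)),))

def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)   # parents are appended before the node itself
    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers are processed before their inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local

x, w, b = Node(0.5), Node(2.0), Node(-1.0)
y = sigmoid(add(mul(x, w), b))   # graph for y = sigmoid(x * w + b)
backward(y)
print(y.value, w.grad, b.grad, x.grad)
```

The user only describes the forward computation; the derivatives follow from the local derivative stored on each edge, which is the idea behind letting the framework differentiate the graph.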

CPU vs. GPU

http://allegroviva.com/gpu-computing/difference-between-gpu-and-cpu/

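Example: Binary Classification

[Diagram: a neuron n with inputs $x_1$ and $x_2$ (weights $w_1$ and $w_2$), a constant input 1 (weight b), and output y, shown next to a plot of the data over $x_1$ and $x_2$.]

Model:

$$y = \frac{1}{1 + e^{-(x_1 w_1 + x_2 w_2 + b)}}$$

Data:

import numpy as np

# 50 random points in the unit square
x_data = np.random.rand(50, 2)
# the label is 1 only when both coordinates exceed 0.5;
# cast and reshape to (50, 1) so it matches the y_ placeholder
y_data = ((x_data[:, 1] > 0.5) * (x_data[:, 0] > 0.5)).astype(np.float32).reshape(-1, 1)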

Example: Binary Classification (after training)

[Figure: the same neuron and model, $y = \frac{1}{1 + e^{-(x_1 w_1 + x_2 w_2 + b)}}$, shown with the result after training.]

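TensorFlow

# Uses the early TensorFlow API from the time of this lecture
# (later releases renamed tf.initialize_all_variables to tf.global_variables_initializer).
import tensorflow as tf

# --- Computational graph ---
x_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.random_uniform([2, 1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1, 1]))
y = tf.nn.sigmoid(tf.matmul(x_, w) + b)
cross_entropy = -tf.reduce_sum(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y))
optimizer = tf.train.GradientDescentOptimizer(0.1)
train = optimizer.minimize(cross_entropy)
init = tf.initialize_all_variables()

# --- Session ---
sess = tf.Session()
sess.run(init)
for step in range(500):
    sess.run(train, feed_dict={x_: x_data, y_: y_data})
# cross_entropy depends on the placeholders, so it needs feed_dict as well
print(sess.run(cross_entropy, feed_dict={x_: x_data, y_: y_data}))
sess.close()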

Computation Graph

# placeholder
x_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])

# variable
w = tf.Variable(tf.random_uniform([2, 1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1, 1]))

# operations
y = tf.nn.sigmoid(tf.matmul(x_, w) + b)

# error function
cross_entropy = -tf.reduce_sum(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y))

# trainer
optimizer = tf.train.GradientDescentOptimizer(0.1)
train = optimizer.minimize(cross_entropy)

# initializer
init = tf.initialize_all_variables()

Placeholder

x_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])

x_                        y_
0.70828883  0.27190551    0
0.89042455  0.63832092    1
0.11332515  0.00849676    0
0.73278006  0.37781084    0
0.292448    0.09819899    0
0.9802261   0.94339143    1
0.36212146  0.54404682    0
…           …             …

Variable

w = tf.Variable(tf.random_uniform([2, 1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1, 1]))

w (shape [2, 1]):
0.42905441
-0.43841863

b (shape [1, 1]):
0

Matrix multiplication

y = tf.nn.sigmoid(tf.matmul(x_, w) + b)

With w = (0.42905441, -0.43841863) and b = 0, tf.matmul(x_, w) + b is computed row by row:

x_                        tf.matmul(x_, w) + b
0.70828883  0.27190551    0.70828883 * 0.42905441 + 0.27190551 * -0.43841863 + 0 = 0.184686
0.89042455  0.63832092    0.89042455 * 0.42905441 + 0.63832092 * -0.43841863 + 0 = 0.1021888
0.11332515  0.00849676    0.11332515 * 0.42905441 + 0.00849676 * -0.43841863 + 0 = 0.04489752
…           …             …

Sigmoid

y = tf.nn.sigmoid(tf.matmul(x_, w) + b)

tf.matmul(x_, w) + b    tf.nn.sigmoid
0.184686                0.54604071
0.1021888               0.52552499
0.04489752              0.51122249
…                       …
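The numbers on the matrix multiplication and sigmoid slides can be reproduced without TensorFlow. The sketch below uses plain Python lists in place of tensors; the variable names `x_rows`, `pre_activation`, and `y` are assumptions for illustration, while the numeric values are the ones shown on the slides.

```python
import math

w = [0.42905441, -0.43841863]   # values shown on the Variable slide
b = 0.0
x_rows = [
    [0.70828883, 0.27190551],
    [0.89042455, 0.63832092],
    [0.11332515, 0.00849676],
]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Each row of x_ times the column vector w, plus b
pre_activation = [row[0] * w[0] + row[1] * w[1] + b for row in x_rows]
# Element-wise sigmoid of the result
y = [sigmoid(t) for t in pre_activation]

for t, out in zip(pre_activation, y):
    print(round(t, 6), round(out, 6))
```

Small differences in the last digits relative to the slides are expected, since TensorFlow computes in 32-bit floats here and Python uses 64-bit floats.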

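Error function

$$E(h(X), Y) = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right)$$

cross_entropy = -tf.reduce_sum(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y))

The code sums over the examples and omits the $\frac{1}{m}$ factor, which rescales the error but does not move its minimum.

y_    y
0     0.54604071
1     0.52552499
…     …

[Example value shown on the slide for -tf.reduce_sum(y_ * tf.log(y)): 1.4331052]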

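Trainer

optimizer = tf.train.GradientDescentOptimizer(0.1)
train = optimizer.minimize(cross_entropy)

With learning rate $\eta = 0.1$, each run of train applies:

$$w \leftarrow w - \eta \frac{\partial E(h(X), Y)}{\partial w}$$
$$b \leftarrow b - \eta \frac{\partial E(h(X), Y)}{\partial b}$$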

Computation Graph
•  Initializer

init = tf.initialize_all_variables()

w = tf.Variable(tf.random_uniform([2, 1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1, 1]))

[Diagram: running the initializer assigns the initial values, here w = (0.42905441, -0.43841863) and b = 0.]

Session

# create session
sess = tf.Session()

# initialize variables
sess.run(init)

# gradient descent
for step in range(500):
    sess.run(train, feed_dict={x_: x_data, y_: y_data})

# fetch the value of cross_entropy
print(sess.run(cross_entropy, feed_dict={x_: x_data, y_: y_data}))

# release resources
sess.close()

Run operations

sess.run(init)

[Diagram: the argument of sess.run is a node in the computational graph.]

Run operations

for step in range(500):
    sess.run(train, feed_dict={x_: x_data, y_: y_data})

[Diagram: the first argument is a node in the graph; feed_dict supplies the input data.]

x_data                    y_data
0.70828883  0.27190551    0
0.89042455  0.63832092    1
0.11332515  0.00849676    0
0.73278006  0.37781084    0
…           …             …

Run operations

print(sess.run(cross_entropy, feed_dict={x_: x_data, y_: y_data}))

[Diagram: the same graph node and input data as above; the call returns the result 2.4564333.]

Training

for step in range(500):
    sess.run(train, feed_dict={x_: x_data, y_: y_data})

Demo: Binary Classification
https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/intro/binaryClassification.ipynb

TensorBoard

[Figure: TensorBoard views showing a histogram summary, a scalar summary, and the computational graph.]

Summary
•  tf.scalar_summary: scalar summary
•  tf.histogram_summary: histogram summary

merged = tf.merge_all_summaries()
writer = tf.train.SummaryWriter("./", sess.graph_def)
for step in range(500):
    ….
    summary_str = sess.run(merged, feed_dict={x_: x_data, y_: y_data})
    writer.add_summary(summary_str, step)

name_scope

with tf.name_scope("cross_entropy") as scope:
    cross_entropy = -tf.reduce_sum(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y))

Launch TensorBoard

> tensorboard --logdir=./
Starting TensorBoard on port 6006
(You can navigate to http://0.0.0.0:6006)

Demo: TensorBoard
https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/intro/tensorboard.py

Demo
•  Image recognition: GoogLeNet

https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/intro/googlenet.ipynb

About the Speaker

Mark Chang
•  Email: ckmarkoh at gmail dot com
•  Blog: http://cpmarkchang.logdown.com
•  Github: https://github.com/ckmarkoh
•  Facebook: https://www.facebook.com/ckmarkoh.chang
•  Slideshare: http://www.slideshare.net/ckmarkohchang
•  Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847
