TensorFlow for Everyone (누구나 TensorFlow) - Module 4: Machine Learning with Neural Networks
Jaewook Kang, Ph.D. | [email protected]
Soundlly Inc.
Sep. 2017
© 2017 Jaewook Kang. All Rights Reserved.
About the speaker: Jaewook Kang (강재욱)
GIST EEC Ph.D. (2015)
Signal processing scientist and relentless tinkerer
Things I like:
- Statistical signal processing / wireless communication signal processing
- Implementing embedded audio DSP libraries in C/C++
- Machine-learning-based audio signal processing algorithms
- Learning things so I can pass them on
Selected papers:
- Jaewook Kang, et al., "Bayesian Hypothesis Test using Nonparametric Belief Propagation for Noisy Sparse Recovery," IEEE Trans. on Signal Processing, Feb. 2015
- Jaewook Kang, et al., "Fast Signal Separation of 2D Sparse Mixture via Approximate Message-Passing," IEEE Signal Processing Letters, Nov. 2015
Schedule / Target time / Details

Module 3: Separating data with a straight line - logistic classification
- Introduction to Linear Classification
- Naïve Bayes (NB)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)
- NB vs LDA vs LR
- LAB5: Linear Classification in TensorFlow

Module 4 (target time: 4): Neural networks, the ancestor of deep learning
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- The limits of linear neurons and activation functions
- Gradient descent revisited
- Backpropagation algorithm
- LAB6: Multi-layer neural net with Backpropagation in TensorFlow
GitHub links
- GitHub link (all public): https://github.com/jwkanggist/EveryBodyTensorFlow
- Another GitHub link (not mine): https://github.com/aymericdamien/TensorFlow-Examples
1. Neural Networks, the Ancestor of Deep Learning
The story of the researchers who kept digging the same well until it became deep learning
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- The limits of linear neurons and activation functions
- Gradient descent revisited
- Backpropagation algorithm
- LAB6: 2-layer neural net in TensorFlow
Reference:
Nikhil Buduma, Fundamentals of Deep Learning, 1st Edition, O'Reilly, 2017
Excellent related Korean-language blogs
- Jinseob's blog: https://mathemedicine.github.io/deep_learning.html
- Solaris's AI research lab: http://solarisailab.com/archives/1206
- Terry's blog: http://slownews.kr/41461
The Neuron
The most basic unit of the brain
- The brain is formed by interconnecting well over 10,000 neurons
Image source: http://ib.bioninja.com.au/standard-level/topic-6-human-physiology/65-neurons-and-synapses/neurons.html
The Neuron
The most basic unit of the brain
[Figure: a biological neuron processes a signal in stages - signal input, amplification, combination, transformation, signal output]
Image source: http://ib.bioninja.com.au/standard-level/topic-6-human-physiology/65-neurons-and-synapses/neurons.html
The Neuron
Artificial Neuron (1958)
[Figure: the artificial neuron follows the same stages - signal input, amplification, combination, transformation (with bias b), signal output]
Image source: https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7
The Neuron
Artificial Neuron (1958)
[Figure: inputs x1, x2, x3 are amplified by weights w1, w2, w3, combined together with the bias b, and transformed by the activation f(·) into the output y]
Image source: https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7
The Neuron
Artificial Neuron (1958)
y = f(Z), Z = XW + b
The Neuron
Artificial Neuron (1958)
y = f(Z), Z = XW + b
where y is the activation, f(·) is the activation function, Z is the logit, X is the input, W are the neuron weights, and b is the neuron bias.
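To make y = f(XW + b) concrete, here is a minimal NumPy sketch of one artificial neuron (my own illustration; the sigmoid choice for f is just an example, not prescribed by the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # amplify (x * w), combine (sum + bias), transform (f)
    z = np.dot(x, w) + b          # the logit Z
    return f(z)                   # the activation y

x = np.array([1.0, 2.0, 3.0])     # signal input
w = np.array([0.1, 0.2, 0.3])     # neuron weights
print(neuron(x, w, b=0.5))        # a single scalar activation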
The Neuron
What it means to train a neuron
[Figure: a single neuron with inputs x1, x2, weights w1, w2, bias b, and output y]
The Neuron
What it means to train a neuron
- Push-and-pull (밀당) example: to succeed in a relationship, what should the ratio of pushing to pulling be?
  - y: probability of success
  - x: effort put into each action
  - w: ratio of the actions needed for success
[Figure: the same single neuron, with one input labeled 밀 (push) and the other 당 (pull)]
The Neuron
What it means to train a neuron
- Push-and-pull (밀당) example: to succeed in a relationship, what should the ratio of pushing to pulling be?
  - Assume a linear activation function: y = f(z) = z
  - Data: t = 1.0, x1 = 2.0, x2 = 3.0
  - Cost: e = ½(t − y)², with b = 0
  - Find w1 and w2
The Neuron
What it means to train a neuron
- Push-and-pull (밀당) example: to succeed in a relationship, what should the ratio of pushing to pulling be?
  - Assume a linear activation function: y = f(z) = z
  - Data: t = 1.0, x1 = 2.0, x2 = 3.0
  - Cost: e = ½(t − y)², with b = 0
  - Find w1 and w2
What's your answer?
Model: y = w1·x1 + w2·x2
Cost: e = ½(t − y)²
∂e/∂w1 = −x1(t − y), ∂e/∂w2 = −x2(t − y)
Setting both derivatives to zero:
  −x1(t − w1·x1 − w2·x2) = 0
  −x2(t − w1·x1 − w2·x2) = 0
(w1, w2) = ?
Both conditions reduce to the single equation t = w1·x1 + w2·x2, i.e. 2·w1 + 3·w2 = 1, so with a single data point the problem is underdetermined and any (w1, w2) on that line is a valid answer.
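A tiny gradient-descent run on this toy cost (my own sketch, not lab code) shows that training lands on some point of the line 2·w1 + 3·w2 = 1, the exact point depending on where the weights start:

import numpy as np

t, x = 1.0, np.array([2.0, 3.0])   # target and inputs (push, pull)
w = np.array([0.0, 0.0])           # initial weights
alpha = 0.01                       # learning rate

for _ in range(1000):
    y = np.dot(w, x)               # linear activation: y = w1*x1 + w2*x2
    grad = -x * (t - y)            # de/dw_k = -x_k * (t - y)
    w = w - alpha * grad           # gradient descent update

print(w, np.dot(w, x))             # w satisfies 2*w1 + 3*w2 ~= 1, so y ~= t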
The Neuron
What it means to train a neuron
- Given (X, Y) data, find the values of W and b
- In other words, learn how much weight to give each input when combining them
[Figure: the push-pull neuron with its weights shown as '?' - the unknowns to be learned]
Activation Functions
How should we model the activation of the stimulus (the logit Z)?
[Figure: the single neuron with its activation function marked '?']
Activation Functions
Sigmoid function
- Maps the logit Z into [0, 1]
- Used when the logit Z should be mapped to a probability
  • e.g., Logistic Regression
f(z) = 1 / (1 + exp(−z))
[Plot: the sigmoid curve as a function of the logit Z]
Activation Functions
Tanh
- Maps the logit Z into [−1, +1]
- The activation is centered at 0
  • When stacking multiple layers, the hidden-layer activations are not biased away from zero.
f(z) = tanh(z)
[Plot: the tanh curve as a function of the logit Z]
Activation Functions
ReLU (Rectified Linear Unit)
- For the sigmoid and tanh functions, the gradient approaches 0 as the input moves toward either extreme → the vanishing gradient problem (TBU)
f(z) = max(0, z)
[Plot: the ReLU curve as a function of the logit Z]
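The three activations side by side in NumPy, a minimal sketch (my own illustration) for comparing their output ranges:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

def tanh(z):
    return np.tanh(z)                 # range (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)         # range [0, inf); gradient 1 for z > 0, 0 for z < 0

z = np.linspace(-5.0, 5.0, 11)        # sample logits
print(sigmoid(z))
print(tanh(z))
print(relu(z))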
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
[Figure: a network taking input X = [x1, x2, x3, x4], passing through weight matrices W1 and W2, and producing output Y = [y1, y2, y3, y4]]
Rules:
- No connection within the same layer
- No backward connection
Mathematical model: see the next slide
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
[Figure: the same network with input X = [x1, x2, x3, x4], weight matrices W1 and W2, and output Y = [y1, y2, y3, y4]]
Rules:
- No connection within the same layer
- No backward connection
Mathematical model:
Y = f(W2 f(W1 X + b1) + b2)
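A minimal NumPy sketch of the stacked model Y = f(W2 f(W1 X + b1) + b2); the layer sizes and random values are placeholders of my own:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X  = rng.normal(size=4)           # input X = [x1, x2, x3, x4]
W1 = rng.normal(size=(3, 4))      # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(size=(4, 3))      # hidden -> output weights
b2 = np.zeros(4)

H = sigmoid(W1 @ X + b1)          # hidden-layer activations
Y = sigmoid(W2 @ H + b2)          # output Y = [y1, y2, y3, y4]
print(Y)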
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
Input Layer:
- The layer that receives the input data X
- Where tf.placeholder() is attached
[Figure: the same network, with the input layer highlighted]
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
Output Layer:
- The layer that emits the output data Y
- Where tf.placeholder() is attached (for the target values)
[Figure: the same network, with the output layer highlighted]
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
Hidden Layer:
- Every layer that sits between the input layer and the output layer
- Learns on its own to extract the features needed for training from X
- Produces feature maps, the intermediate representations
- The more hidden layers, the finer the features that can be extracted
[Figure: the same network, with the hidden layer highlighted]
Feed-Forward Neural Networks
Google's good example site
- http://playground.tensorflow.org/
- Playing around with it will make things much clearer!
  • Logistic regression (1-layer neural net classification)
  • Neural Net
How to Train a Neural Net?
How should we train a neural net?
- Conventional approach 1: maximum likelihood est. + analytical solution
  • In many cases, no analytical solution exists
  • Non-linearity of the activation function → no closed-form solution
- Conventional approach 2: maximum likelihood est. + numerical solver
  • An example: logistic-regression-based classification
    - Cost: cross-entropy function (non-linear)
    - Solvers:
      » Gradient descent solvers: always walk down the steepest slope of the cost (first-order method)
      » Newton-Raphson solvers: find the point where the slope of the cost is 0 (second-order method, good for convex problems)
A small sketch contrasting the two solver families follows below.
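A minimal sketch (my own illustration, not from the slides) contrasting one gradient-descent step with one Newton-Raphson step on the simple one-dimensional convex cost J(w) = (w − 3)²:

def grad(w):            # J'(w) = 2 (w - 3)
    return 2.0 * (w - 3.0)

def hess(w):            # J''(w) = 2
    return 2.0

w_gd, w_nr, alpha = 0.0, 0.0, 0.1
for _ in range(20):
    w_gd -= alpha * grad(w_gd)        # first order: follow the negative slope
    w_nr -= grad(w_nr) / hess(w_nr)   # second order: jump to where the slope is 0

print(w_gd)   # creeps toward the minimum at w = 3
print(w_nr)   # lands on w = 3 immediately for this quadratic cost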
Gradient Descent Revisit
Let's take another look at gradient descent
[Plot: an error-cost surface over W, with the iterates walking downhill toward the minimum]
W(n+1) = W(n) − α ∇J(W(n))
J(W): error cost
Gradient Descent Revisit
Let's take another look at gradient descent
Remember just two things!!
- Finding the step direction: the delta rule
- Finding the step size: the learning rate
W(n+1) = W(n) − α ∇J(W(n)), J(W): error cost
  • α: step size (learning rate)
  • ∇J(W(n)): step direction (gradient)
Gradient Descent Revisit
Let's take another look at gradient descent
- Finding the step direction: the delta rule
- How far should we move along each component of W?
  • Take the partial derivative of the error cost with respect to each weight.
  • For a sum-of-squares cost with linear activation:
−∇J(W) = [Δw1, Δw2, ..., ΔwM]
Δwk = −∂J(W)/∂wk = −∂/∂wk [ ½ Σi (t(i) − y(i))² ] = Σi xk(i)(t(i) − y(i))
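A NumPy sketch of this delta rule for a batch of samples (sum-of-squares cost, linear activation); the data and step count are placeholders of my own:

import numpy as np

def delta_rule_step(W, X, t, alpha):
    # one gradient-descent step: W <- W + alpha * X^T (t - y)
    y = X @ W                    # linear activation, one prediction per sample
    delta_w = X.T @ (t - y)      # delta_w_k = sum_i x_k(i) * (t(i) - y(i))
    return W + alpha * delta_w

X = np.array([[2.0, 3.0],        # two samples, two features
              [1.0, 1.0]])
t = np.array([1.0, 0.5])
W = np.zeros(2)
for _ in range(2000):
    W = delta_rule_step(W, X, t, alpha=0.05)
print(W, X @ W)                  # the predictions approach the targets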
Gradient Descent Revisit
How should we train a neural net?
- Finding the step direction: the delta rule
- How far should we move along each component of W?
  • Take the partial derivative of the error cost with respect to each weight.
  • For a cross-entropy cost with sigmoid activation:
−∇J(W) = [Δw1, Δw2, ..., ΔwM]
Working it out gives the same form, Δwk = Σi xk(i)(t(i) − y(i)), because the sigmoid derivative y(1 − y) cancels the denominator of the cross-entropy derivative.
Gradient Descent Revisit
How should we train a neural net?
- Finding the step size: the learning rate
  • Too large → the iteration diverges
  • Too small → it takes forever, and the amount of computation grows
A small sketch of this trade-off follows below.
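A minimal sketch (my own illustration) of the learning-rate trade-off on the 1-D cost J(w) = w²: too large a rate diverges, too small a rate barely moves:

def run_gd(alpha, steps=30, w=1.0):
    # gradient descent on J(w) = w^2, so grad = 2w
    for _ in range(steps):
        w -= alpha * 2.0 * w
    return w

print(run_gd(alpha=1.5))    # |1 - 2*1.5| = 2 > 1, so the iterates blow up
print(run_gd(alpha=0.001))  # barely moves away from the starting point
print(run_gd(alpha=0.3))    # a reasonable rate converges toward the minimum at 0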
Training a Neural Net
The limits of conventional ML est. + gradient descent
- As the number of hidden layers grows, the dimension of the parameters W to be learned grows enormously.
- There are too many unknown parameters (W) to learn with the "ML est. + numerical solver" combination.
  • The complexity explodes.
[Figure: a shallow neural network (Input - Hidden - Output, weights W1, W2) next to a deep neural network (Input - Hidden - Hidden - Hidden - Output, weights W1-W4)]
Error Back Propagation
Basic idea:
- Compute the error derivatives of the current layer by propagating the error derivatives of the preceding layer.
Error Back Propagation
Basic algorithm (a code skeleton follows below)
- STEP 1) Initialization of all weights
- STEP 2) Forward propagation: compute the predicted activations y from the input X
- STEP 3) Error back propagation: compute the weight updates (Δw) from the error derivatives
- STEP 4) Update all weights and go to STEP 2
Image source: Bishop's book, Chapter 5
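The four steps as a generic Python skeleton (a sketch only; forward, backward, and the data layout are hypothetical placeholders of my own, not code from the labs):

import numpy as np

def train(forward, backward, X, T, shapes, alpha=0.5, n_iter=1000):
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=s) for s in shapes]   # STEP 1: random initialization
    for _ in range(n_iter):
        y, cache = forward(X, weights)               # STEP 2: forward propagation
        grads = backward(T, y, cache, weights)       # STEP 3: error back propagation
        weights = [w - alpha * g                     # STEP 4: update, then repeat
                   for w, g in zip(weights, grads)]
    return weights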
Error Back Propagation
Toy example: a two-layer small neural network
STEP 1) Initialization of all weights
- Cross-entropy cost
- Sigmoid activation
[Network diagram, reused on the following slides: inputs x1, x2 feed the hidden layer (z11 → y11, z12 → y12) through weights w11-w14 and a bias b; the hidden layer feeds the output layer (z21 → y21, z22 → y22) through weights w21-w24 and a bias b; the outputs are compared against the targets t1, t2. W1 collects the hidden-layer weights and W2 the output-layer weights.]
f(z) = 1 / (1 + exp(−z))
For the sigmoid activation, y = f(z) and ∂y/∂z = y(1 − y).
Error Back Propagation
Toy example: a two-layer small neural network
STEP 1) Initialization of all weights
- In a random manner:
Init. of weights:
  W1 = [w11 w12; w13 w14] = [0.15 0.20; 0.25 0.30]
  W2 = [w21 w22; w23 w24] = [0.40 0.45; 0.50 0.55]
Training data:
  X = [0.05; 0.10], T = [0.01; 0.99]
Bias:
  b1 = [0.35; 0.35], b2 = [0.60; 0.60]
Error Back Propagation
Toy example: a two-layer small neural network
STEP 2) Forward Propagation
Y1 = f(Z1 = W1 X + b1) = f([0.15 0.20; 0.25 0.30][0.05; 0.10] + [0.35; 0.35]) = [0.5933; 0.5969]
Y2 = f(Z2 = W2 Y1 + b2) = f([0.40 0.45; 0.50 0.55][0.5933; 0.5969] + [0.60; 0.60]) = [0.7514; 0.7729]
(A NumPy check of these numbers follows below.)
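A short NumPy check of the forward pass, reproducing the numbers on the slide (my own sketch):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.15, 0.20], [0.25, 0.30]]); b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]]); b2 = np.array([0.60, 0.60])
X  = np.array([0.05, 0.10])

Y1 = sigmoid(W1 @ X + b1)    # -> [0.5933, 0.5969]
Y2 = sigmoid(W2 @ Y1 + b2)   # -> [0.7514, 0.7729]
print(Y1, Y2)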
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W2
- 3-1: calculate the error derivative w.r.t. y21, y22
∂J(W2)/∂y21 = (y21 − t1) / (y21(1 − y21))
∂J(W2)/∂y22 = (y22 − t2) / (y22(1 − y22))
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W2
- 3-2: calculate the error derivative w.r.t. W2
Δw21 = −α (∂J(W2)/∂y21)(∂y21/∂z21)(∂z21/∂w21) = α (t1 − y21) y11
Δw22 = −α (∂J(W2)/∂y21)(∂y21/∂z21)(∂z21/∂w22) = α (t1 − y21) y12
Δw23 = −α (∂J(W2)/∂y22)(∂y22/∂z22)(∂z22/∂w23) = α (t2 − y22) y11
Δw24 = −α (∂J(W2)/∂y22)(∂y22/∂z22)(∂z22/∂w24) = α (t2 − y22) y12
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W1 (Important!!)
- 3-3: calculate the error derivative w.r.t. y11, y12
∂J(W1)/∂y11 = w21 (∂J(W2)/∂y21)(∂y21/∂z21) + w23 (∂J(W2)/∂y22)(∂y22/∂z22)
∂J(W1)/∂y12 = w22 (∂J(W2)/∂y21)(∂y21/∂z21) + w24 (∂J(W2)/∂y22)(∂y22/∂z22)
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W1 (Important!!)
- 3-3: calculate the error derivative w.r.t. y11, y12 (substituting the sigmoid derivative ∂y/∂z = y(1 − y))
∂J(W1)/∂y11 = w21 (∂J(W2)/∂y21) y21(1 − y21) + w23 (∂J(W2)/∂y22) y22(1 − y22)
∂J(W1)/∂y12 = w22 (∂J(W2)/∂y21) y21(1 − y21) + w24 (∂J(W2)/∂y22) y22(1 − y22)
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W1
- 3-4: calculate the error derivative w.r.t. W1
Δw11 = −α (∂J(W1)/∂y11)(∂y11/∂z11)(∂z11/∂w11) = −α (∂J(W1)/∂y11) y11(1 − y11) x1
Δw12 = −α (∂J(W1)/∂y11)(∂y11/∂z11)(∂z11/∂w12) = −α (∂J(W1)/∂y11) y11(1 − y11) x2
Δw13 = −α (∂J(W1)/∂y12)(∂y12/∂z12)(∂z12/∂w13) = −α (∂J(W1)/∂y12) y12(1 − y12) x1
Δw14 = −α (∂J(W1)/∂y12)(∂y12/∂z12)(∂z12/∂w14) = −α (∂J(W1)/∂y12) y12(1 − y12) x2
Error Back Propagation
Toy example: a two-layer small neural network
STEP 4) Update all the weights and go to STEP 2
Iterate forward propagation and error back propagation until the outputs fit the targets (a full NumPy iteration of this toy example is sketched below).
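Putting STEPs 1-4 together for the toy example, here is a NumPy sketch of the slides' equations (cross-entropy cost, sigmoid activations; my own implementation, not the lab code; as in the slides, only the weights are updated and the biases stay fixed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# STEP 1: initialization (values from the slides)
W1 = np.array([[0.15, 0.20], [0.25, 0.30]]); b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]]); b2 = np.array([0.60, 0.60])
X  = np.array([0.05, 0.10])
T  = np.array([0.01, 0.99])
alpha = 0.5

for _ in range(10000):
    # STEP 2: forward propagation
    Y1 = sigmoid(W1 @ X + b1)
    Y2 = sigmoid(W2 @ Y1 + b2)

    # STEP 3: error back propagation
    dJ_dY2 = (Y2 - T) / (Y2 * (1.0 - Y2))   # dJ/dy2k = (y2k - tk) / (y2k (1 - y2k))
    delta2 = dJ_dY2 * Y2 * (1.0 - Y2)       # multiply by the sigmoid derivative
    dJ_dY1 = W2.T @ delta2                  # propagate the error derivative through W2
    delta1 = dJ_dY1 * Y1 * (1.0 - Y1)
    grad_W2 = np.outer(delta2, Y1)          # dJ/dW2
    grad_W1 = np.outer(delta1, X)           # dJ/dW1

    # STEP 4: update all weights, then go back to STEP 2
    W2 -= alpha * grad_W2
    W1 -= alpha * grad_W1

print(Y2)   # the outputs move toward the targets T = [0.01, 0.99]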
LAB6: Multi-layer neural net in TensorFlow
Cluster-in-cluster data
- https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab6_runTFMultiANN_clusterinclusterdata.py
LAB6: Multi-layer neural net in TensorFlow
Two-spiral data
- https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab6_runTFMultiANN_spiraldata.py
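For orientation before opening the lab scripts, here is a minimal TensorFlow 1.x sketch of a 2-layer fully-connected classifier in the style of this module (the layer sizes, learning rate, and toy feed data are placeholders of my own and are not taken from the lab code):

import numpy as np
import tensorflow as tf   # assumes TensorFlow 1.x, matching the 2017 labs

n_input, n_hidden, n_class = 2, 10, 2

# the input and output layers are attached to tf.placeholder(), as described above
X = tf.placeholder(tf.float32, [None, n_input])
T = tf.placeholder(tf.float32, [None, n_class])

W1 = tf.Variable(tf.random_normal([n_input, n_hidden]))
b1 = tf.Variable(tf.zeros([n_hidden]))
W2 = tf.Variable(tf.random_normal([n_hidden, n_class]))
b2 = tf.Variable(tf.zeros([n_class]))

hidden = tf.nn.sigmoid(tf.matmul(X, W1) + b1)   # Y = f(W2 f(W1 X + b1) + b2)
logits = tf.matmul(hidden, W2) + b2

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=T, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)   # backprop under the hood

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_batch = np.random.randn(100, n_input).astype(np.float32)            # toy inputs
    t_batch = np.eye(n_class)[np.random.randint(n_class, size=100)]       # toy one-hot targets
    for _ in range(500):
        _, c = sess.run([train_op, cost], feed_dict={X: x_batch, T: t_batch})
    print(c)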