TensorFlow for Everyone (누구나 TensorFlow) - Module 4: Machine Learning with Neural Networks
Jaewook Kang, Ph.D. | [email protected]
Soundlly Inc.
Sep. 2017
© 2017 Jaewook Kang. All Rights Reserved.
About the speaker: Jaewook Kang (강재욱)
GIST EEC Ph.D. (2015)
Signal processing scientist and relentless tinkerer
Things I like:
- Statistical signal processing / wireless communication signal processing
- Implementing embedded audio DSP libraries in C/C++
- Machine-learning-based audio signal processing algorithms
- Learning things so I can pass them on
Selected papers:
- Jaewook Kang, et al., "Bayesian Hypothesis Test using Nonparametric Belief Propagation for Noisy Sparse Recovery," IEEE Trans. on Signal Processing, Feb. 2015
- Jaewook Kang, et al., "Fast Signal Separation of 2D Sparse Mixture via Approximate Message-Passing," IEEE Signal Processing Letters, Nov. 2015
Schedule / Target time / Details

Module 3: Separating data with a straight line - logistic classification
- Introduction to Linear Classification
- Naïve Bayes (NB)
- Linear Discriminant Analysis (LDA)
- Logistic Regression (LR)
- NB vs LDA vs LR
- LAB5: Linear Classification in TensorFlow

Module 4 (target time: 4): Neural networks, the ancestor of deep learning
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- The limits of linear neurons and activation functions
- Gradient descent revisited
- Backpropagation algorithm
- LAB6: Multi-layer neural net with Backpropagation in TensorFlow
GitHub links
- GitHub link (all public): https://github.com/jwkanggist/EveryBodyTensorFlow
- Another GitHub link (not mine): https://github.com/aymericdamien/TensorFlow-Examples
1. Neural Networks, the Ancestor of Deep Learning
The story of the researchers who kept digging the same well until it became deep learning
- Expressing a neuron mathematically
- Feed-Forward Neural Networks
- The limits of linear neurons and activation functions
- Gradient descent revisited
- Backpropagation algorithm
- LAB6: 2-layer neural net in TensorFlow
Reference:
Nikhil Buduma, Fundamentals of Deep Learning, 1st Edition, O'Reilly, 2017
Excellent related Korean-language blogs
- Jinseob's blog: https://mathemedicine.github.io/deep_learning.html
- Solaris's AI research lab: http://solarisailab.com/archives/1206
- Terry's blog: http://slownews.kr/41461
The Neuron
The most basic unit of the brain
- The brain is formed by interconnecting well over 10,000 neurons
Image source: http://ib.bioninja.com.au/standard-level/topic-6-human-physiology/65-neurons-and-synapses/neurons.html
The Neuron
The most basic unit of the brain
[Figure: a biological neuron processes a signal in stages - signal input, amplification, combination, transformation, signal output]
Image source: http://ib.bioninja.com.au/standard-level/topic-6-human-physiology/65-neurons-and-synapses/neurons.html
The Neuron
Artificial Neuron (1958)
[Figure: the artificial neuron follows the same stages - signal input, amplification, combination, transformation (with bias b), signal output]
Image source: https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7
The Neuron
Artificial Neuron (1958)
[Figure: inputs x1, x2, x3 are amplified by weights w1, w2, w3, combined together with the bias b, and transformed by the activation f(·) into the output y]
Image source: https://hackernoon.com/overview-of-artificial-neural-networks-and-its-applications-2525c1addff7
The Neuron
Artificial Neuron (1958)
y = f(Z), Z = XW + b
The Neuron
Artificial Neuron (1958)
y = f(Z), Z = XW + b
where y is the activation, f(·) is the activation function, Z is the logit, X is the input, W are the neuron weights, and b is the neuron bias.
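To make y = f(XW + b) concrete, here is a minimal NumPy sketch of one artificial neuron (my own illustration; the sigmoid choice for f is just an example, not prescribed by the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # amplify (x * w), combine (sum + bias), transform (f)
    z = np.dot(x, w) + b          # the logit Z
    return f(z)                   # the activation y

x = np.array([1.0, 2.0, 3.0])     # signal input
w = np.array([0.1, 0.2, 0.3])     # neuron weights
print(neuron(x, w, b=0.5))        # a single scalar activation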
The Neuron
What it means to train a neuron
[Figure: a single neuron with inputs x1, x2, weights w1, w2, bias b, and output y]
The Neuron
What it means to train a neuron
- Push-and-pull (밀당) example: to succeed in a relationship, what should the ratio of pushing to pulling be?
  - y: probability of success
  - x: effort put into each action
  - w: ratio of the actions needed for success
[Figure: the same single neuron, with one input labeled 밀 (push) and the other 당 (pull)]
The Neuron
What it means to train a neuron
- Push-and-pull (밀당) example: to succeed in a relationship, what should the ratio of pushing to pulling be?
  - Assume a linear activation function: y = f(z) = z
  - Data: t = 1.0, x1 = 2.0, x2 = 3.0
  - Cost: e = ½(t − y)², with b = 0
  - Find w1 and w2
The Neuron
What it means to train a neuron
- Push-and-pull (밀당) example: to succeed in a relationship, what should the ratio of pushing to pulling be?
  - Assume a linear activation function: y = f(z) = z
  - Data: t = 1.0, x1 = 2.0, x2 = 3.0
  - Cost: e = ½(t − y)², with b = 0
  - Find w1 and w2
What's your answer?
Model: y = w1·x1 + w2·x2
Cost: e = ½(t − y)²
∂e/∂w1 = −x1(t − y), ∂e/∂w2 = −x2(t − y)
Setting both derivatives to zero:
  −x1(t − w1·x1 − w2·x2) = 0
  −x2(t − w1·x1 − w2·x2) = 0
(w1, w2) = ?
Both conditions reduce to the single equation t = w1·x1 + w2·x2, i.e. 2·w1 + 3·w2 = 1, so with a single data point the problem is underdetermined and any (w1, w2) on that line is a valid answer.
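A tiny gradient-descent run on this toy cost (my own sketch, not lab code) shows that training lands on some point of the line 2·w1 + 3·w2 = 1, the exact point depending on where the weights start:

import numpy as np

t, x = 1.0, np.array([2.0, 3.0])   # target and inputs (push, pull)
w = np.array([0.0, 0.0])           # initial weights
alpha = 0.01                       # learning rate

for _ in range(1000):
    y = np.dot(w, x)               # linear activation: y = w1*x1 + w2*x2
    grad = -x * (t - y)            # de/dw_k = -x_k * (t - y)
    w = w - alpha * grad           # gradient descent update

print(w, np.dot(w, x))             # w satisfies 2*w1 + 3*w2 ~= 1, so y ~= t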
The Neuron
What it means to train a neuron
- Given (X, Y) data, find the values of W and b
- In other words, learn how much weight to give each input when combining them
[Figure: the push-pull neuron with its weights shown as '?' - the unknowns to be learned]
Activation Functions
How should we model the activation of the stimulus (the logit Z)?
[Figure: the single neuron with its activation function marked '?']
Activation Functions
Sigmoid function
- Maps the logit Z into [0, 1]
- Used when the logit Z should be mapped to a probability
  • e.g., Logistic Regression
f(z) = 1 / (1 + exp(−z))
[Plot: the sigmoid curve as a function of the logit Z]
Activation Functions
Tanh
- Maps the logit Z into [−1, +1]
- The activation is centered at 0
  • When stacking multiple layers, the hidden-layer activations are not biased away from zero.
f(z) = tanh(z)
[Plot: the tanh curve as a function of the logit Z]
Activation Functions
ReLU (Rectified Linear Unit)
- For the sigmoid and tanh functions, the gradient approaches 0 as the input moves toward either extreme → the vanishing gradient problem (TBU)
f(z) = max(0, z)
[Plot: the ReLU curve as a function of the logit Z]
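The three activations side by side in NumPy, a minimal sketch (my own illustration) for comparing their output ranges:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

def tanh(z):
    return np.tanh(z)                 # range (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)         # range [0, inf); gradient 1 for z > 0, 0 for z < 0

z = np.linspace(-5.0, 5.0, 11)        # sample logits
print(sigmoid(z))
print(tanh(z))
print(relu(z))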
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
[Figure: a network taking input X = [x1, x2, x3, x4], passing through weight matrices W1 and W2, and producing output Y = [y1, y2, y3, y4]]
Rules:
- No connection within the same layer
- No backward connection
Mathematical model: see the next slide
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
[Figure: the same network with input X = [x1, x2, x3, x4], weight matrices W1 and W2, and output Y = [y1, y2, y3, y4]]
Rules:
- No connection within the same layer
- No backward connection
Mathematical model:
Y = f(W2 f(W1 X + b1) + b2)
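A minimal NumPy sketch of the stacked model Y = f(W2 f(W1 X + b1) + b2); the layer sizes and random values are placeholders of my own:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X  = rng.normal(size=4)           # input X = [x1, x2, x3, x4]
W1 = rng.normal(size=(3, 4))      # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(size=(4, 3))      # hidden -> output weights
b2 = np.zeros(4)

H = sigmoid(W1 @ X + b1)          # hidden-layer activations
Y = sigmoid(W2 @ H + b2)          # output Y = [y1, y2, y3, y4]
print(Y)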
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
Input Layer:
- The layer that receives the input data X
- Where tf.placeholder() is attached
[Figure: the same network, with the input layer highlighted]
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
Output Layer:
- The layer that emits the output data Y
- Where tf.placeholder() is attached (for the target values)
[Figure: the same network, with the output layer highlighted]
Feed-Forward Neural Networks
Let's wire such neurons together and stack them
- The human brain also has a layered structure.
Hidden Layer:
- Every layer that sits between the input layer and the output layer
- Learns on its own to extract the features needed for training from X
- Produces feature maps, the intermediate representations
- The more hidden layers, the finer the features that can be extracted
[Figure: the same network, with the hidden layer highlighted]
Feed-Forward Neural Networks
Google's good example site
- http://playground.tensorflow.org/
- Playing around with it will make things much clearer!
  • Logistic regression (1-layer neural net classification)
  • Neural Net
How to Train a Neural Net?
How should we train a neural net?
- Conventional approach 1: maximum likelihood est. + analytical solution
  • In many cases, no analytical solution exists
  • Non-linearity of the activation function → no closed-form solution
- Conventional approach 2: maximum likelihood est. + numerical solver
  • An example: logistic-regression-based classification
    - Cost: cross-entropy function (non-linear)
    - Solvers:
      » Gradient descent solvers: always walk down the steepest slope of the cost (first-order method)
      » Newton-Raphson solvers: find the point where the slope of the cost is 0 (second-order method, good for convex problems)
A small sketch contrasting the two solver families follows below.
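A minimal sketch (my own illustration, not from the slides) contrasting one gradient-descent step with one Newton-Raphson step on the simple one-dimensional convex cost J(w) = (w − 3)²:

def grad(w):            # J'(w) = 2 (w - 3)
    return 2.0 * (w - 3.0)

def hess(w):            # J''(w) = 2
    return 2.0

w_gd, w_nr, alpha = 0.0, 0.0, 0.1
for _ in range(20):
    w_gd -= alpha * grad(w_gd)        # first order: follow the negative slope
    w_nr -= grad(w_nr) / hess(w_nr)   # second order: jump to where the slope is 0

print(w_gd)   # creeps toward the minimum at w = 3
print(w_nr)   # lands on w = 3 immediately for this quadratic cost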
Gradient Descent Revisit
Let's take another look at gradient descent
[Plot: an error-cost surface over W, with the iterates walking downhill toward the minimum]
W(n+1) = W(n) − α ∇J(W(n))
J(W): error cost
Gradient Descent Revisit
Let's take another look at gradient descent
Remember just two things!!
- Finding the step direction: the delta rule
- Finding the step size: the learning rate
W(n+1) = W(n) − α ∇J(W(n)), J(W): error cost
  • α: step size (learning rate)
  • ∇J(W(n)): step direction (gradient)
Gradient Descent Revisit
Let's take another look at gradient descent
- Finding the step direction: the delta rule
- How far should we move along each component of W?
  • Take the partial derivative of the error cost with respect to each weight.
  • For a sum-of-squares cost with linear activation:
−∇J(W) = [Δw1, Δw2, ..., ΔwM]
Δwk = −∂J(W)/∂wk = −∂/∂wk [ ½ Σi (t(i) − y(i))² ] = Σi xk(i)(t(i) − y(i))
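A NumPy sketch of this delta rule for a batch of samples (sum-of-squares cost, linear activation); the data and step count are placeholders of my own:

import numpy as np

def delta_rule_step(W, X, t, alpha):
    # one gradient-descent step: W <- W + alpha * X^T (t - y)
    y = X @ W                    # linear activation, one prediction per sample
    delta_w = X.T @ (t - y)      # delta_w_k = sum_i x_k(i) * (t(i) - y(i))
    return W + alpha * delta_w

X = np.array([[2.0, 3.0],        # two samples, two features
              [1.0, 1.0]])
t = np.array([1.0, 0.5])
W = np.zeros(2)
for _ in range(2000):
    W = delta_rule_step(W, X, t, alpha=0.05)
print(W, X @ W)                  # the predictions approach the targets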
Gradient Descent Revisit
How should we train a neural net?
- Finding the step direction: the delta rule
- How far should we move along each component of W?
  • Take the partial derivative of the error cost with respect to each weight.
  • For a cross-entropy cost with sigmoid activation:
−∇J(W) = [Δw1, Δw2, ..., ΔwM]
Working it out gives the same form, Δwk = Σi xk(i)(t(i) − y(i)), because the sigmoid derivative y(1 − y) cancels the denominator of the cross-entropy derivative.
Gradient Descent Revisit
How should we train a neural net?
- Finding the step size: the learning rate
  • Too large → the iteration diverges
  • Too small → it takes forever, and the amount of computation grows
A small sketch of this trade-off follows below.
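A minimal sketch (my own illustration) of the learning-rate trade-off on the 1-D cost J(w) = w²: too large a rate diverges, too small a rate barely moves:

def run_gd(alpha, steps=30, w=1.0):
    # gradient descent on J(w) = w^2, so grad = 2w
    for _ in range(steps):
        w -= alpha * 2.0 * w
    return w

print(run_gd(alpha=1.5))    # |1 - 2*1.5| = 2 > 1, so the iterates blow up
print(run_gd(alpha=0.001))  # barely moves away from the starting point
print(run_gd(alpha=0.3))    # a reasonable rate converges toward the minimum at 0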
Training a Neural Net
The limits of conventional ML est. + gradient descent
- As the number of hidden layers grows, the dimension of the parameters W to be learned grows enormously.
- There are too many unknown parameters (W) to learn with the "ML est. + numerical solver" combination.
  • The complexity explodes.
[Figure: a shallow neural network (Input - Hidden - Output, weights W1, W2) next to a deep neural network (Input - Hidden - Hidden - Hidden - Output, weights W1-W4)]
Error Back Propagation
Basic idea:
- Compute the error derivatives of the current layer by propagating the error derivatives of the preceding layer.
Error Back Propagation
Basic algorithm (a code skeleton follows below)
- STEP 1) Initialization of all weights
- STEP 2) Forward propagation: compute the predicted activations y from the input X
- STEP 3) Error back propagation: compute the weight updates (Δw) from the error derivatives
- STEP 4) Update all weights and go to STEP 2
Image source: Bishop's book, Chapter 5
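The four steps as a generic Python skeleton (a sketch only; forward, backward, and the data layout are hypothetical placeholders of my own, not code from the labs):

import numpy as np

def train(forward, backward, X, T, shapes, alpha=0.5, n_iter=1000):
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=s) for s in shapes]   # STEP 1: random initialization
    for _ in range(n_iter):
        y, cache = forward(X, weights)               # STEP 2: forward propagation
        grads = backward(T, y, cache, weights)       # STEP 3: error back propagation
        weights = [w - alpha * g                     # STEP 4: update, then repeat
                   for w, g in zip(weights, grads)]
    return weights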
Error Back Propagation
Toy example: a two-layer small neural network
STEP 1) Initialization of all weights
- Cross-entropy cost
- Sigmoid activation
[Network diagram, reused on the following slides: inputs x1, x2 feed the hidden layer (z11 → y11, z12 → y12) through weights w11-w14 and a bias b; the hidden layer feeds the output layer (z21 → y21, z22 → y22) through weights w21-w24 and a bias b; the outputs are compared against the targets t1, t2. W1 collects the hidden-layer weights and W2 the output-layer weights.]
f(z) = 1 / (1 + exp(−z))
For the sigmoid activation, y = f(z) and ∂y/∂z = y(1 − y).
Error Back Propagation
Toy example: a two-layer small neural network
STEP 1) Initialization of all weights
- In a random manner:
Init. of weights:
  W1 = [w11 w12; w13 w14] = [0.15 0.20; 0.25 0.30]
  W2 = [w21 w22; w23 w24] = [0.40 0.45; 0.50 0.55]
Training data:
  X = [0.05; 0.10], T = [0.01; 0.99]
Bias:
  b1 = [0.35; 0.35], b2 = [0.60; 0.60]
Error Back Propagation
Toy example: a two-layer small neural network
STEP 2) Forward Propagation
Y1 = f(Z1 = W1 X + b1) = f([0.15 0.20; 0.25 0.30][0.05; 0.10] + [0.35; 0.35]) = [0.5933; 0.5969]
Y2 = f(Z2 = W2 Y1 + b2) = f([0.40 0.45; 0.50 0.55][0.5933; 0.5969] + [0.60; 0.60]) = [0.7514; 0.7729]
(A NumPy check of these numbers follows below.)
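A short NumPy check of the forward pass, reproducing the numbers on the slide (my own sketch):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.15, 0.20], [0.25, 0.30]]); b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]]); b2 = np.array([0.60, 0.60])
X  = np.array([0.05, 0.10])

Y1 = sigmoid(W1 @ X + b1)    # -> [0.5933, 0.5969]
Y2 = sigmoid(W2 @ Y1 + b2)   # -> [0.7514, 0.7729]
print(Y1, Y2)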
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W2
- 3-1: calculate the error derivative w.r.t. y21, y22
∂J(W2)/∂y21 = (y21 − t1) / (y21(1 − y21))
∂J(W2)/∂y22 = (y22 − t2) / (y22(1 − y22))
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W2
- 3-2: calculate the error derivative w.r.t. W2
Δw21 = −α (∂J(W2)/∂y21)(∂y21/∂z21)(∂z21/∂w21) = α (t1 − y21) y11
Δw22 = −α (∂J(W2)/∂y21)(∂y21/∂z21)(∂z21/∂w22) = α (t1 − y21) y12
Δw23 = −α (∂J(W2)/∂y22)(∂y22/∂z22)(∂z22/∂w23) = α (t2 − y22) y11
Δw24 = −α (∂J(W2)/∂y22)(∂y22/∂z22)(∂z22/∂w24) = α (t2 − y22) y12
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W1 (Important!!)
- 3-3: calculate the error derivative w.r.t. y11, y12
∂J(W1)/∂y11 = w21 (∂J(W2)/∂y21)(∂y21/∂z21) + w23 (∂J(W2)/∂y22)(∂y22/∂z22)
∂J(W1)/∂y12 = w22 (∂J(W2)/∂y21)(∂y21/∂z21) + w24 (∂J(W2)/∂y22)(∂y22/∂z22)
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W1 (Important!!)
- 3-3: calculate the error derivative w.r.t. y11, y12 (substituting the sigmoid derivative ∂y/∂z = y(1 − y))
∂J(W1)/∂y11 = w21 (∂J(W2)/∂y21) y21(1 − y21) + w23 (∂J(W2)/∂y22) y22(1 − y22)
∂J(W1)/∂y12 = w22 (∂J(W2)/∂y21) y21(1 − y21) + w24 (∂J(W2)/∂y22) y22(1 − y22)
Error Back Propagation
Toy example: a two-layer small neural network
STEP 3) Error Back Propagation for W1
- 3-4: calculate the error derivative w.r.t. W1
Δw11 = −α (∂J(W1)/∂y11)(∂y11/∂z11)(∂z11/∂w11) = −α (∂J(W1)/∂y11) y11(1 − y11) x1
Δw12 = −α (∂J(W1)/∂y11)(∂y11/∂z11)(∂z11/∂w12) = −α (∂J(W1)/∂y11) y11(1 − y11) x2
Δw13 = −α (∂J(W1)/∂y12)(∂y12/∂z12)(∂z12/∂w13) = −α (∂J(W1)/∂y12) y12(1 − y12) x1
Δw14 = −α (∂J(W1)/∂y12)(∂y12/∂z12)(∂z12/∂w14) = −α (∂J(W1)/∂y12) y12(1 − y12) x2
Error Back Propagation
Toy example: a two-layer small neural network
STEP 4) Update all the weights and go to STEP 2
Iterate forward propagation and error back propagation until the outputs fit the targets (a full NumPy iteration of this toy example is sketched below).
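Putting STEPs 1-4 together for the toy example, here is a NumPy sketch of the slides' equations (cross-entropy cost, sigmoid activations; my own implementation, not the lab code; as in the slides, only the weights are updated and the biases stay fixed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# STEP 1: initialization (values from the slides)
W1 = np.array([[0.15, 0.20], [0.25, 0.30]]); b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45], [0.50, 0.55]]); b2 = np.array([0.60, 0.60])
X  = np.array([0.05, 0.10])
T  = np.array([0.01, 0.99])
alpha = 0.5

for _ in range(10000):
    # STEP 2: forward propagation
    Y1 = sigmoid(W1 @ X + b1)
    Y2 = sigmoid(W2 @ Y1 + b2)

    # STEP 3: error back propagation
    dJ_dY2 = (Y2 - T) / (Y2 * (1.0 - Y2))   # dJ/dy2k = (y2k - tk) / (y2k (1 - y2k))
    delta2 = dJ_dY2 * Y2 * (1.0 - Y2)       # multiply by the sigmoid derivative
    dJ_dY1 = W2.T @ delta2                  # propagate the error derivative through W2
    delta1 = dJ_dY1 * Y1 * (1.0 - Y1)
    grad_W2 = np.outer(delta2, Y1)          # dJ/dW2
    grad_W1 = np.outer(delta1, X)           # dJ/dW1

    # STEP 4: update all weights, then go back to STEP 2
    W2 -= alpha * grad_W2
    W1 -= alpha * grad_W1

print(Y2)   # the outputs move toward the targets T = [0.01, 0.99]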
LAB6: Multi-layer neural net in TensorFlow
Cluster-in-cluster data
- https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab6_runTFMultiANN_clusterinclusterdata.py
LAB6: Multi-layer neural net in TensorFlow
Two-spiral data
- https://github.com/jwkanggist/EveryBodyTensorFlow/blob/master/lab6_runTFMultiANN_spiraldata.py
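For orientation before opening the lab scripts, here is a minimal TensorFlow 1.x sketch of a 2-layer fully-connected classifier in the style of this module (the layer sizes, learning rate, and toy feed data are placeholders of my own and are not taken from the lab code):

import numpy as np
import tensorflow as tf   # assumes TensorFlow 1.x, matching the 2017 labs

n_input, n_hidden, n_class = 2, 10, 2

# the input and output layers are attached to tf.placeholder(), as described above
X = tf.placeholder(tf.float32, [None, n_input])
T = tf.placeholder(tf.float32, [None, n_class])

W1 = tf.Variable(tf.random_normal([n_input, n_hidden]))
b1 = tf.Variable(tf.zeros([n_hidden]))
W2 = tf.Variable(tf.random_normal([n_hidden, n_class]))
b2 = tf.Variable(tf.zeros([n_class]))

hidden = tf.nn.sigmoid(tf.matmul(X, W1) + b1)   # Y = f(W2 f(W1 X + b1) + b2)
logits = tf.matmul(hidden, W2) + b2

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=T, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)   # backprop under the hood

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_batch = np.random.randn(100, n_input).astype(np.float32)            # toy inputs
    t_batch = np.eye(n_class)[np.random.randint(n_class, size=100)]       # toy one-hot targets
    for _ in range(500):
        _, c = sess.run([train_op, cost], feed_dict={X: x_batch, T: t_batch})
    print(c)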