Introduction to Deep Neural Network
Liwei Ren, Ph.D
San Jose, California, Nov 2016
Posted Jan 22, 2018
Transcript
Page 1: Introduction to Deep Neural Network

Copyright 2011 Trend Micro Inc.

Introduction to Deep Neural Network Liwei Ren, Ph.D

San Jose, California, Nov, 2016

Page 2: Introduction to Deep Neural Network


Agenda

• What a DNN is

• How a DNN works

• Why a DNN works

• Those DNNs in action

• Where the challenges are

• Successful stories

• Security problems

• Summary

• Quiz

• What else

Page 3: Introduction to Deep Neural Network

What is a DNN?

• DNN and AI in the secular world

Page 6: Introduction to Deep Neural Network

What is a DNN?

• DNN in the technical world

Page 9: Introduction to Deep Neural Network

What is a DNN?

• Categorizing the DNNs :

Page 10: Introduction to Deep Neural Network

What is a DNN?

• Three technical elements:

– Architecture: the graph, weights/biases, activation functions

– Activity Rule: weights/biases, activation functions

– Learning Rule: a typical one is the backpropagation algorithm

• Three masters in this area:

Page 11: Introduction to Deep Neural Network

What is a DNN?

• Given a practical problem, we have two approaches to solve it.

Page 12: Introduction to Deep Neural Network

What is a DNN?

• An example: image recognition

Page 14: Introduction to Deep Neural Network

What is a DNN?

• In the mathematical world

– A DNN is a mathematical function f: D → S, where D ⊆ Rn and S ⊆ Rm, which is constructed by a directed-graph-based architecture.

– A DNN is also a composition of functions from a network of primitive functions.

Page 15: Introduction to Deep Neural Network

What is a DNN?

• We denote a feed-forward DNN function by O = f(I), which is determined by a few parameters: G, Φ, W, B

• Hyper-parameters:

– G is the directed graph which represents the structure

– Φ specifies one or multiple activation functions for activating the nodes

• Parameters:

– W is the vector of weights relevant to the edges

– B is the vector of biases relevant to the nodes
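As a sketch of what these parameters mean, here is a minimal feed-forward pass in plain Python. The 2-2-1 layer shapes, the sigmoid choice of Φ, and all numeric values are illustrative assumptions, not taken from the slides:

```python
import math

def sigmoid(v):
    # A common choice of activation function phi
    return 1.0 / (1.0 + math.exp(-v))

def feed_forward(I, layers, phi=sigmoid):
    """Compute O = f(I) for a layered feed-forward DNN.

    `layers` is a list of (W, B) pairs, one per non-input layer:
    W[j][i] is the weight on the edge from node i of the previous
    layer to node j, and B[j] is the bias of node j.
    """
    H = list(I)
    for W, B in layers:
        H = [phi(sum(w * h for w, h in zip(row, H)) + b)
             for row, b in zip(W, B)]
    return H

# A 2-2-1 network: the hyper-parameters are the layer shapes (G) and
# phi; the parameters are the weights W and biases B (arbitrary here).
layers = [([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),   # hidden layer
          ([[1.0, -1.0]], [0.0])]                      # output layer
O = feed_forward([1.0, 2.0], layers)
```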

Page 16: Introduction to Deep Neural Network

What is a DNN?

• Activation at a node:

Page 17: Introduction to Deep Neural Network

What is a DNN?

• Activation function:
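The slide's figure is not reproduced in this transcript. As a sketch, these are activation functions commonly used in DNNs (this particular selection is an assumption, not read from the slide); note the sigmoid is bounded and monotonically increasing, the property the approximation theorems later in the talk rely on:

```python
import math

def sigmoid(v):
    # Bounded in (0, 1), monotonically increasing
    return 1.0 / (1.0 + math.exp(-v))

def tanh(v):
    # Bounded in (-1, 1), monotonically increasing
    return math.tanh(v)

def relu(v):
    # Unbounded; popular in modern deep networks
    return max(0.0, v)
```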

Page 18: Introduction to Deep Neural Network

What is a DNN?

• G = (V,E) is a graph and Φ is a set of activation functions.

• <G,Φ> constructs a family of functions F:

– F(G,Φ) = { f | f is a function constructed by <G,Φ,W> where W ∈ R^N }

• N = the total number of weights at all nodes of the output layer and hidden layers.

• Each f(I) can be denoted by f(I,W).

Page 19: Introduction to Deep Neural Network

What is a DNN?

• Mathematically, a DNN-based supervised machine learning technology can be described as follows:

– Given g ∈ { h | h: D → S, where D ⊆ Rn and S ⊆ Rm } and δ > 0, find f ∈ F(G,Φ) such that ||f − g|| < δ.

• Essentially, it is to identify a W ∈ R^N such that ||f(∗,W) − g|| < δ

• However, in practice, g is not explicitly expressed. It usually appears as a sequence of samples:

– { <I(j),T(j)> | T(j) = g(I(j)), j = 1, 2, …, M }

• where I(j) is an input vector and T(j) is its corresponding target vector.

Page 20: Introduction to Deep Neural Network

How Does a DNN work ?

• Since the function g is not explicitly expressed, we are not able to calculate ||g − f(∗,W)||

• Instead, we evaluate the error function E(W) = (1/2M) ∑ ||T(j) − f(I(j),W)||²

• We expect to determine W such that E(W) < δ

• How to identify W ∈ R^N so that E(W) < δ? Let's solve the nonlinear optimization problem min{ E(W) | W ∈ R^N }, i.e.:

min{ (1/2M) ∑ || T(j) − f(I(j),W) ||² | W ∈ R^N }   (P1)
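A minimal sketch of the batch error function E(W) above; the one-weight linear model standing in for f and the two-sample training set are hypothetical, chosen only to make the formula concrete:

```python
def batch_error(samples, f, W):
    """E(W) = (1/2M) * sum_j ||T(j) - f(I(j), W)||^2 over M samples."""
    M = len(samples)
    total = 0.0
    for I, T in samples:
        O = f(I, W)
        total += sum((t - o) ** 2 for t, o in zip(T, O))
    return total / (2.0 * M)

# Toy stand-ins: f is a single linear node with W = [w], and the
# samples are drawn from the target function g(x) = 2x.
f = lambda I, W: [W[0] * I[0]]
samples = [([1.0], [2.0]), ([2.0], [4.0])]
```

At W = [2.0] the error is exactly 0; at W = [1.0] it is (1 + 4)/(2·2) = 1.25.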

Page 21: Introduction to Deep Neural Network

How Does a DNN work ?

• (P1) is for batch-mode training; however, it is too expensive.

• In order to reduce the computational cost, a sequential mode is introduced.

• Picking <I,T> ∈ {<I(1),T(1)>, <I(2),T(2)>, …, <I(M),T(M)>} sequentially, let the output of the network be O = f(I,W) for any W:

• Error function E(W) = ||T − f(I,W)||²/2 = ∑(Tj − Oj)²/2

• Each Oj can be considered as a function of W. We denote it as Oj(W).

• We have the optimization problem for training in sequential mode:

– min{ ∑(Tj − Oj(W))²/2 | W ∈ R^N }   (P2)

Page 22: Introduction to Deep Neural Network

How Does a DNN work ?

• One may ask whether we get the same solution for both batch mode and sequential mode ?

• BTW

– batch mode = offline mode

– sequential mode = online mode

• We focus on online mode in this talk

Page 23: Introduction to Deep Neural Network

How Does a DNN work ?

• How to solve the unconstrained nonlinear optimization problem (P2)?

• The general approach of unconstrained nonlinear optimization is to find local minima of E(W) by using the iterative process of Gradient Descent.

• ∂E = (∂E/∂W1, ∂E/∂W2, …, ∂E/∂WN)

• The iterations:

– ΔWj = -γ ∂E/∂Wj for j = 1, …, N

– Updating W in each step by

• Wj^(k+1) = Wj^(k) - γ ∂E(W^(k))/∂Wj for j = 1, …, N   (A1)

• until E(W^(k+1)) < δ or E(W^(k+1)) cannot be reduced anymore
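The iteration (A1) can be sketched as a generic gradient-descent loop. The quadratic E and its gradient below are illustrative stand-ins (not from the slides) whose minimum at (3, -1) is known, so convergence is easy to see:

```python
def gradient_descent(E, grad_E, W, gamma=0.1, delta=1e-6, max_iter=10000):
    """Iterate Wj_(k+1) = Wj_(k) - gamma * dE/dWj until E(W) < delta
    or E stops decreasing, as in rule (A1)."""
    prev = E(W)
    for _ in range(max_iter):
        g = grad_E(W)
        W = [wj - gamma * gj for wj, gj in zip(W, g)]
        cur = E(W)
        if cur < delta or cur >= prev:
            break
        prev = cur
    return W

# Minimize E(W) = (w0 - 3)^2 + (w1 + 1)^2, minimum at (3, -1).
E = lambda W: (W[0] - 3) ** 2 + (W[1] + 1) ** 2
grad_E = lambda W: [2 * (W[0] - 3), 2 * (W[1] + 1)]
W = gradient_descent(E, grad_E, [0.0, 0.0])
```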

Page 24: Introduction to Deep Neural Network

How Does a DNN work ?

• The algorithm of Gradient Descent:

Page 25: Introduction to Deep Neural Network

How Does a DNN work ?

• From the perspective of mathematics, the process of Gradient Descent is straightforward.

• However, from the perspective of scientific computing, it is quite challenging to calculate the values of all ∂E/∂Wj for j=1, …,N:

– The complexity of presenting each ∂E/∂Wj where j=1, …,N.

– There are (k+1)-layer function compositions for a DNN of k hidden layers.

Page 26: Introduction to Deep Neural Network

How Does a DNN work ?

• For example, we have a very simple network as follows, with the activation function φ(v) = 1/(1 + e^(-v)).

• Since E(W) = [T - f(I,W)]²/2 = [T - φ(w1φ(w3I + w2) + w0)]²/2, we have:

– ∂E/∂w0 = -[T - φ(w1φ(w3I + w2) + w0)] φ'(w1φ(w3I + w2) + w0)

– ∂E/∂w1 = -[T - φ(w1φ(w3I + w2) + w0)] φ'(w1φ(w3I + w2) + w0) φ(w3I + w2)

– ∂E/∂w2 = -w1 [T - φ(w1φ(w3I + w2) + w0)] φ'(w1φ(w3I + w2) + w0) φ'(w3I + w2)

– ∂E/∂w3 = -I w1 [T - φ(w1φ(w3I + w2) + w0)] φ'(w1φ(w3I + w2) + w0) φ'(w3I + w2)
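These closed-form derivatives can be checked numerically. The sketch below compares the slide's ∂E/∂w3 against a central finite difference; the sample values I, T and the weights are arbitrary assumptions:

```python
import math

phi = lambda v: 1.0 / (1.0 + math.exp(-v))
dphi = lambda v: phi(v) * (1.0 - phi(v))   # phi'(v) for the sigmoid

I, T = 0.5, 0.9                       # one training sample (assumed)
w0, w1, w2, w3 = 0.1, -0.2, 0.3, 0.4  # arbitrary weights

def E(w0, w1, w2, w3):
    return (T - phi(w1 * phi(w3 * I + w2) + w0)) ** 2 / 2.0

# Analytic dE/dw3 from the slide's formula
H = phi(w3 * I + w2)
net_o = w1 * H + w0
dE_dw3 = -I * w1 * (T - phi(net_o)) * dphi(net_o) * dphi(w3 * I + w2)

# Central finite difference for comparison
eps = 1e-6
num = (E(w0, w1, w2, w3 + eps) - E(w0, w1, w2, w3 - eps)) / (2 * eps)
```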

Page 27: Introduction to Deep Neural Network

How Does a DNN work ?

• Let's imagine a network of N inputs, M outputs and K hidden layers, each of which has L nodes.

– It is a daunting task to express ∂E/∂wj explicitly. The last simple example already shows this.

• The backpropagation (BP) algorithm was proposed as a rescue:

– Main idea: the weights of the (k-1)-th hidden layer can be expressed by those of the k-th layer recursively.

– We can start with the output layer, which is considered as the (K+1)-th layer.

Page 28: Introduction to Deep Neural Network

How Does a DNN work ?

• BP algorithm has the following major steps:

1. Feed-forward computation

2. Back-propagation to the output layer

3. Back-propagation to the hidden layers

4. Weight updates
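The four steps can be sketched for the talk's small network f(I,W) = φ(w1·φ(w3·I + w2) + w0) with the sigmoid activation from page 26. The learning rate, initial weights and single training sample below are illustrative assumptions:

```python
import math

phi = lambda v: 1.0 / (1.0 + math.exp(-v))
dphi = lambda v: phi(v) * (1.0 - phi(v))

def bp_step(I, T, W, gamma=0.5):
    """One backpropagation step for the small 1-1-1 network
    f(I,W) = phi(w1*phi(w3*I + w2) + w0)."""
    w0, w1, w2, w3 = W

    # 1. Feed-forward computation
    net_h = w3 * I + w2
    H = phi(net_h)
    net_o = w1 * H + w0
    O = phi(net_o)

    # 2. Back-propagation to the output layer
    delta_o = dphi(net_o) * (T - O)

    # 3. Back-propagation to the hidden layer
    delta_h = dphi(net_h) * delta_o * w1

    # 4. Weight updates (gradient descent with learning rate gamma)
    return [w0 + gamma * delta_o * 1.0,
            w1 + gamma * delta_o * H,
            w2 + gamma * delta_h * 1.0,
            w3 + gamma * delta_h * I]

# Repeated steps drive the network output toward the target T = 0.8.
W = [0.1, 0.2, 0.3, 0.4]
for _ in range(2000):
    W = bp_step(1.0, 0.8, W)
```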

Page 29: Introduction to Deep Neural Network

How Does a DNN work ?

Page 30: Introduction to Deep Neural Network

How Does a DNN work ?

• A general DNN can be drawn as follows

Page 31: Introduction to Deep Neural Network

How Does a DNN work ?

• How to express the weights of the (k-1)-th hidden layer by the weights of the k-th layer recursively?

Page 32: Introduction to Deep Neural Network

How Does a DNN work ?

• Let us experience BP with our small network.

– E(W) = [T - f(I,W)]²/2 = [T - φ(w1φ(w3I + w2) + w0)]²/2.

• ∂E/∂w0 = -φ'(O)(T - O)

• ∂E/∂w1 = -φ'(O)(T - O) H

• ∂E/∂w2 = -φ'(O)(T - O) φ'(H) w1 · 1

• ∂E/∂w3 = -φ'(O)(T - O) φ'(H) w1 · I

– Let H0^(1) = 1, H1^(1) = H = φ(w3I + w2), H0^(0) = 1, H1^(0) = I. We verify the following:

• δ1^(2) = φ'(O)(T - O)

• w0⁺ = w0 + γ δ1^(2) H0^(1),  w1⁺ = w1 + γ δ1^(2) H1^(1)

• δ1^(1) = φ'(H1^(1)) δ1^(2) w1

• w2⁺ = w2 + γ δ1^(1) H0^(0),  w3⁺ = w3 + γ δ1^(1) H1^(0)

• where w0 = w0,1^(2), w1 = w1,1^(2), w2 = w0,1^(1), w3 = w1,1^(1)

Page 33: Introduction to Deep Neural Network

Why Does a DNN Work?

• It is amazing! However, why does it work?

• For a FNN, it is to ask whether the following approximation problem has a solution:

– Given g ∈ { h | h: D → S, where D ⊆ Rn and S ⊆ Rm } and δ > 0, find a W ∈ R^N such that ||f(∗,W) - g|| < δ.

• Universal approximation theorem (S):

– Let φ(·) be a bounded and monotonically increasing continuous function. Let Im denote the m-dimensional unit hypercube [0,1]^m. The space of continuous functions on Im is denoted by C(Im). Then, given any function f ∈ C(Im) and ε > 0, there exist an integer N, real constants vi, bi ∈ R and real vectors wi ∈ Rm, where i = 1, …, N, such that

|F(x) - f(x)| < ε

for all x ∈ Im, where F(x) = ∑(i=1..N) vi φ(wiᵀx + bi) is an approximation to the function f which is independent of φ.
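The form F(x) = ∑ vi φ(wiᵀx + bi) can be made concrete for m = 1. With hand-picked (assumed, not from the slides) parameters, the difference of two steep sigmoids yields a localized "bump", the classic building block for approximating any continuous function on [0,1]:

```python
import math

phi = lambda v: 1.0 / (1.0 + math.exp(-v))

def F(x, params):
    """F(x) = sum_i v_i * phi(w_i * x + b_i): a one-hidden-layer
    network in the form of the universal approximation theorem."""
    return sum(v * phi(w * x + b) for v, w, b in params)

# Two steep sigmoids: a step up near x = 0.4 and a step down near
# x = 0.6 combine into a bump supported on roughly [0.4, 0.6].
bump = [(1.0, 100.0, -40.0),
        (-1.0, 100.0, -60.0)]
```

F is close to 1 inside the bump (e.g. at x = 0.5) and close to 0 outside it (e.g. at x = 0.0 or x = 1.0); summing many such bumps with suitable heights approximates an arbitrary continuous target.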

Page 34: Introduction to Deep Neural Network

Why Does a DNN Work?

• Its corresponding network with only one hidden layer

– NOTE: this is not even a general case for one hidden layer. It is a special case. WHY?

– However, it is powerful and encouraging from the mathematical perspective.

Page 35: Introduction to Deep Neural Network

Why Does a DNN Work?

General networks have a corresponding general version of the Universal Approximation Theorem:

Page 36: Introduction to Deep Neural Network

Why Does a DNN Work?

• Universal approximation theorem (G):

– Let φ(·) be a bounded and monotonically increasing continuous function. Let S be a compact space in Rm. Let C(S) = { g | g: S ⊂ Rm → Rn is continuous }. Then, given any function f ∈ C(S) and ε > 0, there exists a FNN as shown above which constructs the network function F such that

|| F(x) - f(x) || < ε

where F is an approximation to the function f which is independent of φ.

• It seems both shallow and deep neural networks can construct an approximation to a given function.

– Which is better?

– Or which is more efficient in terms of using fewer nodes?

Page 37: Introduction to Deep Neural Network

Why Does a DNN Work?

• Mathematical foundation of neural networks:

Page 38: Introduction to Deep Neural Network

Those DNNs in action

• A DNN has three elements:

– Architecture: the graph, weights/biases, activation functions

– Activity Rule: weights/biases, activation functions

– Learning Rule: a typical one is the backpropagation algorithm

• The architecture basically determines the capability of a specific DNN.

– Different architectures are suitable for different applications.

– The most general architecture of an ANN is a DAG (directed acyclic graph).

Page 39: Introduction to Deep Neural Network

Those DNNs in action

• There are a few well-known categories of DNNs.

Page 40: Introduction to Deep Neural Network

What Are the Challenges?

• Given a specific problem, there are a few questions before one starts the journey with DNNs:

– Do you understand the problem that you need to solve?

– Do you really want to solve this problem with a DNN? Why?

• Do you have an alternative yet effective solution?

– Do you know how to describe the problem in DNN terms mathematically?

– Do you know how to implement a DNN, beyond a few APIs and sizzling hype?

– How to collect sufficient data for training?

– How to solve the problem efficiently and cost-effectively?

Page 41: Introduction to Deep Neural Network

What Are the Challenges?

• 1st Challenge:

– a full mesh network has the curse of dimensionality.

Page 42: Introduction to Deep Neural Network

What Are the Challenges?

• Many FNN tasks do not need a full mesh network.

• For example, if we can present the input vector as a grid, nearest-neighborhood models can be used to construct an effective FNN with far fewer connections:

– Image recognition

– GO (圍棋): a game that two players play on a 19x19 grid of lines.
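A back-of-the-envelope sketch of why locality helps on a 19x19 grid; the 3x3 neighborhood radius is an assumed example, not a figure from the slides:

```python
# Rough weight counts for one layer over a 19x19 grid (361 inputs).
n = 19 * 19

# Full mesh: every one of the 361 nodes connects to every input.
full_mesh = n * n

# 3x3 nearest-neighborhood connectivity: each node sees only the
# (up to) 9 grid cells around its own position.
def neighbors(r, c, size=19, radius=1):
    return sum(1 for dr in range(-radius, radius + 1)
                 for dc in range(-radius, radius + 1)
                 if 0 <= r + dr < size and 0 <= c + dc < size)

local = sum(neighbors(r, c) for r in range(19) for c in range(19))

# A convolutional layer goes further still: one shared 3x3 kernel.
shared = 9
```

The counts drop from 130,321 (full mesh) to 3,025 (local connectivity) to 9 (one shared convolution kernel).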

Page 43: Introduction to Deep Neural Network

What Are the Challenges?

• The 2nd challenge is how to describe a technical problem in terms of a DNN, i.e., mathematical modeling. There are generally two approaches:

– Applying a well-learned DNN architecture to describe the problem. Deep understanding of the specific network is usually required!

• Two general DNN architectures are well-known:

– FNN: feedforward neural network. Its special architecture CNN (convolutional neural network) is widely used in many applications such as image recognition, GO, etc.

– RNN: recurrent neural network. Its special architecture is LSTM (long short-term memory), which has been applied successfully in speech recognition, language translation, etc.

• For example, if we want to try a FNN, how do we describe the problem in terms of <Input vector, Output vector> with fixed dimensions?

– Creating a novel DNN architecture from the ground up if none of the existing models fits your problem. Deep understanding of DNN theory / algorithms is required.
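For instance, the <Input vector, Output vector> modeling of digit recognition can be sketched as follows; the 28x28 MNIST-style image size and the one-hot encoding are assumptions for illustration, not details from the slides:

```python
def encode_input(image_rows):
    """Flatten a 28x28 grid of grayscale pixels in [0, 255] into a
    fixed-dimension 784-vector scaled to [0, 1]."""
    return [px / 255.0 for row in image_rows for px in row]

def encode_target(digit):
    """One-hot 10-dimensional target vector T for a digit 0..9."""
    return [1.0 if i == digit else 0.0 for i in range(10)]

image = [[0] * 28 for _ in range(28)]   # a blank placeholder image
I = encode_input(image)                 # fixed input dimension: 784
T = encode_target(3)                    # fixed output dimension: 10
```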

Page 44: Introduction to Deep Neural Network

What Are the Challenges?

• Handwriting digit recognition: – Modeling this problem is straightforward

Page 45: Introduction to Deep Neural Network

What Are the Challenges?

• Image Recognition is also straightforward

Page 46: Introduction to Deep Neural Network

What Are the Challenges?

• However, due to the curse of dimensionality, we can use a special FFN: – Convolutional neural network (CNN)

Page 47: Introduction to Deep Neural Network

What Are the Challenges?

• How to construct a DNN to describe language translation?

– Systems such as Google's use LSTM networks

• How to construct a DNN to describe the problem of malware classification?

• How to construct a DNN to describe the network traffic for security purpose?

Page 48: Introduction to Deep Neural Network

What Are the Challenges?

• The 3rd challenge is how to collect sufficient training data. To achieve required accuracy, sufficient training data is necessary. WHY?

Page 49: Introduction to Deep Neural Network

What Are the Challenges?

• The 4th challenge is how to identify the various talents needed to provide a DNN solution to specific problems:

– Who knows how to use existing DL APIs such as TensorFlow?

– Who understands various DNN architectures in depth, so that he/she can evaluate and identify a suitable DNN architecture to solve the problem?

– Who understands the theory and algorithms of DNNs in depth, so that he/she can create and design a novel DNN from the ground up?

Page 50: Introduction to Deep Neural Network

Successful Stories

• ImageNet : 1M+ images, 1000+ categories, CNN

Page 51: Introduction to Deep Neural Network

Successful Stories

• Unsupervised learning neural networks… YouTube and the Cat .

Page 52: Introduction to Deep Neural Network

Successful Stories

• AlphaGo, a significant milestone in AI history

– More significant than DeepBlue

• Both Policy Network and Value Network are CNNs.

Page 53: Introduction to Deep Neural Network

Successful Stories

• Google Neural Machine Translation… LSTM (Long Short-Term Memory) network

Page 54: Introduction to Deep Neural Network

Successful Stories

• Microsoft Speech Recognition… LSTM and TDNN (Time Delay Neural Networks)

Page 55: Introduction to Deep Neural Network

Security Problems

• Not disclosed for the public version.

Page 56: Introduction to Deep Neural Network

Summary

• What a DNN is

• How a DNN works

• Why a DNN works

• The categories of DNNs

• Some challenges

• Well-known stories

• Security problems

Page 57: Introduction to Deep Neural Network

Quiz

• Why do we choose the activation function as a nonlinear function?

• Why deep? Why are deep networks better than shallow networks?

• What is the difference between online and batch mode training?

• Will online and batch mode training converge to the same solution?

• Why do we need the backpropagation algorithm?

• Why do we apply convolutional neural networks to image recognition?

Page 58: Introduction to Deep Neural Network

Quiz

• If we solve a problem with a FNN, – how many deep layers should we go?

– How many nodes are good for each layer?

– How to estimate and optimize the cost?

• Is it guaranteed that the backpropagation algorithm converges to a solution?

• Why do we need sufficient data for training in order to achieve certain accuracy?

• Can a DNN do tasks beyond extending human capabilities or automating extensive manual tasks?

– To prove a mathematical theorem... or to introduce an interesting concept… or to appreciate a poem… or to love…

Page 59: Introduction to Deep Neural Network

Quiz

• AlphaGo is trained for a 19x19 lattice. If we play the GO game on a 20x20 board, can AlphaGo handle it?

• ImageNet is trained for 1000 categories. If we add a 1001st category, what should we do?

• People do consider a specific DNN a black box. Why?

• More questions from you…

• More questions from you …

Page 60: Introduction to Deep Neural Network

What Else?

• What to share next from me? Why do you care?

– Various DNNs: principles, examples, analysis and experiments…

• ImageNet, AlphaGo, GNMT, etc.

– My Ph.D work and its relevance to DNN

– Little History of AI and Artificial Neural Network

– Various Schools of the AI Discipline

– Strong AI vs. Weak AI

Page 61: Introduction to Deep Neural Network

What Else?

• What to share next from me? Why do you care? – Questions when thinking about AI:

• Are we able to understand how we learn?

• Are we going in the right directions mathematically and scientifically?

• Are there simple principles for cognition like what Newton and Einstein established for understanding our universe?

• What do we lack between now and the coming of so-called Strong AI?

Page 62: Introduction to Deep Neural Network

What Else?

• What to share next from me? Why do you care?

– Questions about who we are:

– Are we created?

– Are we the AI of the creator?

• My little theory about the Universe
