Transcript
Page 1:

Deep Learning
Bing-Chen Tsai

1/21

Page 2:

outline

Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference

Page 3:

Neural networks

Supervised learning: the training data consist of inputs together with their corresponding outputs.

Unsupervised learning: the training data consist of inputs without their corresponding outputs.

Page 4:

Neural networks

Generative model: models the distribution of the input as well as the output, P(x, y).

Discriminative model: models the posterior probabilities, P(y | x).

[Figure: the joint distributions P(x, y1) and P(x, y2) contrasted with the posteriors P(y1 | x) and P(y2 | x).]
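The two views are linked by Bayes' rule; a minimal sketch with hypothetical numbers (the joint probabilities below are invented for illustration):

```python
# Hypothetical joint probabilities P(x, y) for one observed input x
P_x_y1 = 0.03   # P(x, y1)
P_x_y2 = 0.01   # P(x, y2)

# A generative model stores the joint; the posterior follows from Bayes' rule:
# P(y | x) = P(x, y) / P(x), with P(x) = P(x, y1) + P(x, y2)
P_x = P_x_y1 + P_x_y2
print(P_x_y1 / P_x)  # P(y1 | x) = 0.75
print(P_x_y2 / P_x)  # P(y2 | x) = 0.25
```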

Page 5:

Neural networks

What is a neuron?

Linear neurons

Binary threshold neurons

Sigmoid neurons

Stochastic binary neurons

[Figure: a single neuron with inputs x1, x2, a bias input fixed at 1, weights w1, w2, bias b, and output y.]

Linear neuron:
$y = b + \sum_i x_i w_i$

Binary threshold neuron:
$z = b + \sum_i x_i w_i$; $y = 1$ if $z \ge 0$, otherwise $y = 0$

Sigmoid neuron:
$z = b + \sum_i x_i w_i$; $y = \frac{1}{1 + e^{-z}}$

Stochastic binary neuron:
$z = b + \sum_i x_i w_i$; $p(y = 1) = \frac{1}{1 + e^{-z}}$
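A minimal NumPy sketch of these four neuron types (the function names are my own, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    # Linear neuron: y = b + sum_i x_i w_i
    return b + x @ w

def binary_threshold(x, w, b):
    # Output 1 if the total input is non-negative, otherwise 0
    return 1.0 if (b + x @ w) >= 0 else 0.0

def sigmoid_neuron(x, w, b):
    # Smooth output in (0, 1): y = 1 / (1 + e^{-z})
    z = b + x @ w
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_binary(x, w, b):
    # Treat the sigmoid output as the probability of emitting a 1
    return 1.0 if rng.random() < sigmoid_neuron(x, w, b) else 0.0

x, w = np.array([1.0, 0.0]), np.array([0.5, -0.3])
print(linear(x, w, 0.1), binary_threshold(x, w, 0.1),
      sigmoid_neuron(x, w, 0.1), stochastic_binary(x, w, 0.1))
```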

Page 6:

Neural networks

Two-layer neural networks (sigmoid neurons)

Back-propagation:
Step 1: Randomly initialize the weights and compute the output vector.
Step 2: Evaluate the gradient of an error function.
Step 3: Adjust the weights.
Repeat steps 2 and 3 until the error is low enough.
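A minimal NumPy sketch of these steps on a toy problem (the network size, learning rate, and XOR data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: XOR (4 examples, 2 inputs, 1 output)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Step 1: randomly initialize the weights
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

lr = 0.5
for epoch in range(5000):
    # Forward pass: determine the output vector
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # Step 2: gradient of the squared-error function
    dy = (y - T) * y * (1 - y)          # output-layer delta
    dh = (dy @ W2.T) * h * (1 - h)      # back-propagated hidden delta

    # Step 3: adjust the weights; repeat until the error is low enough
    W2 -= lr * h.T @ dy; b2 -= lr * dy.sum(axis=0)
    W1 -= lr * X.T @ dh; b1 -= lr * dh.sum(axis=0)

print(np.mean((y - T) ** 2))  # mean squared error after training
```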

Page 7:

Neural networks

Back-propagation is not good for deep learning:

It requires labeled training data, and almost all data is unlabeled.

The learning time is very slow in networks with multiple hidden layers.

It can get stuck in poor local optima; for deep nets these can be far from optimal.

Instead, learn P(input), not P(output | input). What kind of generative model should we learn?

Page 8:

outline

Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference

Page 9:

Graphical model

A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables.

In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.

Page 10:

Graphical model

Directed graphical model:
$P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B, C)$

Undirected graphical model:
$P(A, B, C, D) = \frac{1}{Z}\,\varphi(A, B, C)\,\varphi(B, C, D)$

[Figure: the same four nodes A, B, C, D drawn as a directed graph and as an undirected graph.]
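A minimal sketch of the directed factorization, with hypothetical conditional probability tables (all numbers are invented for illustration):

```python
import itertools

# Hypothetical CPTs for P(A, B, C, D) = P(A) P(B|A) P(C|A) P(D|B, C); binary variables
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_given_A = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
P_D_given_BC = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.4, 1: 0.6},
                (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}

def joint(a, b, c, d):
    # The joint factorizes according to the graph structure
    return P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c] * P_D_given_BC[(b, c)][d]

# The factorization yields a proper distribution: the joint sums to 1
total = sum(joint(*x) for x in itertools.product([0, 1], repeat=4))
print(total)  # 1.0
```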

Page 11:

outline

Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference

Page 12:

Belief nets

A belief net is a directed acyclic graph composed of stochastic variables.

[Figure: layers of stochastic hidden causes with directed connections down to visible units.]

The units are stochastic binary neurons:
$p(s_i = 1) = \frac{1}{1 + e^{-(b_i + \sum_j s_j w_{ji})}}$

With these units it is a sigmoid belief net.

Page 13:

Belief nets

We would like to solve two problems:

The inference problem: infer the states of the unobserved variables.

The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.

[Figure: stochastic hidden causes above visible units.]

Page 14:

Belief nets

It is easy to generate a sample from P(v | h), but it is hard to infer P(h | v), because of explaining away.

[Figure: stochastic hidden causes above visible units.]
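A minimal sketch of why sampling is easy: for a two-layer sigmoid belief net (the sizes and random weights are illustrative assumptions), generation is a single top-down pass through the directed graph:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical two-layer sigmoid belief net: hidden causes h -> visible v
n_hid, n_vis = 3, 5
W = rng.normal(0, 1, (n_hid, n_vis))   # directed weights from h to v
b_h, b_v = np.zeros(n_hid), np.zeros(n_vis)

# Generating a sample is easy: work top-down, parents before children
h = (rng.random(n_hid) < sigmoid(b_h)).astype(float)          # sample hidden causes
v = (rng.random(n_vis) < sigmoid(b_v + h @ W)).astype(float)  # then sample P(v | h)
print(h, v)
# Inferring P(h | v) has no such one-pass form: the posterior over the 2^3
# hidden configurations does not factorize, because of explaining away.
```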

Page 15:

Belief nets

Explaining away

[Figure: two hidden causes H1 and H2 with directed edges into a common effect V.]

H1 and H2 are independent, but they can become dependent when we observe an effect that they can both influence: once V is observed, learning that H1 is active makes H2 less probable, because H1 already explains the observation.

Page 16:

Belief nets

Some methods for learning deep belief nets:

Monte Carlo methods, but they are painfully slow for large, deep belief nets.

Learning with samples from the wrong distribution: use Restricted Boltzmann Machines.

Page 17:

outline

Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference

Page 18:

Boltzmann Machine

It is an undirected graphical model.

[Figure: hidden units above visible units, with connections both within and between the layers; i and j label two connected units.]

The energy of a joint configuration (with $s_i$ ranging over both visible and hidden units):
$E = -\sum_i s_i b_i - \sum_{i<j} s_i s_j w_{ij}$

Page 19:

Boltzmann Machine

An example of how weights define a distribution.

[Figure: a small network with hidden units h1, h2 and visible units v1, v2, whose edges carry the weights +2, +1, and -1.]
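A minimal sketch of the idea: enumerate every joint configuration, convert its energy into an unnormalized probability, and normalize. The placement of the weights +2, +1, and -1 on particular edges below is my assumption; the slide's figure defines the actual network:

```python
import itertools
import numpy as np

# Assumed edge weights (illustrative): h1-v1 = +2, h2-v2 = +1, v1-v2 = -1
edges = {("h1", "v1"): 2.0, ("h2", "v2"): 1.0, ("v1", "v2"): -1.0}
units = ["v1", "v2", "h1", "h2"]

def energy(state):
    # E = -sum over edges of s_i * s_j * w_ij (no biases in this example)
    return -sum(w * state[a] * state[b] for (a, b), w in edges.items())

# Enumerate all 2^4 binary configurations
states = [dict(zip(units, bits)) for bits in itertools.product([0, 1], repeat=4)]
unnorm = np.array([np.exp(-energy(s)) for s in states])
Z = unnorm.sum()                 # partition function
probs = unnorm / Z               # P(v, h) = exp(-E(v, h)) / Z

for s, p in zip(states, probs):
    print({k: s[k] for k in units}, f"P = {p:.3f}")
```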

Page 20:

Boltzmann Machine

A very surprising fact: the derivative of the log probability of one training vector v under the model is

$\frac{\partial \log p(v)}{\partial w_{ij}} = \langle s_i s_j \rangle_v - \langle s_i s_j \rangle_{model}$

$\langle s_i s_j \rangle_v$: the expected value of the product of states at thermal equilibrium when v is clamped on the visible units.

$\langle s_i s_j \rangle_{model}$: the expected value of the product of states at thermal equilibrium with no clamping.

Page 21:

Boltzmann Machines

Restricted Boltzmann Machine: we restrict the connectivity to make learning easier.

Only one layer of hidden units (we will deal with more layers later).

No connections between hidden units, making the updates more parallel.

[Figure: a layer of hidden units above a layer of visible units, with connections only between the layers.]

Page 22:

Boltzmann Machines

The Boltzmann machine learning algorithm for an RBM:

[Figure: alternating Gibbs sampling. Start with a data vector on the visible units at t = 0; update all hidden units in parallel, then all visible units, and repeat (t = 1, t = 2, ...) until thermal equilibrium at t = infinity. The statistics $\langle v_i h_j \rangle$ are collected at t = 0 and at t = infinity.]

$\Delta w_{ij} = \varepsilon\,(\langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{\infty})$

Page 23:

Boltzmann Machines

Contrastive divergence: a very surprising short-cut.

[Figure: start with a data vector on the visible units (t = 0, data); update the hidden units, then update the visible units to get a reconstruction (t = 1, reconstruction), and update the hidden units once more.]

$\Delta w_{ij} = \varepsilon\,(\langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1})$

This is not following the gradient of the log likelihood, but it works well.
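A minimal NumPy sketch of CD-1 for a binary RBM (the layer sizes, learning rate, and random toy data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 3, 0.1
W = rng.normal(0, 0.01, (n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)   # visible and hidden biases

data = rng.integers(0, 2, (100, n_vis)).astype(float)  # toy binary data

for epoch in range(50):
    v0 = data
    # Positive phase: hidden probabilities and samples given the data (t = 0)
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One reconstruction step (t = 1)
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # CD-1 update: <v h> at t = 0 minus <v h> at t = 1
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
```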

Page 24:

outline

Neural networks
Graphical model
Belief nets
Boltzmann machine
DBN
Reference

Page 25:

DBN

As in a belief net, it is easy to generate a sample from P(v | h) but hard to infer P(h | v), because of explaining away.

Using an RBM to initialize the weights gives a good starting point, close to a good optimum.

[Figure: stochastic hidden causes above visible units.]

Page 26:

DBN

Combining two RBMs to make a DBN:

Train the first RBM (weights W1) on the data v. Then, for each v, copy the binary state of its hidden layer h1 and train the second RBM (weights W2) on those states. Finally, compose the two RBM models into a single DBN model: it's a deep belief net! (A sketch of this procedure follows.)

[Figure: two RBMs, v-h1 with weights W1 and h1-h2 with weights W2, stacked into a single network v-h1-h2.]
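A minimal sketch of this greedy, layer-by-layer procedure, reusing the CD-1 idea from the previous sketch (the train_rbm helper and all sizes are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, lr=0.1, epochs=50):
    # CD-1 training; returns weights, hidden biases, and hidden probabilities
    n_vis = data.shape[1]
    W = rng.normal(0, 0.01, (n_vis, n_hid))
    a, b = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        v1 = sigmoid(h0 @ W.T + a)          # reconstruction probabilities
        ph1 = sigmoid(v1 @ W + b)
        W += lr * (data.T @ ph0 - v1.T @ ph1) / len(data)
        a += lr * (data - v1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, b, sigmoid(data @ W + b)

v = rng.integers(0, 2, (100, 8)).astype(float)   # toy binary data

# Train the first RBM on v, then the second RBM on the copied h1 states
W1, b1, h1 = train_rbm(v, n_hid=6)
W2, b2, h2 = train_rbm((h1 > 0.5).astype(float), n_hid=4)
# W1 and W2 now initialize a v - h1 - h2 deep belief net
```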

Page 27:

DBN

Why can we use an RBM to initialize belief net weights?

An infinite sigmoid belief net is equivalent to an RBM.

Inference in a directed net with replicated weights is trivial: we just multiply v0 by W transpose. The model above h0 implements a complementary prior, so multiplying v0 by W transpose gives the product of the likelihood term and the prior term.

[Figure: an infinite directed net with layers ..., h2, v2, h1, v1, h0, v0. The same weight matrix W is replicated between every pair of layers for generation, and $W^T$ is used for inference.]

Page 28:

DBN

Complementary prior

A Markov chain is a sequence of variables $X_1, X_2, \ldots$ with the Markov property:
$P(X_{t+1} \mid X_1, \ldots, X_t) = P(X_{t+1} \mid X_t)$

A Markov chain is stationary if the transition probabilities do not depend on time; $T(X \to X')$ is called the transition matrix. If a Markov chain is ergodic, it has a unique equilibrium distribution.

[Figure: a chain X1 -> X2 -> X3 -> X4.]
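A minimal sketch of an ergodic chain settling into its unique equilibrium distribution (the transition matrix below is invented for illustration):

```python
import numpy as np

# Assumed 3-state transition matrix (rows sum to 1); illustrative only
T = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

p = np.array([1.0, 0.0, 0.0])   # start in state X1
for t in range(100):            # run the chain; p converges regardless of the start
    p = p @ T
print(p)                        # the unique equilibrium distribution P_infinity
```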

Page 29:

DBN

Most Markov chains used in practice satisfy detailed balance, e.g. Gibbs sampling, Metropolis-Hastings, slice sampling:
$P_\infty(X)\,T(X \to X') = P_\infty(X')\,T(X' \to X)$

Such Markov chains are reversible: a trajectory run forwards from equilibrium has the same probability as the same trajectory run backwards.

$P_\infty(X_1)\,T(X_1 \to X_2)\,T(X_2 \to X_3)\,T(X_3 \to X_4) = T(X_1 \leftarrow X_2)\,T(X_2 \leftarrow X_3)\,T(X_3 \leftarrow X_4)\,P_\infty(X_4)$

[Figure: the chain X1 -> X2 -> X3 -> X4 traversed forwards and backwards.]
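A quick numerical check of detailed balance on a reversible chain (the tridiagonal birth-death transition matrix below is an invented example; such chains are always reversible):

```python
import numpy as np

# A birth-death chain (moves only to neighbouring states) is reversible
T = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# Equilibrium distribution: the left eigenvector of T for eigenvalue 1
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# Detailed balance: pi_i T_ij == pi_j T_ji for every pair (i, j)
for i in range(3):
    for j in range(3):
        assert np.isclose(pi[i] * T[i, j], pi[j] * T[j, i])
print(pi)
```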

Page 30:

DBN

Page 31:

DBN

Combining two RBMs to make a DBN (this slide repeats Page 26): train the first RBM, then train the second RBM on the copied h1 states, and compose the two into a single deep belief net.

Page 32:

Reference

G. Hinton, Deep Belief Nets, 2007 NIPS tutorial.
Neural Networks for Machine Learning, Coursera: https://class.coursera.org/neuralnets-2012-001/class/index
Machine learning course lecture notes.
Wikipedia, Graphical model: http://en.wikipedia.org/wiki/Graphical_model