Page 1

Learning Bayesian Networks

CSE 473

Page 2

Last Time

• Basic notions
• Bayesian networks
• Statistical learning
  Parameter learning (MAP, ML, Bayesian learning)
  Naïve Bayes
• Issues
  Structure Learning
  Expectation Maximization (EM)
• Dynamic Bayesian networks (DBNs)
• Markov decision processes (MDPs)

Page 3

Simple Spam Model

[Diagram: Spam node with children Nigeria, Sex, Nude, …]

Naïve Bayes assumes the attributes are independent given the class (their common parent). Correct?

Page 4

Naïve Bayes Incorrect, but Easy!

• P(spam | X1 … Xn) = ∏i P(spam | Xi)

• How do we compute P(spam | Xi)?

• How do we compute P(Xi | spam)?

Page 5

Smooth

with a Prior

• P(Xi | S) = ( #(Xi ∧ S) + m·p ) / ( #S + m )

• Note that if m = 10 in the above, it is like saying “I have seen 10 samples that make me believe that P(Xi | S) = p”

• Hence, m is referred to as the equivalent sample size
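A minimal Python sketch of this m-estimate, assuming binary attributes; the counts in the example call are illustrative, not from the slides.

```python
# m-estimate: smooth an empirical fraction toward a prior guess p,
# weighted as if the prior were backed by m imaginary samples.

def m_estimate(count_xi_and_s, count_s, p, m):
    """P(Xi | S) = (#(Xi ^ S) + m*p) / (#S + m)."""
    return (count_xi_and_s + m * p) / (count_s + m)

# With no smoothing (m = 0), a word never seen in spam gets probability 0,
# which would zero out the whole Naive Bayes product. With m = 10, p = 0.5:
print(m_estimate(0, 40, p=0.5, m=10))   # 0.1 instead of 0.0
```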

Page 6

Probabilities: Important Detail!

Is there any potential problem here?

• P(spam | X1 … Xn) = ∏i P(spam | Xi)

• We are multiplying lots of small numbers. Danger of underflow!
  0.5^57 ≈ 7 × 10^-18

• Solution? Use logs and add!
  p1 · p2 = e^(log(p1) + log(p2))
  Always keep in log form
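A minimal sketch of keeping the product in log form; the 57 factors of 0.5 are the slide’s own example, everything else is illustrative.

```python
import math

probs = [0.5] * 57

naive = 1.0
for p in probs:
    naive *= p                # ~7e-18 already; much longer products underflow to 0.0

log_score = sum(math.log(p) for p in probs)   # safe: a sum of moderate negatives
print(naive, log_score)      # 6.94e-18  -39.51...

# Compare classes by their log scores directly; only exponentiate
# (if ever) at the very end.
```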

Page 7

P(S | X)

• Easy to compute from data if X discrete

Instance X Spam?

1 T F

2 T F

3 F T

4 T T

5 T F

• P(S | X=T) = ¼, ignoring smoothing…
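For concreteness, a tiny sketch that recovers the ¼ by counting over the table above (the data is transcribed from the table; the tuple layout is my own).

```python
# Each pair is (X, Spam?) for one instance, copied from the table.
data = [("T", "F"), ("T", "F"), ("F", "T"), ("T", "T"), ("T", "F")]

n_x_true = sum(1 for x, s in data if x == "T")                   # 4 instances with X = T
n_spam_and_x = sum(1 for x, s in data if x == "T" and s == "T")  # 1 of them is spam
print(n_spam_and_x / n_x_true)                                   # 0.25, the slide's 1/4
```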

Page 8

P(S | X)

• What if X is real valued?

Instance X Spam?

1 0.01 F

2 0.01 F

3 0.02 F

4 0.03 T

5 0.05 T

• What now?

[Annotations: instances 1–3 marked “< T”, instances 4–5 marked “> T” — i.e., discretize X with a threshold T]

Page 9

Anything Else?

# X Spam?
1 0.01 F
2 0.01 F
3 0.02 F
4 0.03 T
5 0.05 T

[Figure: histogram of instance counts (0–3) over X = .01 … .05]

P(S | X=.04)?

Page 10

Smooth with Gaussians, then sum

“Kernel Density Estimation”

# X Spam?
1 0.01 F
2 0.01 F
3 0.02 F
4 0.03 T
5 0.05 T

[Figure: a Gaussian kernel placed on each data point over X = .01 … .05, summed into a smooth density]
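A minimal sketch of kernel density estimation over these five points; the bandwidth of 0.01 is an illustrative choice, not from the slides.

```python
import math

def kde(x, points, bandwidth=0.01):
    """Average of Gaussian kernels centered on the observed points."""
    total = 0.0
    for p in points:
        z = (x - p) / bandwidth
        total += math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2 * math.pi))
    return total / len(points)

xs = [0.01, 0.01, 0.02, 0.03, 0.05]   # the X column above
print(kde(0.04, xs))                  # smoothed density at X = .04
```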

Page 11

Spam?

# X Spam?
1 0.01 F
2 0.01 F
3 0.02 F
4 0.03 T
5 0.05 T

[Figure: separate kernel density estimates for the spam and non-spam instances over X = .01 … .05]

P(S | X=.023)?

What’s with the shape?
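Continuing the previous sketch (it reuses kde() defined above), one way to get P(S | X) is to build a density per class and combine them with Bayes’ rule; the priors come from the table, the bandwidth remains an assumption.

```python
spam_xs = [0.03, 0.05]          # X values of the spam instances
ham_xs = [0.01, 0.01, 0.02]     # X values of the non-spam instances
p_spam, p_ham = 2 / 5, 3 / 5    # class priors from the table

def p_spam_given_x(x):
    num = kde(x, spam_xs) * p_spam
    den = num + kde(x, ham_xs) * p_ham
    return num / den

print(p_spam_given_x(0.023))    # well under 0.5: .023 sits nearer the non-spam mass
```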

Page 12

Analysis

[Figure: P(S | X) plotted against attribute value]

Page 13

Recap

• Given a BN structure (with discrete or continuous variables), we can learn the parameters of the conditional probability tables.

[Diagrams: the spam network (Spam → Nigeria, Sex, Nude) and the alarm network (Earthquake, Burglary → Alarm → N1, N2)]

Page 14

What if we don’t know structure?

Page 15

Outline

• Statistical learning
  Parameter learning (MAP, ML, Bayesian learning)
  Naïve Bayes
• Issues
  Structure Learning
  Expectation Maximization (EM)
• Dynamic Bayesian networks (DBNs)
• Markov decision processes (MDPs)

Page 16

Learning The Structure of Bayesian Networks

• Search through the space of possible network structures! (For now, assume we observe all variables.)
• For each structure, learn parameters
• Pick the one that fits the observed data best
  Caveat – won’t we end up fully connected?
• When scoring, add a penalty for model complexity (one concrete choice is sketched below)
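The slide doesn’t fix a particular penalty; one standard choice is the BIC score. A sketch under that assumption, where log_likelihood() and num_params() are hypothetical helpers for whatever network representation you use:

```python
import math

def bic_score(structure, data):
    # Fit to the data, minus a complexity penalty that grows with
    # parameter count. A fully connected net gains likelihood but
    # pays heavily in parameters, so it no longer wins automatically.
    ll = log_likelihood(structure, data)          # hypothetical helper
    penalty = 0.5 * math.log(len(data)) * num_params(structure)  # hypothetical helper
    return ll - penalty
```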

Page 17

Learning The Structure of Bayesian Networks

• Search through the space
• For each structure, learn parameters
• Pick the one that fits the observed data best

• Problem?
  Exponential number of networks!
  And we need to learn parameters for each!
  Exhaustive search out of the question!

• So what now?

Page 18

Learning The Structure of Bayesian Networks

Local search!
  Start with some network structure
  Try to make a change (add or delete or reverse an edge)
  See if the new network is any better (this loop is sketched below)

  What should be the initial state?
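A minimal sketch of that loop as greedy hill-climbing; neighbors() and score() are hypothetical helpers (all single-edge additions, deletions, and reversals that keep the graph acyclic, and e.g. the penalized score from the earlier sketch).

```python
def learn_structure(initial, data):
    current = initial
    while True:
        # Best single-edge change to the current network.
        best = max(neighbors(current), key=lambda g: score(g, data))
        if score(best, data) <= score(current, data):
            return current        # local optimum: no single-edge change helps
        current = best
```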

Page 19

Initial Network Structure?

• Uniform prior over random networks?

• Network which reflects expert knowledge?

Page 20

Learning BN Structure

Page 21

The Big Picture

• We described how to do MAP (and ML) learning of a Bayes net (including structure)

• How would Bayesian learning (of BNs) differ?

• Find all possible networks

• Calculate their posteriors

• When doing inference, return the weighted combination of predictions from all networks! (a sketch follows below)
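A hedged sketch of that weighted combination; posterior() and predict() are hypothetical helpers, and in practice one averages over a tractable sample of structures rather than literally all networks.

```python
def bayes_predict(networks, data, query):
    # Weight each candidate network's prediction by its posterior
    # probability given the data, normalized over the candidates.
    weights = [posterior(net, data) for net in networks]   # hypothetical helper
    total = sum(weights)
    return sum(w / total * predict(net, query)             # hypothetical helper
               for w, net in zip(weights, networks))
```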

Page 22

Outline

• Statistical learning
  Parameter learning (MAP, ML, Bayesian learning)
  Naïve Bayes
• Issues
  Structure Learning
  Expectation Maximization (EM)
• Dynamic Bayesian networks (DBNs)
• Markov decision processes (MDPs)

Page 23

Hidden Variables

• But we can’t observe the disease variable
• Can’t we learn without it?

Page 24

We Could…

• But we’d get a fully-connected network
• With 708 parameters (vs. 78)
  Much harder to learn!

Page 25

Chicken & Egg Problem

• If we knew that a training instance (patient) had the disease…
  It would be easy to learn P(symptom | disease)
  But we can’t observe the disease, so we don’t.

• If we knew the parameters, e.g. P(symptom | disease), then it would be easy to estimate whether the patient had the disease.
  But we don’t know these parameters.

Page 26

Expectation Maximization (EM) (high-level version)

• Pretend we do know the parameters
  Initialize randomly

• [E step] Compute the probability of each instance having each possible value of the hidden variable

• [M step] Treating each instance as fractionally having both values, compute the new parameter values

• Iterate until convergence! (a concrete sketch follows below)
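A minimal sketch of these two steps for a two-Gaussian mixture (the “simpler problem” coming up: known form and variance, 50/50 mix, only the means hidden). The data points and initialization are illustrative.

```python
import math

def gauss(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

xs = [0.01, 0.02, 0.03, 0.06, 0.07, 0.08]   # illustrative observations
mu1, mu2, sigma = 0.0, 0.1, 0.02            # random-ish initial means, known variance

for _ in range(50):
    # E step: probability each point came from component 1
    w = [gauss(x, mu1, sigma) / (gauss(x, mu1, sigma) + gauss(x, mu2, sigma))
         for x in xs]
    # M step: re-estimate means with each point fractionally assigned
    mu1 = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    mu2 = sum((1 - wi) * x for wi, x in zip(w, xs)) / (len(xs) - sum(w))

print(mu1, mu2)   # converges toward the two cluster means (~0.02 and ~0.07)
```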

Page 27

Candy Problem

• Given a bowl of mixed candies
• Can we learn the contents of each bag, and
• What percentage of each is in the bowl?

[Figure: two bags of candy combined into one bowl]

Page 28

Naïve Candy

• Flavor: cherry / lime
• Wrapper: red / green
• Hole: 0 / 1

Page 29

Simpler Problem

• Mixture of two distributions
• Know: form of distribution, variance, mixing % = .5
• Just need the mean of each

[Figure: two overlapping Gaussians over X = .01 … .09]

Page 30

ML Mean of Single Gaussian

uML = argminu Σi (xi – u)² — minimized by the sample mean, u = (1/N) Σi xi

[Axis: .01 .03 .05 .07 .09]
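A quick numerical check of this formula: the argmin of the squared error over a grid coincides with the sample mean (the data points are illustrative).

```python
xs = [0.01, 0.03, 0.05, 0.07, 0.09]

def sq_error(u):
    return sum((x - u) ** 2 for x in xs)

# Brute-force the argmin over a fine grid, then compare to the mean.
candidates = [i / 1000 for i in range(0, 101)]
u_ml = min(candidates, key=sq_error)
print(u_ml, sum(xs) / len(xs))   # both 0.05: the ML mean is the sample mean
```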

