Graphical Models - Learning -
Structure Learning

Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel
Albert-Ludwigs University Freiburg, Germany
Advanced I WS 06/07

[Figure: Bayesian network over the variables PCWP, CO, HRBP, HREKG, HRSAT, ERRCAUTER, HR, HISTORY, CATECHOL, SAO2, EXPCO2, ARTCO2, VENTALV, VENTLUNG, VENITUBE, DISCONNECT, MINVOLSET, VENTMACH, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, ANAPHYLAXIS, MINOVL, PVSAT, FIO2, PRESS, INSUFFANESTH, TPR, LVFAILURE, ERRBLOWOUTPUT, STROEVOLUME, LVEDVOLUME, HYPOVOLEMIA, CVP, BP]

Based on J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan, T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; N. Friedman & D. Koller's NIPS '99.

Dec 18, 2015



Page 2: Learning With Bayesian Networks

Learning settings, by what is known about the structure and how much of the data is observed:

• Fixed structure (A → B): parameter estimation
  – Fully observed: easiest problem, counting
  – Partially observed: numerical, nonlinear optimization; multiple calls to BNs; difficult for large networks
• Fixed variables (A ? B): structure learning
  – Fully observed: selection of arcs; new domain with no domain expert; data mining
  – Partially observed: encompasses the difficult subproblem; "only" Structural EM is known
• Hidden variables (A ? B with an unknown hidden variable H): structure learning
  – Scientific discovery

Page 3: Why Struggle for Accurate Structure?

[Figure: three networks over Earthquake, Alarm Set, Burglary and Sound: the original network, one with an added arc, and one with a missing arc]

Adding an arc
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure

Missing an arc
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure
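The cost of an added arc can be made concrete by counting free parameters. The sketch below assumes all four variables are binary and that Sound has Earthquake, Alarm Set and Burglary as parents; the arcs of the slide's figure are not recoverable from the transcript, so this structure, the extra arc Earthquake → Burglary, the dropped arc Burglary → Sound and the helper name `num_free_parameters` are illustrative assumptions.

```python
def num_free_parameters(parents, cardinality):
    """Count independent CPT entries: (|X| - 1) * product of parent cardinalities, summed over nodes."""
    total = 0
    for node, node_parents in parents.items():
        rows = 1
        for p in node_parents:
            rows *= cardinality[p]
        total += (cardinality[node] - 1) * rows
    return total


card = {"Earthquake": 2, "Burglary": 2, "AlarmSet": 2, "Sound": 2}

# Assumed reference structure: all three causes point at Sound.
original = {"Earthquake": [], "Burglary": [], "AlarmSet": [],
            "Sound": ["Earthquake", "Burglary", "AlarmSet"]}
# Illustrative extra arc Earthquake -> Burglary.
added_arc = dict(original, Burglary=["Earthquake"])
# Illustrative missing arc: Burglary -> Sound is dropped.
missing_arc = dict(original, Sound=["Earthquake", "AlarmSet"])

for name, net in [("original", original), ("added arc", added_arc), ("missing arc", missing_arc)]:
    print(name, num_free_parameters(net, card))
```

Under these assumptions the added arc raises the count from 11 to 12, while the missing arc lowers it to 7 at the price of a dependency that no parameter setting can express.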

Page 4: Unknown Structure, (In)complete Data

[Figure: in both cases the data is fed to a learning algorithm, which outputs a network over E, B and A together with the table P(A | E,B). Before learning every table entry is "?"; the learned table entries are .9 .1; .7 .3; .99 .01; .8 .2]

Incomplete data
• Network structure is not specified
• Data contains missing values
  – Need to consider assignments to missing values

E, B, A
<Y,?,N>
<Y,N,?>
<N,N,Y>
<N,Y,Y>
. .
<?,Y,Y>

Complete data
• Network structure is not specified
  – Learner needs to select arcs & estimate parameters
• Data does not contain missing values

E, B, A
<Y,N,N>
<Y,N,Y>
<N,N,Y>
<N,Y,Y>
. .
<N,Y,Y>
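For complete data, the "estimate parameters" part reduces to counting once the arcs are chosen. A minimal sketch, assuming the arcs E → A and B → A have already been selected and using only the five complete records visible on the slide (the elided ". ." rows are left out); the function name `mle_cpt` is hypothetical.

```python
from collections import Counter


def mle_cpt(records, child, parents):
    """Maximum-likelihood estimate of P(child | parents) by relative frequencies."""
    joint = Counter()     # N(parents, child)
    marginal = Counter()  # N(parents)
    for r in records:
        pa = tuple(r[p] for p in parents)
        joint[(pa, r[child])] += 1
        marginal[pa] += 1
    return {(pa, value): n / marginal[pa] for (pa, value), n in joint.items()}


rows = [("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"), ("N", "Y", "Y"), ("N", "Y", "Y")]
records = [dict(zip(("E", "B", "A"), r)) for r in rows]

cpt = mle_cpt(records, "A", ["E", "B"])
for (pa, value), prob in sorted(cpt.items()):
    print(f"P(A={value} | E={pa[0]}, B={pa[1]}) = {prob:.2f}")
```

Parent configurations that never occur in the data (here E=Y, B=Y) get no estimate at all, which is one reason smoothed or Bayesian estimates are often preferred in practice.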

Page 5: Score-based Learning

Define a scoring function that evaluates how well a structure matches the data.

E, B, A
<Y,N,N>
<Y,Y,Y>
<N,N,Y>
<N,Y,Y>
. .
<N,Y,Y>

[Figure: three candidate structures over E, B and A, each assigned a score]

Search for a structure that maximizes the score.

Page 6: Structure Search as Optimization

Input:
– Training data
– Scoring function
– Set of possible structures

Output:
– A network that maximizes the score
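To get a feeling for the size of the "set of possible structures", the number of DAGs on n labeled nodes can be computed with Robinson's recurrence. The recurrence is not part of the slides; it is added here only as an illustration of why exhaustive enumeration is ruled out.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n):
    """Number of directed acyclic graphs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 11):
    print(n, num_dags(n))
```

Three variables already allow 25 structures and five allow 29281; the count grows super-exponentially with n.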

Page 7: Heuristic Search

• Define a search space:
  – search states are possible structures
  – operators make small changes to structure
• Traverse space looking for high-scoring structures
• Search techniques:
  – Greedy hill-climbing
  – Best first search
  – Simulated Annealing
  – ...

Theorem: Finding a maximal scoring structure with at most k parents per node is NP-hard for k > 1

Page 8: Typically: Local Search

[Figure: network over S, C, E and D]

• Start with a given network
  – empty network, best tree, a random network
• At each iteration
  – Evaluate all possible changes
  – Apply change based on score
• Stop when no modification improves score
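The loop described above can be sketched as greedy hill-climbing over arc sets. This is a minimal sketch, not the lecture's implementation: the function names are hypothetical, and `toy_score` is a stand-in that merely rewards agreement with an arbitrarily chosen target arc set so that the example runs without data. A real score would be computed from the training data.

```python
from itertools import permutations


def is_acyclic(nodes, arcs):
    """Repeatedly strip nodes without incoming arcs; a cycle leaves nodes that cannot be stripped."""
    remaining = set(nodes)
    while remaining:
        roots = [n for n in remaining
                 if not any(v == n and u in remaining for u, v in arcs)]
        if not roots:
            return False
        remaining -= set(roots)
    return True


def neighbors(nodes, arcs):
    """Yield every DAG reachable by adding, deleting or reversing a single arc."""
    for u, v in permutations(nodes, 2):
        if (u, v) in arcs:
            yield arcs - {(u, v)}                       # delete u -> v
            flipped = (arcs - {(u, v)}) | {(v, u)}      # reverse u -> v
            if is_acyclic(nodes, flipped):
                yield flipped
        elif (v, u) not in arcs:
            added = arcs | {(u, v)}                     # add u -> v
            if is_acyclic(nodes, added):
                yield added


def hill_climb(nodes, score, start=frozenset()):
    """Apply the best-scoring single-arc change until no change improves the score."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(nodes, current):
            s = score(candidate)
            if s > best_score:
                best, best_score = frozenset(candidate), s
        if best is None:
            return current, current_score
        current, current_score = best, best_score


nodes = ["S", "C", "E", "D"]
target = frozenset({("S", "C"), ("C", "E"), ("S", "D")})  # arbitrary illustrative target


def toy_score(arcs):
    # Hypothetical stand-in score: minus the number of arcs that differ from the target.
    return -len(arcs ^ target)


found, found_score = hill_climb(nodes, toy_score)
print(sorted(found), found_score)
```

Starting from the empty network keeps the sketch short; the best tree or a random network, as listed on the slide, would be passed through `start`.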

Page 9: Typically: Local Search

[Figure: the network over S, C, E and D and a neighboring network obtained by the move "Add C → D"]

Page 10: Typically: Local Search

[Figure: the network over S, C, E and D and two neighboring networks obtained by the moves "Add C → D" and "Reverse C → E"]

Page 11: Typically: Local Search

[Figure: the network over S, C, E and D and three neighboring networks obtained by the moves "Add C → D", "Reverse C → E" and "Delete C → E"]

Page 12: Typically: Local Search

[Figure: the network over S, C, E and D with its three neighbors from the moves "Add C → D", "Reverse C → E" and "Delete C → E"]

If data is complete: to update the score after a local change, only re-score (by counting) the families that changed.

If data is incomplete: to update the score after a local change, rerun the parameter estimation algorithm.
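For the complete-data case, the saving comes from the score being a sum of per-family terms. The sketch below uses the maximized log-likelihood as the score and checks that the change caused by adding one arc equals the change in a single family term. The function names are hypothetical, and the data are the five complete records shown on the score-based learning slide.

```python
from collections import Counter
from math import log


def family_loglik(data, child, parents):
    """Maximized log-likelihood contribution of one family (a child and its parents)."""
    joint, marginal = Counter(), Counter()
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        marginal[pa] += 1
    return sum(n * log(n / marginal[pa]) for (pa, _), n in joint.items())


def total_loglik(data, parents):
    """Score of a whole structure: the sum of its family scores."""
    return sum(family_loglik(data, x, pa) for x, pa in parents.items())


def delta_add_arc(data, parents, u, v):
    """Score change from adding u -> v: only the family of v is counted again."""
    return family_loglik(data, v, parents[v] + [u]) - family_loglik(data, v, parents[v])


rows = [("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "Y"), ("N", "Y", "Y"), ("N", "Y", "Y")]
data = [dict(zip(("E", "B", "A"), r)) for r in rows]

before = {"E": [], "B": [], "A": ["E"]}
after = {"E": [], "B": [], "A": ["E", "B"]}

delta = delta_add_arc(data, before, "B", "A")
print(delta, total_loglik(data, after) - total_loglik(data, before))
```

A search procedure can cache the family terms and, per candidate move, recompute one term (two for a reversal) instead of the whole sum.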

Page 13: Local Search in Practice

• Local search can get stuck in:
  – Local maxima: all one-edge changes reduce the score
  – Plateaux: some one-edge changes leave the score unchanged
• Standard heuristics can escape both
  – Random restarts
  – TABU search
  – Simulated annealing

Page 14: Local Search in Practice

• Using log-likelihood (LL) as score, adding arcs always helps
  – Max score attained by fully connected network
  – Overfitting: a bad idea…
• Minimum Description Length:
  – Learning as data compression

$$MDL(BN \mid D) = -\log P(D \mid \Theta, G) + \frac{\log N}{2}\,|\Theta|$$

  The first term is DL(Data|model), the second term is DL(Model).

[Figure: sample data records with missing entries: <9.7 0.6 8 14 18>, <0.2 1.3 5 ?? ??>, <1.3 2.8 ?? 0 1>, <?? 5.6 0 10 ??>, …]

• Other: BIC (Bayesian Information Criterion), Bayesian score (BDe)
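The effect of the complexity penalty can be seen on a tiny synthetic data set. The sketch below assumes binary variables and natural logarithms and uses eight made-up records (every combination of E, B, A exactly once), so that the variables are empirically independent; the function names are hypothetical. The fully connected network cannot fit this data better than the empty one, yet it pays for more parameters.

```python
from collections import Counter
from itertools import product
from math import log


def loglik(data, parents):
    """Maximized log-likelihood log P(D | Theta, G) for complete data."""
    total = 0.0
    for child, pa in parents.items():
        joint, marginal = Counter(), Counter()
        for row in data:
            key = tuple(row[p] for p in pa)
            joint[(key, row[child])] += 1
            marginal[key] += 1
        total += sum(n * log(n / marginal[key]) for (key, _), n in joint.items())
    return total


def num_params(parents, card=2):
    """|Theta| when every variable has `card` values."""
    return sum((card - 1) * card ** len(pa) for pa in parents.values())


def mdl(data, parents):
    """MDL(BN | D) = -log P(D | Theta, G) + (log N / 2) * |Theta|; lower is better."""
    return -loglik(data, parents) + 0.5 * log(len(data)) * num_params(parents)


# Synthetic data: each of the 8 value combinations once (an assumption for illustration).
data = [dict(zip("EBA", values)) for values in product("YN", repeat=3)]

empty = {"E": [], "B": [], "A": []}
full = {"E": [], "B": ["E"], "A": ["E", "B"]}

for name, g in [("empty", empty), ("full", full)]:
    print(name, round(loglik(data, g), 3), num_params(g), round(mdl(data, g), 3))
```

Both graphs reach the same log-likelihood here, so the description length of the model alone decides in favor of the empty graph.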

Page 15: Local Search in Practice

• Perform EM for each candidate graph

[Figure: candidate graphs G1, G2, G3, G4, …, Gn; for each one, parametric optimization (EM) climbs to a local maximum in parameter space]

Page 16: Local Search in Practice

• Perform EM for each candidate graph

[Figure: candidate graphs G1, G2, G3, G4, …, Gn; for each one, parametric optimization (EM) climbs to a local maximum in parameter space]

Computationally expensive:
• Parameter optimization via EM is non-trivial
• Need to perform EM for all candidate structures
• Spend time even on poor candidates

In practice, only a few candidates are considered.

Page 17: Structural EM [Friedman et al. 98]

Recall, in complete data we had
– Decomposition ⇒ efficient search

Idea:
• Instead of optimizing the real score…
• Find a decomposable alternative score
• Such that maximizing the new score ⇒ improvement in the real score

Page 18: Structural EM [Friedman et al. 98]

Idea:
• Use current model to help evaluate new structures

Outline:
• Perform search in (Structure, Parameters) space
• At each iteration, use current model for finding either:
  – Better scoring parameters: "parametric" EM step
  or
  – Better scoring structure: "structural" EM step

Page 19: Structural EM [Friedman et al. 98]

[Figure: loop Training Data → Computation → Expected Counts → Score & Parameterize → Reiterate, drawn on networks over X1, X2, X3, H, Y1, Y2, Y3]

Expected counts for the current structure:
N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H)

Additional expected counts for candidate structures:
N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H)

Page 20: Structure Learning: incomplete data

EM-algorithm: iterate until convergence

[Figure: EM loop. The current model (network over E, B and A) and the data feed the Expectation step, which uses inference, e.g. P(S | X=0, D=1, C=0, B=1), to produce expected counts; the Maximization step re-estimates the parameters.]

Data, over S, X, D, C, B:
<? 0 1 0 1>
<1 1 ? 0 1>
<0 0 0 ? ?>
<? ? 0 ? 1>
………

Expected counts, over S, X, D, C, B:
1 0 1 0 1
1 1 1 0 1
0 0 0 0 0
1 0 0 0 1
………..
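The Expectation and Maximization steps can be sketched on a much smaller model than the slide's five-variable example. The code below assumes a two-node network A → B with binary variables in which A is sometimes missing (None); the data, the starting parameters and the function names are made up for illustration.

```python
from math import log


def em_step(data, p_a, p_b):
    """One EM iteration for A -> B. p_a = P(A=1), p_b[a] = P(B=1 | A=a)."""
    n_a = [0.0, 0.0]                   # expected counts N(A=a)
    n_ab = [[0.0, 0.0], [0.0, 0.0]]    # expected counts N(A=a, B=b)
    for a, b in data:
        if a is None:
            # E-step: posterior P(A | B=b) under the current parameters
            like1 = p_a * (p_b[1] if b else 1 - p_b[1])
            like0 = (1 - p_a) * (p_b[0] if b else 1 - p_b[0])
            w1 = like1 / (like1 + like0)
            weights = [1 - w1, w1]
        else:
            weights = [1.0 - a, float(a)]
        for value in (0, 1):
            n_a[value] += weights[value]
            n_ab[value][b] += weights[value]
    # M-step: relative frequencies of the expected counts
    new_p_a = n_a[1] / len(data)
    new_p_b = [n_ab[0][1] / n_a[0], n_ab[1][1] / n_a[1]]
    return new_p_a, new_p_b


def loglik(data, p_a, p_b):
    """Log-likelihood of the observed part of the data (missing A is summed out)."""
    def joint(a, b):
        pa = p_a if a else 1 - p_a
        pb = p_b[a] if b else 1 - p_b[a]
        return pa * pb
    return sum(
        log(joint(a, b) if a is not None else joint(0, b) + joint(1, b))
        for a, b in data
    )


data = [(1, 1), (1, 1), (0, 0), (0, 1), (None, 1), (None, 0), (1, 0), (0, 0)]
p_a, p_b = 0.5, [0.4, 0.6]   # arbitrary starting parameters
history = [loglik(data, p_a, p_b)]
for _ in range(10):
    p_a, p_b = em_step(data, p_a, p_b)
    history.append(loglik(data, p_a, p_b))
print(p_a, p_b, history[-1])
```

The observed-data log-likelihood in `history` does not decrease from one iteration to the next, which is the usual convergence guarantee of EM; in a larger network the posterior in the E-step would come from a general inference routine.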

Page 21: Structure Learning: incomplete data

SEM-algorithm: iterate until convergence

[Figure: the EM loop of the previous slide, with the same data and expected counts over S, X, D, C, B, extended by a second maximization step: besides "Maximization: Parameters" there is "Maximization: Structure", which chooses among candidate structures over E, B and A.]

Page 22: Structure Learning: Summary

• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. EM)
• Optimization w/ score functions
  – likelihood + complexity penalty = MDL
• Local traversing of space of possible structures:
  – add, reverse, delete (single) arcs
• Speed-up: Structural EM
  – Score candidates w.r.t. current best model