Graphical Models - Learning -

Wolfram Burgard, Luc De Raedt, Kristian Kersting, Bernhard Nebel

Albert-Ludwigs University Freiburg, Germany

[Figure: Bayesian network over the nodes PCWP, CO, HRBP, HREKG, HRSAT, ERRCAUTER, HR, HISTORY, CATECHOL, SAO2, EXPCO2, ARTCO2, VENTALV, VENTLUNG, VENITUBE, DISCONNECT, MINVOLSET, VENTMACH, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, ANAPHYLAXIS, MINOVL, PVSAT, FIO2, PRESS, INSUFFANESTH, TPR, LVFAILURE, ERRBLOWOUTPUT, STROEVOLUME, LVEDVOLUME, HYPOVOLEMIA, CVP, BP]

AdvancedI WS 06/07

Based on J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", TR-97-021, U.C. Berkeley, April 1998; G. J. McLachlan, T. Krishnan, "The EM Algorithm and Extensions", John Wiley & Sons, Inc., 1997; D. Koller, course CS-228 handouts, Stanford University, 2001; N. Friedman & D. Koller's NIPS'99.

Structure Learning

Bayesian Networks

Learning With Bayesian Networks

[Table: rows are fully observed vs. partially observed data; columns are fixed structure (A B), fixed variables (A B ?), and hidden variables (A B ? ? H)]

Fixed structure: parameter estimation
• Fully observed: easiest problem, counting
• Partially observed: numerical, nonlinear optimization; multiple calls to BNs; difficult for large networks

Fixed variables and hidden variables: structure learning?
• Selection of arcs; new domain with no domain expert; data mining
• Encompasses two difficult subproblems; "only" Structural EM is known
• Scientific discovery

Why Struggle for Accurate Structure?

[Figure: three networks over Earthquake, Burglary, Alarm Set, and Sound: the original network, one with an added arc, and one with a missing arc]

Adding an arc
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure

Missing an arc
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure
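The cost of an added arc can be made concrete by counting free parameters. The sketch below assumes binary variables and an assumed arc layout (Earthquake → Alarm Set ← Burglary, Alarm Set → Sound, plus a hypothetical extra arc Earthquake → Sound); the exact arcs of the figure are not recoverable here, and `num_free_parameters` is an illustrative helper, not something defined in the lecture.

```python
def num_free_parameters(parents, cardinality):
    """Count independent CPT entries: (|X| - 1) per joint parent configuration."""
    total = 0
    for node, pars in parents.items():
        parent_configs = 1
        for p in pars:
            parent_configs *= cardinality[p]
        total += (cardinality[node] - 1) * parent_configs
    return total


# Assumption: all four variables are binary.
cardinality = {"Earthquake": 2, "Burglary": 2, "Alarm Set": 2, "Sound": 2}

# Assumed arc layout, for illustration only.
base_net = {
    "Earthquake": [],
    "Burglary": [],
    "Alarm Set": ["Earthquake", "Burglary"],
    "Sound": ["Alarm Set"],
}
# Same network with one extra (hypothetical) arc Earthquake -> Sound.
extra_arc_net = dict(base_net, Sound=["Alarm Set", "Earthquake"])

base_params = num_free_parameters(base_net, cardinality)
extra_params = num_free_parameters(extra_arc_net, cardinality)
```

Under these assumptions the extra arc doubles the table for Sound (from 2 to 4 free parameters), so the same amount of data has to support more estimates.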

Unknown Structure, (In)complete Data

Complete data

E, B, A: <Y,N,N> <Y,N,Y> <N,N,Y> <N,Y,Y> . . <N,Y,Y>

• Network structure is not specified
– Learner needs to select arcs & estimate parameters
• Data does not contain missing values

Incomplete data

E, B, A: <Y,?,N> <Y,N,?> <N,N,Y> <N,Y,Y> . . <?,Y,Y>

• Network structure is not specified
• Data contains missing values
– Need to consider assignments to missing values

[Figure: in both cases the data is fed to a learning algorithm, which outputs a network over E, B, A together with the table P(A | E,B). Before learning, all table entries are unknown (? ?); after learning, the rows read .9 .1, .7 .3, .99 .01, .8 .2]
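For the incomplete-data case above, "considering assignments to missing values" means that a record with k missing binary entries stands for 2^k possible complete records. A minimal sketch of that enumeration follows; `completions` is a hypothetical helper name and the value set ("Y", "N") is taken from the records shown.

```python
from itertools import product


def completions(record, values=("Y", "N")):
    """Yield every way of filling the '?' entries of a record."""
    missing = [i for i, v in enumerate(record) if v == "?"]
    for fill in product(values, repeat=len(missing)):
        completed = list(record)
        for i, v in zip(missing, fill):
            completed[i] = v
        yield tuple(completed)


# Records in (E, B, A) order, taken from the incomplete data set above.
one_missing = list(completions(("Y", "?", "N")))
none_missing = list(completions(("N", "N", "Y")))
```

A learner such as EM does not treat these completions as equally likely; it weights each one by its probability under the current model.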

Score-based Learning

Define a scoring function that evaluates how well a structure matches the data.

E, B, A: <Y,N,N> <Y,Y,Y> <N,N,Y> <N,Y,Y> . . <N,Y,Y>

[Figure: three candidate structures over E, B, A, each assigned a score]

Search for a structure that maximizes the score.
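As one possible scoring function, the sketch below computes the maximum log-likelihood of the data under a given structure, using counts as parameter estimates. It is an illustration under assumptions: only the five visible E, B, A records are used (the elided ones are ignored), and the two candidate structures are chosen for the example rather than read off the figure.

```python
import math
from collections import Counter


def log_likelihood_score(data, parents):
    """Maximum log-likelihood of `data` under the structure `parents`.

    data: list of dicts mapping variable name -> observed value.
    parents: dict mapping variable name -> tuple of parent names.
    """
    score = 0.0
    for var, pars in parents.items():
        family_counts = Counter()   # N(x, pa)
        parent_counts = Counter()   # N(pa)
        for row in data:
            pa = tuple(row[p] for p in pars)
            family_counts[(row[var], pa)] += 1
            parent_counts[pa] += 1
        for (_, pa), n in family_counts.items():
            score += n * math.log(n / parent_counts[pa])
    return score


# The five visible records, in (E, B, A) order.
rows = [("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "Y"),
        ("N", "Y", "Y"), ("N", "Y", "Y")]
data = [dict(zip("EBA", r)) for r in rows]

empty = {"E": (), "B": (), "A": ()}              # no arcs
collider = {"E": (), "B": (), "A": ("E", "B")}   # E -> A <- B

score_empty = log_likelihood_score(data, empty)
score_collider = log_likelihood_score(data, collider)
```

With this score the structure with more arcs can never score lower, which is why a plain likelihood score needs a complexity penalty.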

Structure Search as Optimization

Input:
– Training data
– Scoring function
– Set of possible structures

Output:
– A network that maximizes the score
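For very small variable sets the optimization can be done by brute force. The sketch below enumerates every directed acyclic graph over the given nodes and returns the one with the highest score. The scoring function is passed in as a parameter; the `toy_score` used here is a made-up stand-in that simply rewards closeness to an assumed target structure.

```python
from itertools import combinations, permutations


def is_acyclic(nodes, edges):
    """Repeatedly strip nodes without incoming arcs; a cycle leaves none to strip."""
    remaining = set(nodes)
    while remaining:
        sources = {n for n in remaining
                   if not any(v == n and u in remaining for u, v in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def all_dags(nodes):
    """Yield every acyclic arc set over `nodes` as a frozenset of (parent, child)."""
    possible = list(permutations(nodes, 2))
    for k in range(len(possible) + 1):
        for edges in combinations(possible, k):
            if is_acyclic(nodes, edges):
                yield frozenset(edges)


def best_structure(nodes, score):
    """Exhaustive search: return the highest-scoring DAG."""
    return max(all_dags(nodes), key=score)


nodes = ["E", "B", "A"]
target = frozenset({("E", "A"), ("B", "A")})   # assumed for the toy score


def toy_score(graph):
    # Hypothetical score: minus the number of arcs that differ from the target.
    return -len(graph ^ target)


num_dags = len(list(all_dags(nodes)))
best = best_structure(nodes, toy_score)
```

The number of DAGs grows super-exponentially with the number of nodes, so this only works as a reference for tiny examples.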

Heuristic Search

• Define a search space:
– search states are possible structures
– operators make small changes to structure
• Traverse space looking for high-scoring structures
• Search techniques:
– Greedy hill-climbing
– Best first search
– Simulated Annealing
– ...

Theorem: Finding a maximal scoring structure with at most k parents per node is NP-hard for k > 1

Typically: Local Search

[Figure: example network over S, C, E, D]

• Start with a given network
– empty network, best tree, a random network
• At each iteration
– Evaluate all possible changes
– Apply change based on score
• Stop when no modification improves score
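The loop described above can be sketched as greedy hill-climbing over arc sets, with add, delete and reverse as the single-arc changes. This is a minimal sketch under assumptions: the score is any function of the arc set, and the `toy_score` and `target` below are invented for the example (they are not the arcs of the S, C, E, D figure).

```python
def is_acyclic(nodes, edges):
    """Repeatedly strip nodes without incoming arcs; a cycle leaves none to strip."""
    remaining = set(nodes)
    while remaining:
        sources = {n for n in remaining
                   if not any(v == n and u in remaining for u, v in edges)}
        if not sources:
            return False
        remaining -= sources
    return True


def neighbors(nodes, edges):
    """Yield all acyclic structures one add / delete / reverse away from `edges`."""
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                yield edges - {(u, v)}                    # delete u -> v
                reversed_ = (edges - {(u, v)}) | {(v, u)}  # reverse u -> v
                if is_acyclic(nodes, reversed_):
                    yield reversed_
            elif (v, u) not in edges:
                added = edges | {(u, v)}                  # add u -> v
                if is_acyclic(nodes, added):
                    yield added


def hill_climb(nodes, score, start=frozenset()):
    """Apply the best-scoring single-arc change until none improves the score."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(nodes, current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            return current, current_score
        current, current_score = best, best_score


nodes = ["S", "C", "E", "D"]
target = frozenset({("S", "E"), ("C", "E"), ("C", "D")})   # hypothetical


def toy_score(graph):
    # Hypothetical score: minus the number of arcs that differ from the target.
    return -len(graph ^ target)


found, found_score = hill_climb(nodes, toy_score)
```

The toy score has a single maximum, so the search reaches it from the empty network; real scores have local maxima and plateaux, where this loop would stop early.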

[Figure: local changes to the example network over S, C, E, D, each yielding a neighboring structure: Add C → D, Reverse C → E, Delete C → E]

If data is complete: to update the score after a local change, only re-score (by counting) the families that changed.

If data is incomplete: to update the score after a local change, rerun the parameter estimation algorithm.
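For the complete-data case, the saving comes from the score being a sum of per-family terms. The sketch below caches family scores and, after a local change, recomputes only the families whose parent sets differ. The log-likelihood family score, the helper names and the six S, C, E, D records are all assumptions made for the illustration.

```python
import math
from collections import Counter


def make_family_scorer(data):
    """Return a cached family score (log-likelihood term) and its cache."""
    cache = {}

    def family_score(var, pars):
        key = (var, tuple(sorted(pars)))
        if key not in cache:
            family_counts, parent_counts = Counter(), Counter()
            for row in data:
                pa = tuple(row[p] for p in key[1])
                family_counts[(row[var], pa)] += 1
                parent_counts[pa] += 1
            cache[key] = sum(n * math.log(n / parent_counts[pa])
                             for (_, pa), n in family_counts.items())
        return cache[key]

    return family_score, cache


def total_score(parents, family_score):
    """Decomposable score: sum of one term per family."""
    return sum(family_score(v, pars) for v, pars in parents.items())


def rescore_after_change(old_total, old_parents, new_parents, family_score):
    """Update the total by re-scoring only families whose parents changed."""
    delta = 0.0
    for var in new_parents:
        if set(new_parents[var]) != set(old_parents[var]):
            delta += family_score(var, new_parents[var])
            delta -= family_score(var, old_parents[var])
    return old_total + delta


# Hypothetical complete records over (S, C, E, D).
rows = [(1, 0, 1, 1), (0, 0, 0, 0), (1, 1, 1, 1),
        (0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 0)]
data = [dict(zip("SCED", r)) for r in rows]

old_parents = {"S": (), "C": (), "E": ("S", "C"), "D": ("C",)}
new_parents = dict(old_parents, E=("S",))   # local change: delete C -> E

family_score, cache = make_family_scorer(data)
old_total = total_score(old_parents, family_score)
new_total = rescore_after_change(old_total, old_parents, new_parents, family_score)
```

Deleting one arc touches a single family, so only one new count table is built; reversing an arc would touch two.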

Local Search in Practice

• Local search can get stuck in:
– Local Maxima:
• All one-edge changes reduce the score
– Plateaux:
• Some one-edge changes leave the score unchanged
• Standard heuristics can escape both
– Random restarts
– TABU search
– Simulated annealing

Local Search in Practice

• Using LL as score, adding arcs always helps (the maximum log-likelihood never decreases)
– Max score attained by fully connected network
– Overfitting: A bad idea…
• Minimum Description Length:
– Learning as data compression

$$\mathrm{MDL}(BN \mid D) = -\log P(D \mid \Theta, G) + \frac{\log N}{2}\,|\Theta|$$

The first term is DL(Data|model), the second is DL(Model).

• Other: BIC (Bayesian Information Criterion), Bayesian score (BDe)

[Figure: example data records, some with missing values (??): <9.7 0.6 8 14 18> <0.2 1.3 5 ?? ??> <1.3 2.8 ?? 0 1> <?? 5.6 0 10 ??> ……………….]
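The MDL score of this slide, the negative log-likelihood plus (log N / 2) times the number of parameters, can be computed from counts when the data is complete. The sketch below is one way to do that; the function name, the use of natural logarithms, and the five E, B, A records (the visible ones from the Score-based Learning slide) are assumptions of the example.

```python
import math
from collections import Counter


def mdl_score(data, parents, cardinality):
    """MDL = -log P(D | Theta, G) + (log N / 2) * |Theta|; lower is better."""
    n = len(data)
    log_lik = 0.0
    num_params = 0
    for var, pars in parents.items():
        family_counts, parent_counts = Counter(), Counter()
        for row in data:
            pa = tuple(row[p] for p in pars)
            family_counts[(row[var], pa)] += 1
            parent_counts[pa] += 1
        # DL(Data|model): log-likelihood at the count-based estimates.
        log_lik += sum(c * math.log(c / parent_counts[pa])
                       for (_, pa), c in family_counts.items())
        # DL(Model): free parameters of this family's table.
        parent_configs = math.prod(cardinality[p] for p in pars)
        num_params += (cardinality[var] - 1) * parent_configs
    return -log_lik + 0.5 * math.log(n) * num_params


rows = [("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "Y"),
        ("N", "Y", "Y"), ("N", "Y", "Y")]
data = [dict(zip("EBA", r)) for r in rows]
cardinality = {"E": 2, "B": 2, "A": 2}   # assumption: binary variables

empty = {"E": (), "B": (), "A": ()}
collider = {"E": (), "B": (), "A": ("E", "B")}

mdl_empty = mdl_score(data, empty, cardinality)
mdl_collider = mdl_score(data, collider, cardinality)
```

Unlike the plain likelihood, the two terms pull in opposite directions: extra arcs shorten DL(Data|model) but lengthen DL(Model).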

Local Search in Practice

• Perform EM for each candidate graph

[Figure: candidate graphs G1, G2, G3, G4, ..., Gn; for each graph, parametric optimization (EM) over the parameter space converges to a local maximum]

Computationally expensive:
• Parameter optimization via EM is non-trivial
• Need to perform EM for all candidate structures
• Spend time even on poor candidates

In practice, only a few candidates are considered.

Structural EM [Friedman et al. 98]

Recall, in complete data we had
– Decomposition ⇒ efficient search

Idea:
• Instead of optimizing the real score…
• Find decomposable alternative score
• Such that maximizing new score ⇒ improvement in real score
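One way to write such a decomposable alternative score is as the expected complete-data score under the current model. The notation below is assumed for this sketch (the symbols Q, \bar N, G_n, \Theta_n and d_m do not appear on the slide), and the penalty is taken to be the MDL term used earlier, with the sign flipped so that larger is better.

```latex
Q(G, \Theta \mid G_n, \Theta_n)
  = \sum_i \sum_{x_i,\, \mathrm{pa}_i}
      \bar N(x_i, \mathrm{pa}_i)\, \log \theta_{x_i \mid \mathrm{pa}_i}
    \;-\; \frac{\log N}{2}\, |\Theta|,
\qquad
\bar N(x_i, \mathrm{pa}_i)
  = \sum_{m=1}^{N} P\bigl(x_i, \mathrm{pa}_i \mid d_m, G_n, \Theta_n\bigr)
```

Here d_m is the observed part of the m-th record and \bar N are expected counts. Because Q is a sum of per-family terms over these counts, candidate structures can be searched as in the complete-data case.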

Idea:
• Use current model to help evaluate new structures

Outline:
• Perform search in (Structure, Parameters) space
• At each iteration, use current model for finding either:
– Better scoring parameters: “parametric” EM step
or
– Better scoring structure: “structural” EM step

[Figure: one Structural EM iteration on a network over X1, X2, X3, H, Y1, Y2, Y3. From the training data and the current network, compute expected counts for the current families, N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H), and for candidate families, N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H). Score & parameterize the candidate networks, then reiterate.]

Structure Learning: incomplete data

EM-algorithm: iterate until convergence

Data (S X D C B), with missing values:
<? 0 1 0 1> <1 1 ? 0 1> <0 0 0 ? ?> <? ? 0 ? 1> ………

• Expectation: inference with the current model, e.g. P(S|X=0,D=1,C=0,B=1), yields expected counts (completed data):
S X D C B
1 0 1 0 1
1 1 1 0 1
0 0 0 0 0
1 0 0 0 1
………..
• Maximization: re-estimate the parameters of the current model from the expected counts

[Figure: current model, a network over E, B, A]
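The expectation and maximization steps can be sketched on a much smaller model than the five-variable example. The code below assumes a hypothetical two-node network S → X with binary variables, where S is sometimes missing (`None`); the function name, starting parameters and records are all invented for the illustration.

```python
def em_two_node(records, p_s=0.5, p_x=(0.3, 0.7), iterations=20):
    """EM for S -> X. records: (s, x) pairs, s in {0, 1, None}, x in {0, 1}.

    p_s is P(S=1); p_x[s] is P(X=1 | S=s).
    """
    for _ in range(iterations):
        # E-step: expected counts N(S=s) and N(S=s, X=1).
        n_s = [0.0, 0.0]
        n_sx1 = [0.0, 0.0]
        for s, x in records:
            if s is None:
                # Posterior P(S=s | X=x) under the current model (Bayes' rule).
                like = [(1 - p_s) * (p_x[0] if x else 1 - p_x[0]),
                        p_s * (p_x[1] if x else 1 - p_x[1])]
                z = like[0] + like[1]
                weights = [like[0] / z, like[1] / z]
            else:
                weights = [1.0 - s, float(s)]
            for v in (0, 1):
                n_s[v] += weights[v]
                n_sx1[v] += weights[v] * x
        # M-step: re-estimate parameters from the expected counts.
        p_s = n_s[1] / len(records)
        p_x = (n_sx1[0] / n_s[0], n_sx1[1] / n_s[1])
    return p_s, p_x


# Hypothetical records; None marks a missing value of S.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (None, 1), (None, 0)]
p_s_hat, p_x_hat = em_two_node(records)

# With no missing values, EM reduces to plain counting.
complete = [(1, 1), (1, 0), (0, 0), (0, 0)]
p_s_complete, p_x_complete = em_two_node(complete)
```

The sketch assumes both values of S are observed at least once; otherwise an expected count could be zero and the M-step would divide by zero.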

Structure Learning: incomplete data

SEM-algorithm: iterate until convergence

Data (S X D C B), with missing values:
<? 0 1 0 1> <1 1 ? 0 1> <0 0 0 ? ?> <? ? 0 ? 1> ………

• Expectation: inference with the current model, e.g. P(S|X=0,D=1,C=0,B=1), yields expected counts (completed data): <1 0 1 0 1> <1 1 1 0 1> <0 0 0 0 0> <1 0 0 0 1> ………..
• Maximization: parameters
• Maximization: structure

[Figure: several candidate structures over E, B, A; the structure maximization step selects the next current model among them]

Structure Learning: Summary

• Expert knowledge + learning from data
• Structure learning involves parameter estimation (e.g. EM)
• Optimization w/ score functions
– likelihood + complexity penalty = MDL
• Local traversing of space of possible structures:
– add, reverse, delete (single) arcs
• Speed-up: Structural EM
– Score candidates w.r.t. current best model
