Page 1: Exact Inference: Clique Trees

Eran Segal, Weizmann Institute

Page 2: Inference with Clique Trees

- Exploits factorization of the distribution for efficient inference, similar to variable elimination
- Uses global data structures
- Distribution given by a (possibly unnormalized) measure
  $\tilde P_F(U) = \prod_{\phi \in F} \phi$
- For Bayesian networks, factors are CPDs; for Markov networks, factors are potentials

Page 3: Variable Elimination & Clique Trees

- Variable elimination
  - Each step creates a factor $\psi_i$ through multiplication
  - A variable is then eliminated in $\psi_i$ to generate a new factor $\tau_i$
  - The process is repeated until the product contains only query variables
- Clique tree inference
  - Another view of the above computation
  - General idea: $\psi_i$ is a computational data structure which takes "messages" $\tau_j$ generated by other factors $\psi_j$, and generates a message $\tau_i$ which is used by another factor $\psi_l$

Page 4: Cluster Graph

- Data structure providing a flowchart of the factor-manipulation process
- A cluster graph $K$ for factors $F$ is an undirected graph
  - Nodes are associated with subsets of variables $C_i \subseteq U$
  - The graph is family preserving: each factor $\phi \in F$ is associated with one node $C_i$ such that $Scope[\phi] \subseteq C_i$
  - Each edge $C_i$–$C_j$ is associated with a sepset $S_{i,j} = C_i \cap C_j$
- Key: variable elimination defines a cluster graph
  - Cluster $C_i$ for each factor $\psi_i$ used in the computation
  - Draw edge $C_i$–$C_j$ if the factor generated from $\psi_i$ is used in the computation of $\psi_j$

Page 5: Simple Example

Bayesian network: $X_1 \to X_2 \to X_3$

Variable elimination:
$P(X_3) = \sum_{X_1,X_2} P(X_1,X_2,X_3)$
$= \sum_{X_1,X_2} P(X_1)\,P(X_2|X_1)\,P(X_3|X_2)$
$= \sum_{X_2} P(X_3|X_2) \sum_{X_1} P(X_1)\,P(X_2|X_1)$
$= \sum_{X_2} P(X_3|X_2)\,\tau(X_2)$
$= \tau(X_3)$

Cluster graph: $C_1 = \{X_1,X_2\}$ — $C_2 = \{X_2,X_3\}$, with sepset $S_{1,2} = \{X_2\}$
- $C_1$ is assigned $P(X_1)P(X_2|X_1)$; $C_2$ is assigned $P(X_3|X_2)$
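
To make the two elimination steps concrete, here is a minimal Python sketch of this chain example; the binary CPD values are made-up numbers for illustration only.

```python
# Factors are dicts mapping assignment tuples to probabilities.
# Hypothetical binary CPDs: P(X1), P(X2|X1), P(X3|X2).
p_x1 = {(0,): 0.6, (1,): 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # key: (x1, x2)
p_x3_given_x2 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}  # key: (x2, x3)

# tau(X2) = sum_{X1} P(X1) P(X2|X1) -- the message C1 sends over sepset {X2}
tau_x2 = {(x2,): sum(p_x1[(x1,)] * p_x2_given_x1[(x1, x2)] for x1 in (0, 1))
          for x2 in (0, 1)}

# tau(X3) = sum_{X2} P(X3|X2) tau(X2) -- eliminating X2 at C2 yields P(X3)
p_x3 = {(x3,): sum(p_x3_given_x2[(x2, x3)] * tau_x2[(x2,)] for x2 in (0, 1))
        for x3 in (0, 1)}

print(p_x3)  # a proper marginal: the two entries sum to 1
```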

Page 6: A More Complex Example

(Figure: the student network over variables C, D, I, G, S, L, J, H.)

Goal: $P(J)$. Elimination ordering: C, D, I, H, G, S, L.

- C: $\tau_1(D) = \sum_C \phi_C(C)\,\phi_D(C,D)$
- D: $\tau_2(G,I) = \sum_D \phi_G(G,I,D)\,\tau_1(D)$
- I: $\tau_3(G,S) = \sum_I \phi_I(I)\,\phi_S(S,I)\,\tau_2(G,I)$
- H: $\tau_4(G,J) = \sum_H \phi_H(H,G,J)$
- G: $\tau_5(J,L,S) = \sum_G \phi_L(L,G)\,\tau_3(G,S)\,\tau_4(G,J)$
- S: $\tau_6(J,L) = \sum_S \phi_J(J,L,S)\,\tau_5(J,L,S)$
- L: $\tau_7(J) = \sum_L \tau_6(J,L)$

Induced cluster graph (clusters numbered 1–7 by elimination step, sepsets on edges):
{C,D} —D— {G,I,D} —G,I— {G,S,I} —G,S— {G,J,S,L} —J,S,L— {J,S,L} —J,L— {J,L},
with {H,G,J} —G,J— {G,J,S,L}

Page 7: Properties of Cluster Graphs

- Cluster graphs induced by variable elimination are trees
  - In VE, each intermediate factor is used only once
  - Hence, each cluster "passes" a message along an edge to exactly one other cluster
- These cluster graphs obey the running intersection property
  - If $X \in C_i$ and $X \in C_j$, then $X$ is in each cluster on the (unique) path between $C_i$ and $C_j$
- Verify on the cluster graph from the previous example: it is a tree and family preserving, and it satisfies the running intersection property

Page 8: Running Intersection Property

- Theorem: If $T$ is a cluster tree induced by VE over factors $F$, then $T$ obeys the running intersection property
- Proof: Let $C$ and $C'$ be two clusters that contain $X$, and let $C_X$ be the cluster where $X$ is eliminated. We show that $X$ must be present in each cluster on the path from $C$ to $C_X$ (and symmetrically on the path from $C'$):
  - The computation at $C_X$ must occur after the computation at $C$
  - $X$ is in $C$ by assumption, and since $X$ is not eliminated at $C$, $X$ is in the factor generated by $C$
  - By definition, $C$'s upstream neighbor multiplies the factor generated by $C$, and thus has $X$ in its scope
  - By induction over the remaining nodes on the path, $X$ appears in all clusters between $C$ and $C_X$

Page 9: Clique Tree

- A cluster tree over factors $F$ that satisfies the running intersection property is called a clique tree
- Clusters in a clique tree are also called cliques
- We saw: variable elimination → clique tree. Next we will see: clique tree → variable elimination
- Clique tree advantage: a data structure for caching computations, allowing multiple VE runs to be performed more efficiently than separate VE runs

Page 10: Clique Tree Inference

Goal: compute $P(J)$ for the student network.

Clique tree (sepsets on edges, assigned factors below each clique):
{C,D} —D— {G,I,D} —G,I— {G,S,I} —G,S— {G,J,S,L} —G,J— {H,G,J}

- $C_1 = \{C,D\}$: $P(C)P(D|C)$
- $C_2 = \{G,I,D\}$: $P(G|I,D)$
- $C_3 = \{G,S,I\}$: $P(I)P(S|I)$
- $C_5 = \{G,J,S,L\}$: $P(L|G)P(J|L,S)$
- $C_4 = \{H,G,J\}$: $P(H|G,J)$

Verify: the graph is a tree and family preserving, and it satisfies the running intersection property.

Page 11: Clique Tree Inference

Goal: compute $P(J)$, with $C_5$ as the root. Set the initial factor at each clique to the product of its assigned factors.

- $C_1$: eliminate C, sending $\delta_{1\to 2}(D)$ to $C_2$
- $C_2$: eliminate D, sending $\delta_{2\to 3}(G,I)$ to $C_3$
- $C_3$: eliminate I, sending $\delta_{3\to 5}(G,S)$ to $C_5$
- $C_4$: eliminate H, sending $\delta_{4\to 5}(G,J)$ to $C_5$
- $C_5$: obtain $P(J)$ by summing out G, S, L

Page 12: Clique Tree Inference

Goal: compute $P(J)$, now with $C_4$ as the root. Same clique tree and initial factors.

- $C_1$: eliminate C, sending $\delta_{1\to 2}(D)$ to $C_2$
- $C_2$: eliminate D, sending $\delta_{2\to 3}(G,I)$ to $C_3$
- $C_3$: eliminate I, sending $\delta_{3\to 5}(G,S)$ to $C_5$
- $C_5$: eliminate S and L, sending $\delta_{5\to 4}(G,J)$ to $C_4$
- $C_4$: obtain $P(J)$ by summing out H and G

Note that the first three messages are identical to those of the previous run.

Page 13: Clique Tree Inference

(Figure only: the same clique tree with its assigned CPDs, repeated from the previous slides.)

Page 14: Clique Tree Message Passing

- Let $T$ be a clique tree and $C_1,\dots,C_k$ its cliques
- Multiply the factors assigned to each clique, resulting in initial potentials; since each factor is assigned to some clique ($\alpha(\phi_j) = C_i$), we have
  $\pi_i^0[C_i] = \prod_{j:\alpha(\phi_j)=C_i} \phi_j$ and $\prod_{j=1}^{k} \pi_j^0[C_j] = \prod_{\phi \in F} \phi = \tilde P_F(U)$
- Define $C_r$ as the root clique
- Start from the tree leaves and move inward
  - For each clique $C_i$, define $N_i$ to be the neighbors of $C_i$
  - Let $pr(i)$ be the upstream neighbor of $C_i$ (on the path to $C_r$)
- Each $C_i$ performs a computation that sends a message to $C_{pr(i)}$ (see the sketch below):
  - Multiply all incoming messages from downstream neighbors with the initial clique potential, resulting in a factor whose scope is the clique
  - Sum out all variables except those in the sepset $S_{i,pr(i)}$:
    $\delta_{i\to pr(i)} = \sum_{C_i - S_{i,pr(i)}} \pi_i^0 \prod_{k \in N_i - \{pr(i)\}} \delta_{k\to i}$
- Claim: the final clique potential at the root represents
  $\pi_r[C_r] = \sum_{U - C_r} \tilde P_F(U)$
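
A minimal Python sketch of the upward message computation. The dict-based factor representation and the `tree`, `psi0`, and `card` inputs are assumptions of the sketch, not anything fixed by the slides; it also assumes each clique's initial potential has the full clique as its scope.

```python
import itertools

def multiply(f, g, card):
    """Pointwise product of two factors over the union of their scopes.
    A factor is (variables, table); card maps each variable to its arity."""
    fv, ft = f
    gv, gt = g
    vs = tuple(dict.fromkeys(fv + gv))  # union of scopes, order-preserving
    table = {}
    for assign in itertools.product(*(range(card[v]) for v in vs)):
        a = dict(zip(vs, assign))
        table[assign] = (ft[tuple(a[v] for v in fv)] *
                         gt[tuple(a[v] for v in gv)])
    return vs, table

def sum_out_to(f, keep):
    """Marginalize factor f down to the variables in `keep` (a sepset)."""
    fv, ft = f
    kv = tuple(v for v in fv if v in keep)
    table = {}
    for assign, val in ft.items():
        key = tuple(x for v, x in zip(fv, assign) if v in keep)
        table[key] = table.get(key, 0.0) + val
    return kv, table

def upward_message(i, parent, tree, psi0, card):
    """delta_{i -> parent}: multiply the initial potential pi^0_i with the
    messages from all downstream neighbors, then sum out everything
    except the sepset S_{i, parent}."""
    f = psi0[i]
    for k in tree[i]:                       # neighbors of clique i
        if k != parent:
            f = multiply(f, upward_message(k, i, tree, psi0, card), card)
    sepset = set(psi0[i][0]) & set(psi0[parent][0])
    return sum_out_to(f, sepset)
```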

Page 15: Example

(Figure: a clique tree with cliques $C_1,\dots,C_6$, rooted at $C_6$.)

- Legal ordering I: 1, 2, 3, 4, 5, 6
- Legal ordering II: 2, 5, 1, 3, 4, 6
- Illegal ordering: 3, 4, 1, 2, 5, 6

Page 16: Clique Tree Inference Correctness

- Theorem: Let $C_r$ be the root clique in a clique tree. If $\pi_r$ is computed as above, then
  $\pi_r[C_r] = \sum_{U - C_r} \tilde P_F(U)$
- The algorithm applies to both Bayesian and Markov networks
  - For a Bayesian network $G$, if $F$ consists of the CPDs reduced with some evidence $e$, then $\pi_r[C_r] = P_G(C_r, e)$; the posterior is obtained by normalizing the factor over $C_r$ to sum to 1
  - For a Markov network $H$, if $F$ consists of the compatibility functions, then $\pi_r[C_r] = \tilde P_H(C_r)$; the probability is obtained by normalizing the factor over $C_r$ to sum to 1, and the partition function by summing up all entries in $\pi_r[C_r]$

Page 17: Clique Tree Calibration

- Suppose we want to compute posterior distributions over n variables
- With variable elimination, we perform n separate VE runs
- With clique trees, we can do this much more efficiently
  - Idea 1: since the marginal over a variable can be computed from any root clique that includes it, perform k clique tree runs (k = number of cliques)
  - Idea 2: we can do much better!

Page 18: Clique Tree Calibration

- Observation: the message from $C_i$ to $C_j$ is unique
  - Consider two neighboring cliques $C_i$ and $C_j$
  - If the root $C_r$ is on $C_j$'s side, $C_i$ sends $C_j$ a message
  - The message does not depend on the specific $C_r$ (we only need $C_r$ to be on $C_j$'s side for $C_i$ to send a message to $C_j$)
  - The message from $C_i$ to $C_j$ will therefore always be the same
- Each edge has two messages associated with it, one for each direction
- There are only 2(k−1) messages to compute
- We can then readily compute the posterior over each variable

Page 19: Clique Tree Calibration

Compute all 2(k−1) messages as follows (a sketch of this two-pass schedule appears after this list):
- Pick any clique as the root
- Upward pass: send messages toward the root; terminate when the root has received all messages
- Downward pass: send messages from the root toward its children; terminate when all leaves have received messages
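
A sketch of full calibration built on the `multiply` and `sum_out_to` helpers from the earlier sketch; `tree` and `psi0` remain illustrative assumptions. One upward and one downward pass yield all 2(k−1) directed messages and the calibrated beliefs.

```python
def calibrate(tree, psi0, card, root):
    """All 2(k-1) messages via one upward and one downward pass."""
    messages = {}                           # (i, j) -> delta_{i -> j}

    def send(i, j):
        # Use the *initial* potential plus messages from all neighbors
        # except j -- never beta_i itself -- to avoid double-counting.
        f = psi0[i]
        for k in tree[i]:
            if k != j:
                f = multiply(f, messages[(k, i)], card)
        return sum_out_to(f, set(psi0[i][0]) & set(psi0[j][0]))

    def collect(i, parent):                 # upward pass: leaves -> root
        for k in tree[i]:
            if k != parent:
                collect(k, i)
        if parent is not None:
            messages[(i, parent)] = send(i, parent)

    def distribute(i, parent):              # downward pass: root -> leaves
        for k in tree[i]:
            if k != parent:
                messages[(i, k)] = send(i, k)
                distribute(k, i)

    collect(root, None)
    distribute(root, None)

    beliefs = {}                            # beta_i = pi^0_i * all incoming
    for i in tree:
        f = psi0[i]
        for k in tree[i]:
            f = multiply(f, messages[(k, i)], card)
        beliefs[i] = f
    return beliefs, messages
```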

Page 20: Clique Tree Calibration: Example

Root: $C_5$ — first, the upward pass (messages sent toward the root).

(Figure: the student-network clique tree with its assigned CPDs, as on page 10.)

Page 21: Clique Tree Calibration: Example

Root: $C_5$ — second, the downward pass (messages sent from the root toward the leaves).

(Figure: the same clique tree.)

Page 22: Clique Tree Calibration

- Theorem: when $\beta_i$ is computed for each clique $i$ as above,
  $\beta_i[C_i] = \sum_{U - C_i} \tilde P_F(U)$
- Important: avoid double-counting! Each clique $i$ computes the message to its neighbor $j$ using its initial potential $\pi_i^0$ and the messages from all neighbors except $j$, not its updated potential $\beta_i$, since $\beta_i$ integrates information coming from $C_j$, which would then be counted twice

Page 23: Clique Tree Calibration

- The posterior of each variable X can be computed by summing out the redundant variables from any clique that contains X
- If X appears in multiple cliques, they must agree on its marginal
- A clique tree with potentials $\beta_i[C_i]$ is said to be calibrated if for all neighboring cliques $C_i$ and $C_j$:
  $\sum_{C_i - S_{i,j}} \beta_i[C_i] = \sum_{C_j - S_{i,j}} \beta_j[C_j]$
- Key advantage: we compute posteriors for all variables using only twice the computation of a single upward pass
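
A small sketch of checking the calibration condition on one edge, reusing the dict-based factors and `sum_out_to` helper from the earlier sketches.

```python
def agree(beta_i, beta_j, tol=1e-9):
    """True iff both cliques yield the same marginal over their sepset."""
    sep = set(beta_i[0]) & set(beta_j[0])
    vi, ti = sum_out_to(beta_i, sep)
    vj, tj = sum_out_to(beta_j, sep)
    idx = {v: k for k, v in enumerate(vi)}          # align variable orders
    return all(abs(ti[a] - tj[tuple(a[idx[v]] for v in vj)]) < tol
               for a in ti)
```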

Page 24: Distribution of a Calibrated Tree

Bayesian network: $A \to B \to C$. Clique tree: $\{A,B\}$ —B— $\{B,C\}$.

For a calibrated tree, $\beta_1[A,B] = P(A,B)$ and $\beta_2[B,C] = P(B,C)$, so
$P(C|B) = \frac{P(B,C)}{P(B)} = \frac{\beta_2[B,C]}{\sum_C \beta_2[B,C]} = \frac{\beta_2[B,C]}{\mu_{1,2}[B]}$

The joint distribution can thus be written as
$P(A,B,C) = P(A,B)\,P(C|B) = \frac{\beta_1[A,B]\,\beta_2[B,C]}{\mu_{1,2}[B]}$
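
A quick numeric check of this identity on the $A \to B \to C$ network, using made-up binary CPDs: $\beta_1[A,B]\,\beta_2[B,C]/\mu_{1,2}[B]$ recovers the joint on every assignment.

```python
p_a = {0: 0.3, 1: 0.7}
p_b_a = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}  # key: (a, b)
p_c_b = {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.5, (1, 1): 0.5}  # key: (b, c)

joint = {(a, b, c): p_a[a] * p_b_a[(a, b)] * p_c_b[(b, c)]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

# Calibrated beliefs and sepset potential: beta1 = P(A,B), beta2 = P(B,C), mu12 = P(B)
beta1 = {(a, b): sum(joint[(a, b, c)] for c in (0, 1))
         for a in (0, 1) for b in (0, 1)}
beta2 = {(b, c): sum(joint[(a, b, c)] for a in (0, 1))
         for b in (0, 1) for c in (0, 1)}
mu12 = {b: sum(beta2[(b, c)] for c in (0, 1)) for b in (0, 1)}

for (a, b, c), p in joint.items():
    assert abs(beta1[(a, b)] * beta2[(b, c)] / mu12[b] - p) < 1e-12
print("joint == beta1 * beta2 / mu12 on all assignments")
```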

Page 25: Distribution of a Calibrated Tree

The clique tree measure of a calibrated tree $T$ is
$Q_T = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}[S_{i,j}]}$
where
$\mu_{i,j}[S_{i,j}] = \sum_{C_i - S_{i,j}} \beta_i[C_i] = \sum_{C_j - S_{i,j}} \beta_j[C_j]$

Page 26: Distribution of a Calibrated Tree

- Theorem: If $T$ is a calibrated clique tree where $\beta_i[C_i] = P(C_i)$ for each $C_i$, then $P(U) = Q_T$
- Proof:
  - Let $r$ be some arbitrarily chosen root in the clique tree
  - Since the clique tree is a tree, we have $Ind(C_i ; X \mid S_{i,pr(i)})$ for each variable $X$ on the root side of the sepset
  - Thus, $P(U) = P(C_r) \prod_{i \ne r} P(C_i \mid S_{i,pr(i)})$
  - We can now write $P(C_i \mid S_{i,pr(i)}) = \frac{P(C_i)}{P(S_{i,pr(i)})} = \frac{\beta_i[C_i]}{\mu_{i,pr(i)}[S_{i,pr(i)}]}$
  - Since each clique and each sepset appears exactly once:
    $P(U) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}[S_{i,j}]} = Q_T$

Page 27: Message Passing: Belief Propagation

- Recall the clique tree calibration algorithm; upon calibration the final potential at clique $i$ is:
  $\beta_i = \pi_i^0 \prod_{k \in N_i} \delta_{k \to i}$
- A message from $i$ to $j$ sums out the non-sepset variables from the product of the initial potential and all other messages:
  $\delta_{i \to j} = \sum_{C_i - S_{i,j}} \pi_i^0 \prod_{k \in N_i - \{j\}} \delta_{k \to i}$
- This can also be viewed as multiplying in all messages and dividing by the message from $j$ to $i$:
  $\delta_{i \to j} = \frac{\sum_{C_i - S_{i,j}} \pi_i^0 \prod_{k \in N_i} \delta_{k \to i}}{\delta_{j \to i}} = \frac{\sum_{C_i - S_{i,j}} \beta_i}{\delta_{j \to i}}$

Page 28: Message Passing: Belief Propagation

Bayesian network: $X_1 \to X_2 \to X_3 \to X_4$. Clique tree: $\{X_1,X_2\}$ —$X_2$— $\{X_2,X_3\}$ —$X_3$— $\{X_3,X_4\}$. Root: $C_2$.

- $C_1$ to $C_2$ message: $\delta_{1\to 2}(X_2) = \sum_{X_1} \pi_1^0[X_1,X_2] = \sum_{X_1} P(X_1)P(X_2|X_1)$
- $C_2$ to $C_1$ message: $\delta_{2\to 1}(X_2) = \sum_{X_3} \pi_2^0[X_2,X_3]\,\delta_{3\to 2}(X_3)$
- Alternatively, compute $\beta_2[X_2,X_3] = \delta_{1\to 2}(X_2)\,\delta_{3\to 2}(X_3)\,\pi_2^0[X_2,X_3]$
- And then: $\delta_{2\to 1}(X_2) = \frac{\sum_{X_3} \beta_2[X_2,X_3]}{\delta_{1\to 2}(X_2)}$
- Thus, the two approaches are equivalent
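
A numeric sanity check of this equivalence on the chain, with made-up binary CPDs: $\delta_{2\to 1}$ computed directly matches $\sum_{X_3}\beta_2 / \delta_{1\to 2}$.

```python
p1 = {0: 0.6, 1: 0.4}                                        # P(X1)
p21 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}   # P(X2|X1), key (x1,x2)
p32 = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}   # P(X3|X2), key (x2,x3)
p43 = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}   # P(X4|X3), key (x3,x4)

d12 = {x2: sum(p1[x1] * p21[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)}
d32 = {x3: sum(p43[(x3, x4)] for x4 in (0, 1)) for x3 in (0, 1)}  # sums to 1 here

# Fully updated belief at C2 = {X2, X3}
beta2 = {(x2, x3): d12[x2] * p32[(x2, x3)] * d32[x3]
         for x2 in (0, 1) for x3 in (0, 1)}

d21_direct = {x2: sum(p32[(x2, x3)] * d32[x3] for x3 in (0, 1)) for x2 in (0, 1)}
d21_divide = {x2: sum(beta2[(x2, x3)] for x3 in (0, 1)) / d12[x2] for x2 in (0, 1)}

assert all(abs(d21_direct[x2] - d21_divide[x2]) < 1e-12 for x2 in (0, 1))
```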

Page 29: Message Passing: Belief Propagation

- Based on the observation above, belief propagation uses a different message passing scheme
- Each clique $C_i$ maintains its fully updated beliefs $\beta_i$: the product of the initial potentials and the messages from its neighbors
- Store at each sepset $S_{i,j}$ the previous message $\mu_{i,j}$ passed over the edge, regardless of its direction
- When passing a message, divide by the previous $\mu_{i,j}$
- Claim: message passing is correct regardless of which clique sent the last message
- This is called belief update or belief propagation

Page 30: Message Passing: Belief Propagation

Initialize the clique tree:
- For each clique $C_i$, set $\beta_i \leftarrow \prod_{\phi: \alpha(\phi)=i} \phi$
- For each edge $C_i$–$C_j$, set $\mu_{i,j} \leftarrow 1$

While uninformed cliques exist, select an edge $C_i$–$C_j$ and send a message from $C_i$ to $C_j$:
- Marginalize the clique over the sepset: $\sigma_{i\to j} \leftarrow \sum_{C_i - S_{i,j}} \beta_i$
- Update the belief at $C_j$: $\beta_j \leftarrow \beta_j \cdot \frac{\sigma_{i\to j}}{\mu_{i,j}}$
- Update the sepset at $C_i$–$C_j$: $\mu_{i,j} \leftarrow \sigma_{i\to j}$
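
A sketch of one belief-update step, reusing the dict-based factors, `multiply`, and `sum_out_to` from the earlier sketches; it assumes clique scopes list shared sepset variables in the same order, and treats a missing $\mu_{i,j}$ as the initial all-ones potential.

```python
def pass_message(i, j, beta, mu, card):
    """One belief-update step: beta_j <- beta_j * sigma_{i->j} / mu_{i,j}."""
    sep = set(beta[i][0]) & set(beta[j][0])
    sigma = sum_out_to(beta[i], sep)        # sigma_{i->j}: marginal onto sepset
    sv, st = sigma
    prev = mu.get((i, j))                   # previously stored sepset potential
    old = prev[1] if prev is not None else {k: 1.0 for k in st}
    # Divide out the old sepset potential pointwise (0/0 treated as 0),
    # so information already absorbed by C_j is never counted twice.
    ratio = (sv, {k: (st[k] / old[k] if old[k] else 0.0) for k in st})
    beta[j] = multiply(beta[j], ratio, card)
    mu[(i, j)] = mu[(j, i)] = sigma         # one shared potential per edge
```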

Page 31: Clique Tree Invariant

- Belief propagation can be viewed as reparameterizing the joint distribution
- Upon calibration we showed
  $P(U) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}[S_{i,j}]}$
- Initially this invariant holds, since each factor is assigned to exactly one clique and all sepset potentials are 1:
  $\frac{\prod_{C_i \in T} \pi_i^0[C_i]}{\prod_{(C_i - C_j) \in T} 1} = \prod_{\phi \in F} \phi = \tilde P_F(U)$
- At each update step the invariant is also maintained
  - A message from $C_i$ to $C_j$ only changes $\beta_j$ and $\mu_{i,j}$, so all other terms remain unchanged
  - We need to show $\frac{\beta_j'}{\mu_{i,j}'} = \frac{\beta_j}{\mu_{i,j}}$
  - But this is exactly the message passing step: $\beta_j' = \beta_j \cdot \frac{\sigma_{i\to j}}{\mu_{i,j}}$ and $\mu_{i,j}' = \sigma_{i\to j}$
- Belief propagation reparameterizes $P$ at each step

Page 32: Answering Queries

- Posterior distribution queries on a variable X
  - Sum out the irrelevant variables from any clique containing X
- Posterior distribution queries on a family X, Pa(X)
  - Sum out the irrelevant variables from a clique containing X, Pa(X) (one exists by family preservation)
- Introducing evidence Z = z, to compute the posterior of X
  - If X appears in a clique with Z: since the clique tree is calibrated, multiply a clique that contains both X and Z by the indicator function I(Z = z), and sum out the irrelevant variables
  - If X does not share a clique with Z: introduce the indicator I(Z = z) into some clique containing Z, propagate messages along the path to a clique containing X, and sum out the irrelevant variables
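
Two small helpers illustrating the indicator trick on the dict-based factors from the earlier sketches; the names are hypothetical and this is only a sketch of the first case (X and Z sharing a clique).

```python
def condition_on(factor, var, value):
    """Multiply a factor by the indicator I(var = value): zero out every
    entry inconsistent with the evidence."""
    fv, ft = factor
    idx = fv.index(var)
    return fv, {a: (p if a[idx] == value else 0.0) for a, p in ft.items()}

def normalize(factor):
    """Renormalize the (reduced) clique belief so it sums to 1."""
    fv, ft = factor
    z = sum(ft.values())
    return fv, {a: p / z for a, p in ft.items()}

# Usage sketch: posterior of X given Z = z, when some clique holds both:
#   post = normalize(sum_out_to(condition_on(beta_c, "Z", z), {"X"}))
```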

Page 33: Constructing Clique Trees

Using variable elimination:
- Create a cluster $C_i$ for each factor used during a VE run
- Create an edge between $C_i$ and $C_j$ when a factor generated at $C_i$ is used directly at $C_j$ (or vice versa)
- We showed that this cluster graph is a tree satisfying the running intersection property, and thus it is a legal clique tree

Page 34: Constructing Clique Trees

Goal: construct a tree that is family preserving and obeys the running intersection property (a sketch of this pipeline appears after this list).
- Triangulate the graph to construct a chordal graph H
  - It is NP-hard to find a triangulation where the largest clique of the resulting chordal graph has minimum size
- Find the maximal cliques of H and make each a node in the graph
  - Finding maximal cliques is NP-hard in general; here we can start with families and grow them greedily
- Construct a tree over the clique nodes
  - Use a maximum spanning tree on the undirected graph whose nodes are the maximal cliques and whose edge weights are $|C_i \cap C_j|$
  - One can show the resulting tree obeys the running intersection property
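
A self-contained Python sketch of this pipeline on an already moralized, undirected graph: a greedy min-fill triangulation that records the cliques created by elimination, followed by Kruskal's algorithm on sepset sizes. A greedy ordering only approximates the NP-hard optimum, and the exact tree depends on tie-breaking.

```python
from itertools import combinations

def triangulate_and_cliques(adj):
    """adj: {node: set(neighbors)}. Greedy min-fill elimination; returns
    the maximal cliques of the induced chordal graph."""
    adj = {v: set(ns) for v, ns in adj.items()}
    cliques = []
    remaining = set(adj)
    while remaining:
        # min-fill: eliminate the node whose neighbors need fewest new edges
        def fill(v):
            ns = adj[v] & remaining
            return sum(1 for a, b in combinations(ns, 2) if b not in adj[a])
        v = min(remaining, key=fill)
        ns = adj[v] & remaining
        for a, b in combinations(ns, 2):    # add the fill edges
            adj[a].add(b); adj[b].add(a)
        c = frozenset(ns | {v})
        if not any(c < c2 for c2 in cliques):   # keep only maximal cliques
            cliques.append(c)
        remaining.remove(v)
    return cliques

def max_spanning_tree(cliques):
    """Kruskal on edges weighted by sepset size |Ci & Cj|."""
    edges = sorted(((len(a & b), a, b) for a, b in combinations(cliques, 2)
                    if a & b), reverse=True, key=lambda e: e[0])
    parent = {c: c for c in cliques}
    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c
    tree = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                        # heaviest edges first, no cycles
            parent[ra] = rb
            tree.append((a, b, w))
    return tree
```

On the moralized student network of the next slide, this pipeline produces a clique tree over the same six cliques shown there (up to tie-breaking in the greedy ordering).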

Page 35: Example

(Figures: the student network, its moralized graph, and one possible triangulation.)

Cluster graph over the maximal cliques of the triangulated graph, with edge weights $|C_i \cap C_j|$:

- Chain: {C,D} —1— {G,I,D} —2— {G,S,I} —2— {G,S,L} —2— {L,S,J}
- {G,H} connects to each clique containing G, with weight 1

A maximum spanning tree keeps the chain edges and attaches {G,H} by one of its weight-1 edges, yielding the clique tree:
{C,D} – {G,I,D} – {G,S,I} – {G,S,L} – {L,S,J}, with {G,H} attached to a clique containing G.