Copyright (c) 2002 by SNU CSE Biointelligence Lab . 1 SURVEY: Foundations of Bayesian Networks O, Jangmin 2002/10/29 Last modified 2002/10/29

Jan 04, 2016


Copyright (c) 2002 by SNU CSE Biointelligence Lab. 1

SURVEY: Foundations of Bayesian Networks

O, Jangmin

2002/10/29

Last modified 2002/10/29

Contents

• From DAG to Junction Tree (current section)
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks

Typical Example of DAG

[Figure: "Simple DAG" over the vertices A, B, C, D, F, G]

1. Topological Sort

Objective: acquire a well-ordering of the vertices.
Well-ordering: the predecessors of any node have lower numbers than the node itself.

Algorithm 4.1 [Topological sort]
• Begin with all vertices unnumbered.
• Set counter i = 1.
• While any vertices remain:
– Select any vertex that has no parents;
– number the selected vertex as i;
– delete the numbered vertex and all its adjacent edges from the graph;
– increment i by 1.
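A minimal Python sketch of Algorithm 4.1 follows. The edge set in `example_parents` is an assumption: the figure's arrows did not survive extraction, so this is merely one DAG over A, B, C, D, F, G that is consistent with the cliques listed later in the slides. Ties are broken alphabetically, which is also an arbitrary choice.

```python
def topological_sort(parents):
    """parents maps each vertex to the set of its parents.
    Returns a dict vertex -> number (1-based) forming a well-ordering."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    numbering = {}
    i = 1
    while remaining:
        # Select any vertex that has no (remaining) parents.
        roots = sorted(v for v, ps in remaining.items() if not ps)
        if not roots:
            raise ValueError("graph contains a directed cycle")
        v = roots[0]
        numbering[v] = i
        # Delete the numbered vertex and its outgoing edges.
        del remaining[v]
        for ps in remaining.values():
            ps.discard(v)
        i += 1
    return numbering


# Hypothetical DAG (assumed edges, see the note above).
example_parents = {
    "A": set(), "B": {"A"}, "C": {"A"},
    "D": {"A", "B"}, "F": {"B", "C"}, "G": {"F"},
}
numbering = topological_sort(example_parents)
```

Every parent ends up with a smaller number than its child, which is exactly the well-ordering property.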

1. Topological Sort (1)-(6)

[Figure sequence: the Simple DAG, with its vertices numbered 1 through 6 one slide at a time in topological order]

2. Moral Graph

• Making the moral graph of a DAG
– Add an undirected edge between any two nodes that have a common child.
– Remove the directions of all edges.
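The two moralization steps can be sketched in Python as below. The DAG in `example_parents` is an assumed edge set (the original figure's arrows are not recoverable), chosen so that the result matches the cliques shown later in the slides.

```python
from itertools import combinations


def moralize(parents):
    """parents maps each vertex to the set of its parents (every vertex is a key).
    Returns the moral graph as a dict vertex -> set of neighbours."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        # Remove directions: keep each arc as an undirected edge.
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        # "Marry" every pair of parents that share this child.
        for p, q in combinations(sorted(ps), 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


# Hypothetical DAG (assumed edges).
example_parents = {
    "A": set(), "B": {"A"}, "C": {"A"},
    "D": {"A", "B"}, "F": {"B", "C"}, "G": {"F"},
}
moral = moralize(example_parents)
```

In this assumed example the only edge added by marrying parents is B-C, because B and C are both parents of F.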

2. Moral Graph (1)-(2)

[Figure sequence: the Simple DAG with its topological numbering 1 to 6, shown while the moral graph is formed]

Junction tree

• Definition
– A tree whose nodes are sets C1, C2, ...
– For any two nodes C1 and C2, the intersection C1 ∩ C2 is contained in every node on the path between C1 and C2.
• Corollaries
– For an undirected graph the following are equivalent: being decomposable, being chordal, having a junction tree of cliques, and admitting a perfect numbering.

Perfect numbering: for every j, ne(vj) ∩ {v1, ..., vj-1} induces a complete subgraph.

3. Maximum Cardinality Search (1)

Algorithm 4.9 [Maximum Cardinality Search]
• Set Output := 'G is chordal'.
• Set counter i := 1.
• Set L := ∅.
• For all v ∈ V, set c(v) := 0.
• While L ≠ V:
– Set U := V \ L.
– Select any vertex v maximizing c(v) over v ∈ U and label it vi.
– If Πvi := ne(vi) ∩ L is not complete in G: set Output := 'G is not chordal'.
– Otherwise, set c(w) := c(w) + 1 for each vertex w ∈ ne(vi) ∩ U.
– Set L := L ∪ {vi}.
– Increment i by 1.
• Report Output.
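The following is a small Python sketch of Algorithm 4.9. The adjacency dictionary `moral` is an assumed encoding of the example's moral graph (reconstructed from the cliques in the slides, not from the lost figure), and ties between equally good vertices are broken alphabetically, which the algorithm leaves open.

```python
from itertools import combinations


def maximum_cardinality_search(adj):
    """adj maps each vertex to its set of neighbours.
    Returns (order, chordal): the numbering v1..vk and the chordality verdict."""
    order = []                      # v1, ..., vk
    numbered = set()                # L
    count = {v: 0 for v in adj}     # c(v)
    chordal = True
    while len(numbered) < len(adj):
        unnumbered = sorted(v for v in adj if v not in numbered)   # U
        v = max(unnumbered, key=lambda u: count[u])
        pi = adj[v] & numbered      # already-numbered neighbours of v
        complete = all(b in adj[a] for a, b in combinations(pi, 2))
        if not complete:
            chordal = False
        else:
            for w in adj[v]:
                if w not in numbered:
                    count[w] += 1
        numbered.add(v)
        order.append(v)
    return order, chordal


# Assumed moral graph of the running example.
moral = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D", "F"}, "C": {"A", "B", "F"},
    "D": {"A", "B"}, "F": {"B", "C", "G"}, "G": {"F"},
}
four_cycle = {"1": {"2", "4"}, "2": {"1", "3"}, "3": {"2", "4"}, "4": {"1", "3"}}
```

A chordless 4-cycle such as `four_cycle` is the standard example on which the search reports that the graph is not chordal.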

3. Maximum Cardinality Search (2)-(8)

[Figure sequence: maximum cardinality search run on the example graph. Each vertex is annotated with its number i and the set Π of its already-numbered neighbours:]

1, Π = {}
2, Π = {A}
3, Π = {A, B}
4, Π = {A, B}
5, Π = {B, C}
6, Π = {F}

Output = "G is chordal"

4. Cliques of Chordal Graph (1)

Algorithm 4.11 [Finding the Cliques of a Chordal Graph]
• Start from the numbering (v1, ..., vk) obtained by maximum cardinality search, and let πi = |Πvi| be the cardinality of Πvi.
• Mark the ladder nodes: vi is a ladder node if i = k, or if i < k and πi+1 < 1 + πi.
• Define the cliques
– Cj = {λj} ∪ Πλj, where λj is the j-th ladder node.

C1, C2, ... possess the RIP (running intersection property).
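As an illustration, the ladder-node rule can be written in a few lines of Python. The graph `moral` and the numbering `order` are assumptions that mirror the running example (they are inferred from the clique list in the slides).

```python
def cliques_from_mcs(adj, order):
    """adj: vertex -> set of neighbours; order: the MCS numbering v1..vk.
    Returns the cliques C1, C2, ... in ladder-node order."""
    position = {v: i for i, v in enumerate(order)}
    # pis[i] is the set of earlier-numbered neighbours of order[i].
    pis = [{w for w in adj[v] if position[w] < position[v]} for v in order]
    sizes = [len(p) for p in pis]
    k = len(order)
    cliques = []
    for i in range(k):
        # Ladder node: last vertex, or the next cardinality fails to grow by one.
        if i == k - 1 or sizes[i + 1] < 1 + sizes[i]:
            cliques.append({order[i]} | pis[i])
    return cliques


# Assumed moral graph and MCS numbering of the running example.
moral = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D", "F"}, "C": {"A", "B", "F"},
    "D": {"A", "B"}, "F": {"B", "C", "G"}, "G": {"F"},
}
order = ["A", "B", "C", "D", "F", "G"]
cliques = cliques_from_mcs(moral, order)
```

Here the cardinalities are 0, 1, 2, 2, 2, 1, so the ladder nodes are the vertices numbered 3, 4, 5 and 6.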

4. Cliques of Chordal Graph (2)

[Figure: the chordal graph with each vertex annotated by its number and Π set (1, Π = {}; 2, Π = {A}; 3, Π = {A, B}; 4, Π = {A, B}; 5, Π = {B, C}; 6, Π = {F}), and the resulting cliques:]

C1 = {A, B, C}

C2 = {A, B, D}

C3 = {B, C, F}

C4 = {F, G}

Running Intersection Property

• RIP: definition
– Given (C1, C2, ..., Ck),
– for all 1 < j ≤ k, there is an i < j such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
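The definition translates directly into a check. This is only a sketch; the function name and the two test sequences are illustrative choices, not part of the original slides.

```python
def has_running_intersection(cliques):
    """Return True if the sequence of sets satisfies the RIP."""
    for j in range(1, len(cliques)):
        earlier = set().union(*cliques[:j])     # C1 ∪ ... ∪ C(j-1)
        separator = cliques[j] & earlier
        # Some single earlier set must contain the whole separator.
        if not any(separator <= cliques[i] for i in range(j)):
            return False
    return True


good = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "F"}, {"F", "G"}]
bad = [{"A", "B"}, {"C", "D"}, {"B", "C"}]
```

In `bad`, the third set meets the earlier ones in {B, C}, which no single earlier set contains.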

5. Junction Tree Construction (1)

Algorithm 4.8 [Junction Tree Construction]
• Start from the cliques (C1, ..., Cp) of a chordal graph, ordered with the RIP.
• Associate a node of the tree with each clique Cj.
• For j = 2, ..., p, add an edge between Cj and Ci, where i is any one value in {1, ..., j-1} such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
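A short Python sketch of Algorithm 4.8 is given below. It picks the smallest admissible i, although the algorithm allows any; indices in the returned edges are 0-based, so (0, 1) stands for the edge C1-C2.

```python
def junction_tree(cliques):
    """cliques: list of sets ordered with the RIP.
    Returns tree edges as (i, j) pairs of 0-based clique indices."""
    edges = []
    for j in range(1, len(cliques)):
        earlier = set().union(*cliques[:j])
        separator = cliques[j] & earlier
        i = next((i for i in range(j) if separator <= cliques[i]), None)
        if i is None:
            raise ValueError("clique sequence does not have the RIP")
        edges.append((i, j))
    return edges


cliques = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "F"}, {"F", "G"}]
edges = junction_tree(cliques)
```

For the example cliques this links ABD and BCF to ABC, and FG to BCF.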

5. Junction Tree Construction (2)-(5)

[Figure sequence: the junction tree is built one clique at a time from the RIP-ordered cliques of the annotated chordal graph]

C1 = {A, B, C}
C2 = {A, B, D}
C3 = {B, C, F}
C4 = {F, G}

Tree nodes: ABC (C1), ABD (C2), BCF (C3), FG (C4)

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree (current section)
• Junction Tree Algorithms
• Learning Bayesian Networks

Triangulation (1)

• When is triangulation needed?
– When MCS (Maximum Cardinality Search) fails, i.e. reports that the graph is not chordal.
• Triangulation
– introduces fill-in edges.
– produces a perfect numbering.
• Optimal triangulation is NP-hard
– The size of each clique matters...

Triangulation (2)

Algorithm 4.13 [One-step Look Ahead Triangulation]
• Start with all vertices unnumbered, set counter i := k.
• While there are still some unnumbered vertices:
– Select an unnumbered vertex v to optimize the criterion c(v), or select v = α(i) [α is an order].
– Label it with the number i.
– Form the set Ci consisting of vi and its unnumbered neighbours.
– Fill in edges where none exist between all pairs of vertices in Ci.
– Eliminate vi and decrement i by 1.
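Below is a sketch of the fixed-order variant (v = α(i)) in Python. The graph `moral` is an assumed encoding of the running example, and the criterion-based vertex selection is deliberately left out.

```python
from itertools import combinations


def eliminate(adj, alpha):
    """adj: vertex -> set of neighbours; alpha: order (alpha[i-1] gets number i).
    Returns (elimination sets C1..Ck as a list, list of fill-in edges)."""
    graph = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    unnumbered = set(graph)
    elim_sets = [None] * len(alpha)
    fill_ins = []
    for i in range(len(alpha), 0, -1):
        v = alpha[i - 1]
        c = {v} | (graph[v] & unnumbered)
        # Fill in missing edges between all pairs of vertices in Ci.
        for a, b in combinations(sorted(c), 2):
            if b not in graph[a]:
                graph[a].add(b)
                graph[b].add(a)
                fill_ins.append((a, b))
        elim_sets[i - 1] = c
        unnumbered.remove(v)        # eliminate vi
    return elim_sets, fill_ins


moral = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D", "F"}, "C": {"A", "B", "F"},
    "D": {"A", "B"}, "F": {"B", "C", "G"}, "G": {"F"},
}
elim_sets, fill_ins = eliminate(moral, ["A", "B", "C", "D", "F", "G"])
```

Because the example graph is already chordal and the order is perfect, no fill-in edges are produced; a chordless 4-cycle would need one.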

Triangulation (3)-(8)

[Figure sequence: one-step look ahead triangulation of the example graph with the order α = (A, B, C, D, F, G); vertices are eliminated from i = 6 down to i = 1]

6, C6 = {F, G}
5, C5 = {B, C, F}
4, C4 = {A, B, D}
3, C3 = {A, B, C}
2, C2 = {A, B}
1, C1 = {A}

Elimination sets
• Cj contains vj.
• vj ∉ Cl for all l < j.
• (C1, ..., Ck) has the RIP.
• The cliques of the triangulated graph G' are contained in (C1, ..., Ck).

Elimination Tree Construction (1)

Algorithm 4.14 [Elimination Tree Construction]
• Associate a node of the tree with each set Ci.
• For j = 1, ..., k, if Cj contains more than one vertex, add an edge between Cj and Ci, where i is the largest index of a vertex in Cj \ {vj}.
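A compact Python sketch of Algorithm 4.14, applied to the elimination sets of the running example. Edges are reported as 1-based (i, j) index pairs to match the slide notation.

```python
def elimination_tree(order, elim_sets):
    """order: vertices v1..vk; elim_sets[j-1] is the elimination set Cj.
    Returns edges (i, j), 1-based, linking Cj to Ci."""
    index = {v: n for n, v in enumerate(order, start=1)}
    edges = []
    for j in range(1, len(order) + 1):
        rest = elim_sets[j - 1] - {order[j - 1]}    # Cj \ {vj}
        if rest:
            i = max(index[w] for w in rest)         # largest index in the rest
            edges.append((i, j))
    return edges


order = ["A", "B", "C", "D", "F", "G"]
elim_sets = [{"A"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
             {"B", "C", "F"}, {"F", "G"}]
tree_edges = elimination_tree(order, elim_sets)
```

C1 = {A} has only one vertex, so it contributes no edge of its own and becomes the root.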

Elimination Tree Construction (2)-(7)

[Figure sequence: the elimination tree is built edge by edge. Each node is written as "eliminated vertex : rest of its elimination set":]

C1 = A:
C2 = B:A
C3 = C:AB
C4 = D:AB
C5 = F:BC
C6 = G:F

From etree to jtree (1)

Lemma 4.16
– Let C1, ..., Ck be a sequence of sets with the RIP.
– Assume that Ct ⊆ Cp for some t ≠ p and that p is minimal with this property for fixed t. Then:
(i) If t > p, then C1, ..., Ct-1, Ct+1, ..., Ck has the RIP.
(ii) If t < p, then C1, ..., Ct-1, Cp, Ct+1, ..., Cp-1, Cp+1, ..., Ck has the RIP.

Simply removing a redundant elimination set may destroy the RIP.
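The lemma suggests a simple reduction loop, sketched here in Python. The function name is hypothetical; the loop applies case (i) or case (ii) until no set is contained in another.

```python
def remove_redundant(sets):
    """Repeatedly apply Lemma 4.16 to a RIP-ordered sequence of sets."""
    seq = [set(c) for c in sets]
    changed = True
    while changed:
        changed = False
        for t in range(len(seq)):
            # Minimal p != t with Ct contained in Cp.
            p = next((p for p in range(len(seq))
                      if p != t and seq[t] <= seq[p]), None)
            if p is None:
                continue
            if t > p:
                del seq[t]              # case (i): drop Ct
            else:
                seq[t] = seq[p]         # case (ii): Cp takes the place of Ct
                del seq[p]
            changed = True
            break
    return seq


elim_sets = [{"A"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
             {"B", "C", "F"}, {"F", "G"}]
reduced = remove_redundant(elim_sets)
```

On the running example the loop first absorbs {A} into {A, B} and then {A, B} into {A, B, C}, both via case (ii).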

From etree to jtree (2)-(3)

[Figure sequence: redundant elimination sets are removed from the elimination tree using Lemma 4.16]

• Condition (ii) with t = 1, p = 2: the tree over A:, B:A, C:AB, D:AB, F:BC, G:F (C1, ..., C6) becomes the tree over B:A, C:AB, D:AB, F:BC, G:F (C2, ..., C6).
• Condition (ii) with t = 2, p = 3: the tree becomes the tree over C:AB, D:AB, F:BC, G:F (C3, ..., C6).

MST for making jtree (1)

Algorithm
• Start from the elimination sets (C1, ..., Ck).
• Remove the redundant Ci's.
• Make the junction graph.
– If |Ci ∩ Cj| > 0, add an edge between Ci and Cj.
– Set the weight of the edge to |Ci ∩ Cj|.
• Construct the MST (Maximum Weight Spanning Tree).

The resulting tree is a junction tree, and the clique set has the RIP.
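One way to sketch the junction graph and its maximum weight spanning tree is Kruskal's algorithm with a small union-find, as below. Kruskal is a choice made here for illustration; the slides do not say which spanning-tree algorithm is used, and the tie-breaking by edge cost is not included.

```python
from itertools import combinations


def max_weight_spanning_tree(cliques):
    """Returns tree edges (i, j, weight) over 0-based clique indices."""
    # Junction graph: one weighted edge per pair of intersecting cliques.
    edges = [(len(cliques[i] & cliques[j]), i, j)
             for i, j in combinations(range(len(cliques)), 2)
             if cliques[i] & cliques[j]]
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))   # heaviest first

    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]           # path halving
            x = parent[x]
        return x

    tree = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                                # edge joins two components
            parent[ri] = rj
            tree.append((i, j, weight))
    return tree


cliques = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "F"}, {"F", "G"}]
tree = max_weight_spanning_tree(cliques)
```

The junction graph of the example has edge weights 2, 2, 1, 1; the tree drops the weight-1 edge between ABD and BCF.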

MST for making jtree (2)

[Figure: left, the junction graph over the cliques ABC (C1), ABD (C2), BCF (C3), FG (C4) with edge weights 2, 2, 1, 1; right, its MST, which keeps edges of weight 2, 2 and 1]

MST for making jtree (3)

• Optimal jtree (for a fixed elimination ordering)
– Cost of edge e = (v, w):

\[
q_e = q_v + q_w, \qquad q_v = \prod_{X_i \in v} q_i, \qquad q_i:\ \text{\# of discrete values } X_i \text{ can take on}
\]

– Use the cost of the edge to break ties when constructing the MST (minimum cost preferred).

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms (current section)
• Learning Bayesian Networks

Collect phase

• From leaf to root

\[
\phi_j^{*} = \phi_j \prod_{i \in \mathrm{child}(j)} \mu_{ij}, \qquad \mu_{jk} = \phi_j^{*} \downarrow S_{jk}
\]

\(\phi_j\): initial potential of Cj; \(\phi_j^{*}\): updated potential; \(S_{jk}\): separator between Cj and Ck; \(\downarrow\): projection onto the separator.

[Figure: clique Cj with parent Ck and children Ci, Ci']

Distribute phase

• From root to leaf
• \(\phi_j^{**}\) contains the marginal distribution of clique j.

\[
\phi_j^{**} = \phi_j^{*}\,\mu_{kj}, \qquad
\mu_{ji} = \Bigl(\phi_j\,\mu_{kj} \prod_{i' \in \mathrm{child}(j),\ i' \neq i} \mu_{i'j}\Bigr) \downarrow S_{ij}
\]

[Figure: clique Cj with parent Ck and children Ci, Ci']

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks (current section)

Learning Paradigm

• Known structure (Ks) or unknown structure (Us)
• Full observability (Fo) or partial observability (Po)
• Frequentist (Fr) or Bayesian (Ba)

Ks, Fo, Fr (1)

• Given training set D = {D1, ..., DM}
• MLE of the parameters of each CPD
– MLE (Maximum Likelihood Estimate)
– CPD (Conditional Probability Distribution)

\[
L = \log \prod_{m=1}^{M} \Pr(D_m \mid G) = \sum_{i=1}^{n} \sum_{m=1}^{M} \log P(X_i \mid Pa(X_i), D_m)
\]

n: # of nodes; M: # of data. The log-likelihood decomposes into one term per node.

Ks, Fo, Fr (2)

• Multinomial distributions
– For a tabular CPD:

\[
\theta_{ijk} \overset{\mathrm{def}}{=} P(X_i = k \mid Pa(X_i) = j)
\]

– Log-likelihood

\[
L = \sum_i \sum_m \log \prod_{j,k} \theta_{ijk}^{I_{ijkm}}
  = \sum_i \sum_m \sum_{j,k} I_{ijkm} \log \theta_{ijk}
  = \sum_{ijk} N_{ijk} \log \theta_{ijk}
\]
\[
I_{ijkm} \overset{\mathrm{def}}{=} I(X_i = k, Pa(X_i) = j \mid D_m), \qquad
N_{ijk} \overset{\mathrm{def}}{=} \sum_m I(X_i = k, Pa(X_i) = j \mid D_m)
\]

– MLE

\[
\hat{\theta}_{ijk} = \frac{N_{ijk}}{\sum_{k'} N_{ijk'}}, \qquad \text{constraint: } \sum_k \theta_{ijk} = 1 \text{ for all } i, j
\]
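The count-and-normalize estimate can be sketched in Python as follows. The data layout (a list of dicts, one per fully observed case) and the toy values are assumptions made for illustration.

```python
from collections import Counter


def mle_cpt(data, child, parents):
    """Returns {(j, k): theta_hat}, where j is the tuple of parent values
    and k the child value; theta_hat = N_ijk / sum_k' N_ijk'."""
    counts = Counter()
    for case in data:
        j = tuple(case[p] for p in parents)
        counts[(j, case[child])] += 1           # N_ijk
    totals = Counter()
    for (j, _k), n in counts.items():
        totals[j] += n                          # sum over k' of N_ijk'
    return {(j, k): n / totals[j] for (j, k), n in counts.items()}


# Toy, fully observed data for a two-node network A -> B (made-up values).
toy_data = [
    {"A": 0, "B": 0}, {"A": 0, "B": 1},
    {"A": 1, "B": 1}, {"A": 1, "B": 1},
]
theta_b = mle_cpt(toy_data, "B", ["A"])
```

Configurations that never occur in the data simply get no entry, which is one reason the Bayesian treatment with pseudo counts is attractive.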

Ks, Fo, Fr (3)

• MLE of multinomial distribution
– Constrained optimization

\[
O = \sum_{ijk} N_{ijk} \log \theta_{ijk} + \sum_{ij} \lambda_{ij} \Bigl(1 - \sum_k \theta_{ijk}\Bigr)
\]

Derivative with respect to \(\theta_{ijk}\):

\[
\frac{dO}{d\theta_{ijk}} = \frac{N_{ijk}}{\theta_{ijk}} - \lambda_{ij}
\]

Setting the derivative to zero:

\[
N_{ijk} = \lambda_{ij}\,\theta_{ijk}, \qquad
\sum_k N_{ijk} = \lambda_{ij} \sum_k \theta_{ijk} = \lambda_{ij}, \qquad
\hat{\theta}_{ijk} = \frac{N_{ijk}}{\sum_{k'} N_{ijk'}}
\]


Ks, Fo, Fr (4)

• Conditional linear Gaussian distributions

Ks, Fo, Ba (1)

• Frequentist: point estimation
• Bayesian: distributional estimation

Ks, Fo, Ba (2)

• Multinomial distributions
– Two assumptions on the prior
• Global independence:

\[
P(\theta) = \prod_{i=1}^{n} P(\theta_i), \qquad \theta_i = \{\theta_{ijk},\ j = 1, \dots, q_i,\ k = 1, \dots, r_i\}
\]

• Local independence:

\[
P(\theta_i) = \prod_{j=1}^{q_i} P(\theta_{ij}), \qquad \theta_{ij} = \{\theta_{ijk},\ k = 1, \dots, r_i\}
\]

– Global independence + likelihood equivalence leads to the Dirichlet prior, the conjugate prior for the multinomial.

Ks, Fo, Ba (3)

• Remark on Bayesian

\[
\underbrace{P(\theta \mid D)}_{\text{Posterior}} \propto \underbrace{P(D \mid \theta)}_{\text{Likelihood}} \cdot \underbrace{P(\theta)}_{\text{Prior}}
\]

– Conjugate priors
• The posterior has the same form as the prior distribution.
• Many exponential-family distributions have conjugate priors.

Ks, Fo, Ba (4)

• Multinomial distributions
– Dirichlet prior on tabular CPDs

\[
\theta_{ij} = P(X_i \mid Pa(X_i) = j)
\]

\(\theta_{ij}\): multinomial r.v. with \(r_i\) possible values

\[
\theta_{ij} \sim \mathrm{Dirichlet}(\alpha_{ij1}, \dots, \alpha_{ijr_i}), \qquad
P(\theta_{ij}) = \frac{1}{B(\alpha_{ij1}, \dots, \alpha_{ijr_i})} \prod_{k=1}^{r_i} \theta_{ijk}^{\alpha_{ijk} - 1}
\]
\[
B(\alpha_1, \dots, \alpha_r) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma\bigl(\sum_k \alpha_k\bigr)}, \qquad \Gamma(n) = (n - 1)!
\]

• Posterior distribution

\[
\theta_{ij} \mid D \sim \mathrm{Dirichlet}(\alpha_{ij1} + N_{ij1}, \dots, \alpha_{ijr_i} + N_{ijr_i})
\]

• Posterior mean

\[
E[\theta_{ijk} \mid D] = \frac{\alpha_{ijk} + N_{ijk}}{\sum_{l=1}^{r_i} (\alpha_{ijl} + N_{ijl})}
\]
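As a numerical illustration of the posterior mean, the sketch below adds observed counts to the Dirichlet hyperparameters for a single parent configuration. The hyperparameters and counts are made-up values.

```python
def dirichlet_posterior_mean(alpha, counts):
    """alpha[k] are the hyperparameters and counts[k] the observed N_ijk
    for one (i, j); returns E[theta_ijk | D] for every k."""
    posterior = [a + n for a, n in zip(alpha, counts)]   # Dirichlet(alpha + N)
    total = sum(posterior)
    return [p / total for p in posterior]


# Uniform prior Dirichlet(1, 1) and counts N = (3, 1).
mean = dirichlet_posterior_mean([1.0, 1.0], [3, 1])
```

With these numbers the posterior is Dirichlet(4, 2), so the estimate is pulled from the MLE 3/4 toward the prior mean 1/2.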

Ks, Fo, Ba (5)

• Dirichlet distribution
– Hyperparameter \(\alpha_{ijk}\)
• Positive number
• Pseudo count
• # of imaginary cases: \(\alpha_{ijk} - 1\)
– Posterior distribution
• Combines the pseudo counts with the # of observed data
• Simple sum

\[
\theta_{ij} \sim \mathrm{Dirichlet}(\alpha_{ij1}, \dots, \alpha_{ijr_i})
\]
\[
\theta_{ij} \mid D \sim \mathrm{Dirichlet}(\alpha_{ij1} + N_{ij1}, \dots, \alpha_{ijr_i} + N_{ijr_i})
\]


Ks, Fo, Ba (6)

• Gaussian distributions

Ks, Po, Fr (1)

• Log likelihood

\[
L = \sum_m \log P(D_m) = \sum_m \log \sum_h P(H = h, V = D_m)
\]

H: hidden variables; V: visible (observed) variables.

• Not decomposable into a sum of local terms, one per node
– EM algorithm

Ks, Po, Fr (2)

• EM algorithm
– From Jensen's inequality

\[
\log \sum_j \lambda_j y_j \ge \sum_j \lambda_j \log y_j, \qquad \sum_j \lambda_j = 1
\]
\[
\begin{aligned}
L &= \sum_m \log \sum_h P(H = h, V_m) \\
  &= \sum_m \log \sum_h q(h \mid V_m)\,\frac{P(H = h, V_m)}{q(h \mid V_m)} \\
  &\ge \sum_m \sum_h q(h \mid V_m) \log \frac{P(H = h, V_m)}{q(h \mid V_m)} \\
  &= \sum_m \sum_h q(h \mid V_m) \log P(H = h, V_m) - \sum_m \sum_h q(h \mid V_m) \log q(h \mid V_m)
\end{aligned}
\]

constraint: \(\sum_h q(h \mid V_m) = 1\)

Ks, Po, Fr (3)

– Maximizing w.r.t. q (E-step)

\[
O = \sum_m \sum_h q(h \mid V_m) \log P(H = h, V_m) - \sum_m \sum_h q(h \mid V_m) \log q(h \mid V_m) + \sum_m \lambda_m \Bigl(1 - \sum_h q(h \mid V_m)\Bigr)
\]
\[
\frac{dO}{dq(h \mid V_m)} = \log P(H = h, V_m) - \log q(h \mid V_m) - 1 - \lambda_m = 0
\]
\[
q(h \mid V_m) = \frac{P(H = h, V_m)}{e^{1 + \lambda_m}}
\]
\[
\sum_h q(h \mid V_m) = \frac{1}{e^{1 + \lambda_m}} \sum_h P(H = h, V_m) = 1
\;\Rightarrow\;
e^{1 + \lambda_m} = \sum_h P(H = h, V_m) = P(V_m)
\]
\[
q(h \mid V_m) = P(h \mid V_m)
\]

Ks, Po, Fr (4)

– Maximizing w.r.t. \(\theta\) (M-step)
• After q is maximized to \(p(h \mid V_m)\)
• Maximize the expected complete-data log-likelihood

\[
Q(\theta' \mid \theta) = \sum_m \sum_h p(h \mid V_m, \theta) \log P(H = h, V_m \mid \theta')
\]

• Iterate until convergence
– E-step
• Calculate the expected complete-data log-likelihood
– M-step
• Get \(\theta^{*}\) maximizing the expected complete-data log-likelihood

\[
\theta^{*} = \arg\max_{\theta'} Q(\theta' \mid \theta)
\]

Ks, Po, Fr (5)

• Multinomial distribution
– E-step

\[
L = \sum_{ijk} N_{ijk} \log \theta_{ijk}, \qquad
Q(\theta' \mid \theta) = \sum_{ijk} E[N_{ijk}] \log \theta'_{ijk}
\]
\[
I_{ijkm} \overset{\mathrm{def}}{=} I(X_i = k, Pa(X_i) = j \mid D_m), \qquad
N_{ijk} \overset{\mathrm{def}}{=} \sum_m I(X_i = k, Pa(X_i) = j \mid D_m)
\]
\[
E[N_{ijk}] = \sum_m P(X_i = k, Pa(X_i) = j \mid D_m, \theta)
\]

– M-step

\[
\theta' = \arg\max_{\theta'} Q(\theta' \mid \theta), \qquad
\hat{\theta}_{ijk} = \frac{E[N_{ijk}]}{\sum_{k'} E[N_{ijk'}]}
\]
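To make the E-step and M-step concrete, here is a hedged Python sketch of EM on a tiny assumed network: a hidden binary node H with two observed binary children X1 and X2. The network, the starting parameters and the toy data are all illustrative assumptions, not taken from the slides.

```python
import math


def em_hidden_parent(data, n_iter=20):
    """data: list of (x1, x2) with values in {0, 1}; H is never observed.
    Returns (prior, cpt, history) where prior[h] = P(H=h),
    cpt[i][h] = P(X_i = 1 | H = h), and history holds the log-likelihoods."""
    prior = [0.6, 0.4]                      # arbitrary starting point
    cpt = [[0.7, 0.2], [0.6, 0.3]]
    history = []
    for _ in range(n_iter):
        exp_h = [0.0, 0.0]                  # expected counts E[N] for H = h
        exp_x = [[0.0, 0.0], [0.0, 0.0]]    # expected counts for X_i = 1, H = h
        loglik = 0.0
        for x in data:
            joint = []
            for h in (0, 1):
                p = prior[h]
                for i in (0, 1):
                    p *= cpt[i][h] if x[i] == 1 else 1.0 - cpt[i][h]
                joint.append(p)             # P(H = h, V_m)
            p_v = sum(joint)                # P(V_m)
            loglik += math.log(p_v)
            for h in (0, 1):
                q = joint[h] / p_v          # E-step: q(h | V_m) = P(h | V_m)
                exp_h[h] += q
                for i in (0, 1):
                    if x[i] == 1:
                        exp_x[i][h] += q
        history.append(loglik)
        # M-step: normalize the expected counts.
        prior = [exp_h[h] / len(data) for h in (0, 1)]
        cpt = [[exp_x[i][h] / exp_h[h] for h in (0, 1)] for i in (0, 1)]
    return prior, cpt, history


toy_data = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1), (1, 1), (0, 0)]
prior, cpt, history = em_hidden_parent(toy_data)
```

The recorded log-likelihoods should never decrease from one iteration to the next, which is the practical check that the E- and M-steps are implemented consistently.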

Ks, Po, Ba (1)

• Gibbs sampling: stochastic version of EM
• Variational Bayes: \(P(\theta, H \mid V) \approx q(\theta \mid V)\,q(H \mid V)\)

Us, Fo, Fr (1)

• Issues
– Hypothesis space
– Evaluation function
– Search algorithm

Us, Fo, Fr (2)

• Search space
– DAG
• # of DAGs ~ O(2^(n^2))
• 10 nodes ~ O(10^18) DAGs
• Finding the optimal DAG by exhaustive search is doomed to failure.
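The size of the search space can be checked with Robinson's recurrence for the number of labelled DAGs. The recurrence is not part of the slides; it is brought in here only to illustrate the 10^18 figure.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


small = [count_dags(n) for n in range(5)]   # 1, 1, 3, 25, 543
ten = count_dags(10)
```

The count for 10 nodes lies between 10^18 and 10^19, consistent with the order of magnitude quoted on the slide.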

Us, Fo, Fr (3)

• Search algorithm
– Local search
• Operators: adding, deleting, or reversing a single arc

Choose G somehow
While not converged
    For each G' in nbd(G)
        Compute score(G')
    G* := arg max_G' score(G')
    If score(G*) > score(G)
        then G := G*
        else converged := true

Pseudo-code for hill-climbing. nbd(G) is the neighborhood of G, i.e., the models that can be reached by applying a single local change operator.
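The hill-climbing loop can be sketched in Python as below. Graphs are represented as frozensets of arcs, and `toy_score` is a hypothetical stand-in that rewards closeness to a fixed target graph; a real application would plug in a data-driven score such as BIC or BD.

```python
from itertools import permutations


def is_acyclic(nodes, arcs):
    """Repeatedly strip vertices without incoming arcs; a cycle blocks this."""
    remaining = set(nodes)
    while remaining:
        roots = {v for v in remaining
                 if not any(b == v and a in remaining for a, b in arcs)}
        if not roots:
            return False
        remaining -= roots
    return True


def nbd(nodes, arcs):
    """All DAGs reachable by adding, deleting or reversing a single arc."""
    out = []
    for a, b in permutations(nodes, 2):
        if (a, b) in arcs:
            out.append(arcs - {(a, b)})                 # delete
            out.append((arcs - {(a, b)}) | {(b, a)})    # reverse
        elif (b, a) not in arcs:
            out.append(arcs | {(a, b)})                 # add
    return [g for g in out if is_acyclic(nodes, g)]


def hill_climb(nodes, start, score):
    g = frozenset(start)
    while True:
        best = max(nbd(nodes, g), key=score, default=g)
        if score(best) > score(g):
            g = best
        else:
            return g                                    # converged


nodes = ["A", "B", "C"]
target = frozenset({("A", "B"), ("B", "C")})


def toy_score(g):
    # Hypothetical score: minus the number of arcs that differ from target.
    return -len(g ^ target)


result = hill_climb(nodes, frozenset(), toy_score)
```

With a real score the search can stop in a local maximum, which is why restarts or other escapes are commonly added on top of this loop.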

Us, Fo, Fr (4)

• Search algorithm
– PC algorithm
• Starts with a fully connected undirected graph
• CI (conditional independence) test
– If X ⊥ Y | S, the arc between X and Y is removed.

Us, Fo, Fr (5)

• Scoring function
– MLE selects the fully connected graph.
– score(G) ∝ P(D|G)P(G), i.e. the MAP model:

\[
P(G \mid D) = \frac{P(D \mid G)\,P(G)}{P(D)}
\]

– The marginal likelihood automatically penalizes complex models:

\[
\mathrm{score}(G) = P(D \mid G) = \int P(D \mid G, \theta)\,P(\theta \mid G)\,d\theta
\]

• A complex model has more parameters.
• It places little probability mass on the region where the data actually lie.

Us, Fo, Fr (6)

• Scoring function
– Under global independence and conjugate priors
– The integral has a closed form and decomposes into a factored form:

\[
P(D \mid G) = \prod_{i=1}^{n} \int P(X_i \mid Pa(X_i), \theta_i)\,P(\theta_i)\,d\theta_i
\overset{\mathrm{def}}{=} \prod_{i=1}^{n} \mathrm{score}(Pa(X_i), X_i)
\]

Us, Fo, Fr (7)

• Scoring function
– With non-conjugate priors: approximation
– Laplace approximation: BIC (Bayesian Information Criterion)

\[
\log P(D \mid G) \approx \log P(D \mid G, \hat{\theta}_G) - \frac{d}{2} \log M
\]

d: dimension of the model; \(\hat{\theta}_G\): ML estimate of the parameters.

– Case of multinomial distribution

\[
\mathrm{score}_{BIC}(G) = \sum_i \Bigl[\sum_m \log P(X_i \mid Pa(X_i), \hat{\theta}_i, D_m) - \frac{d_i}{2} \log M\Bigr]
= \sum_i \Bigl[\sum_{jk} N_{ijk} \log \hat{\theta}_{ijk} - \frac{d_i}{2} \log M\Bigr]
\]
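A per-node BIC term can be computed from the count table alone, as sketched below. The dimension d_i = q_i (r_i − 1), the number of free parameters of a tabular CPD, is an assumption added here; the slide does not spell out d_i.

```python
import math


def bic_family_score(counts, n_cases):
    """counts[j][k] = N_ijk for one node i; n_cases = M.
    Returns sum_jk N_ijk log(theta_hat_ijk) - (d_i / 2) log M."""
    loglik = 0.0
    for row in counts:
        total = sum(row)
        for n in row:
            if n > 0:                       # 0 * log 0 is taken as 0
                loglik += n * math.log(n / total)
    q, r = len(counts), len(counts[0])
    d = q * (r - 1)                         # assumed: free parameters of the CPT
    return loglik - d / 2.0 * math.log(n_cases)


# One binary node without parents, observed 2 times in each state (M = 4).
score = bic_family_score([[2, 2]], 4)
```

Summing this quantity over all nodes gives the BIC score of the whole graph, so a local change to G only requires rescoring the families it touches.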


Us, Fo, Fr (8)

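• Scoring function
– Advantage of decomposed score
– The marginal likelihoods of two graphs that differ in a single link differ in at most two terms.
• Ex) G1: X1 → X2 → X3 → X4, G2: X1 → X2 ← X3 → X4

$$\frac{P(D \mid G_2)}{P(D \mid G_1)} = \frac{\mathrm{score}(X_1)\,\mathrm{score}(X_1, X_2, X_3)\,\mathrm{score}(X_3)\,\mathrm{score}(X_3, X_4)}{\mathrm{score}(X_1)\,\mathrm{score}(X_1, X_2)\,\mathrm{score}(X_2, X_3)\,\mathrm{score}(X_3, X_4)}$$

Here $\mathrm{score}(X_1, X_2, X_3)$ is the family term of $X_2$ with parents $X_1$ and $X_3$. Only the terms for $X_2$ and $X_3$ differ; the others cancel.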
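In log form, the ratio becomes a difference that touches only the families whose parent sets changed. The sketch below assumes G1 is the chain X1 → X2 → X3 → X4 and G2 reverses the link between X2 and X3, as the score terms suggest. `toy_family_score` is a placeholder invented for the example, not a real scoring function; any decomposable log score (BIC or log BD) could be plugged in.

```python
def log_score(graph, family_score):
    """Full decomposable log score: one term per family."""
    return sum(family_score(x, frozenset(pa)) for x, pa in graph.items())

def log_score_delta(g_old, g_new, family_score):
    """log score(g_new) - log score(g_old), evaluating only the changed families."""
    changed = [x for x in g_new if set(g_new[x]) != set(g_old[x])]
    delta = sum(
        family_score(x, frozenset(g_new[x])) - family_score(x, frozenset(g_old[x]))
        for x in changed
    )
    return delta, changed

def toy_family_score(child, parents):
    # Placeholder value that depends only on the family size (an assumption for the demo).
    return -float((len(parents) + 1) ** 2)

g1 = {"X1": (), "X2": ("X1",), "X3": ("X2",), "X4": ("X3",)}
g2 = {"X1": (), "X2": ("X1", "X3"), "X3": (), "X4": ("X3",)}

delta, changed = log_score_delta(g1, g2, toy_family_score)
print(changed, delta)
```

A local search can therefore cache family scores and re-evaluate at most two of them per single-link move.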


Us, Fo, Fr (9)

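• Scoring function
– Marginal likelihood for the multinomial distribution with Dirichlet prior
– Bayesian Dirichlet (BD) score

$$P(D \mid G, \theta) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}}$$

$$P(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{B(\alpha_{ij1} + N_{ij1}, \ldots, \alpha_{ijr_i} + N_{ijr_i})}{B(\alpha_{ij1}, \ldots, \alpha_{ijr_i})} = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$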

posterior mean
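A minimal sketch of the log BD score using log-gamma functions, with N_ij and α_ij taken as the sums of N_ijk and α_ijk over k. Several things here are assumptions for the example: a single hyperparameter `alpha` shared by every cell, states coded as integers 0..r_i-1, and the same list-of-dicts data layout as a hypothetical caller might use.

```python
from collections import Counter
from itertools import product
from math import lgamma, log

def log_bd_family(data, child, parents, arity, alpha=1.0):
    """log of prod_j [G(a_ij) / G(a_ij + N_ij)] * prod_k [G(a_ijk + N_ijk) / G(a_ijk)]."""
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    a_ij = alpha * arity[child]            # a_ij is the sum over k of a_ijk
    total = 0.0
    # Enumerate every parent configuration j (a single empty tuple if no parents).
    for j in product(*(range(arity[p]) for p in parents)):
        n_ijk = [counts[(j, k)] for k in range(arity[child])]
        total += lgamma(a_ij) - lgamma(a_ij + sum(n_ijk))
        total += sum(lgamma(alpha + n) - lgamma(alpha) for n in n_ijk)
    return total

def log_bd_score(data, graph, arity, alpha=1.0):
    """Decomposed log marginal likelihood: the sum of the family terms."""
    return sum(log_bd_family(data, x, pa, arity, alpha) for x, pa in graph.items())

# One binary variable observed as 1, 1, 0 under a uniform prior:
# the marginal likelihood is the Beta integral of theta^2 (1 - theta), i.e. 1/12.
single = [{"X": 1}, {"X": 1}, {"X": 0}]
print(log_bd_family(single, "X", (), {"X": 2}), log(1 / 12))

arity = {"X": 2, "Y": 2}
copied = [{"X": v, "Y": v} for v in (0, 1)] * 4   # Y always equals X
edge_score = log_bd_score(copied, {"X": (), "Y": ("X",)}, arity)
empty_score = log_bd_score(copied, {"X": (), "Y": ()}, arity)
print(edge_score, empty_score)
```

Working in log space avoids overflow in the gamma functions once the counts grow.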


Us, Fo, Ba (1)

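• Posterior over all models is intractable
– Focusing on some features
• Bayesian model averaging

$$P(f \mid D) = \sum_G P(G \mid D)\,f(G), \qquad f(G) = 1 \text{ if } G \text{ contains a certain edge, } 0 \text{ otherwise}$$

• Needs to calculate P(G|D)

$$P(G \mid D) = \frac{P(D \mid G)\,P(G)}{\sum_{G'} P(D \mid G')\,P(G')}$$

The sum over all graphs G' in the denominator is intractable.

– Solution MCMC: Metropolis-Hastings algorithm
• Only the ratio R is needed, so the normalizing sum is avoided.

$$\frac{P(G_2 \mid D)}{P(G_1 \mid D)} = \frac{P(G_2)}{P(G_1)} \cdot \frac{P(D \mid G_2)}{P(D \mid G_1)}$$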


Us, Fo, Ba (2)

• Calculation of P(G|D)
– Sampling G

Choose G somehow
While not converged
    Pick a G' u.a.r. from nbd(G)
    Compute R = P(G'|D) q(G|G') / (P(G|D) q(G'|G))
    Sample u ~ uniform(0,1)
    If u < min{1, R}
        then G := G'

Pseudo-code for MC3 algorithm. u.a.r. means uniformly at random.
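The pseudo-code can be turned into a small runnable sketch. Several choices here are assumptions for the example rather than part of the slides: graphs are frozensets of (parent, child) pairs, nbd(G) is the set of DAGs one edge addition, deletion, or reversal away, the loop runs for a fixed number of steps instead of testing convergence, and `toy_log_posterior` is an invented unnormalized log posterior. Because G' is drawn uniformly from nbd(G), the proposal ratio q(G|G')/q(G'|G) equals |nbd(G)| / |nbd(G')|.

```python
import math
import random

def is_acyclic(nodes, edges):
    """Repeatedly delete parentless nodes; a cycle remains if none can be found."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        roots = [v for v in remaining if not any(c == v for _, c in edges)]
        if not roots:
            return False
        remaining -= set(roots)
        edges = {(p, c) for p, c in edges if p in remaining}
    return True

def neighbours(nodes, graph):
    """All DAGs that differ from graph by one edge addition, deletion, or reversal."""
    result = set()
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            if (a, b) in graph:
                result.add(graph - {(a, b)})                 # deletion
                result.add((graph - {(a, b)}) | {(b, a)})    # reversal
            elif (b, a) not in graph:
                result.add(graph | {(a, b)})                 # addition
    # Sorted only so that a seeded run is reproducible.
    return sorted((g for g in result if is_acyclic(nodes, g)), key=sorted)

def mc3(nodes, log_posterior, n_steps, rng):
    graph = frozenset()                    # "choose G somehow": start from the empty graph
    samples = []
    for _ in range(n_steps):
        nbd = neighbours(nodes, graph)
        proposal = rng.choice(nbd)         # G' drawn u.a.r. from nbd(G)
        nbd_back = neighbours(nodes, proposal)
        # log R = log P(G'|D) - log P(G|D) + log q(G|G') - log q(G'|G)
        log_r = (log_posterior(proposal) - log_posterior(graph)
                 + math.log(len(nbd)) - math.log(len(nbd_back)))
        if rng.random() < math.exp(min(0.0, log_r)):
            graph = proposal
        samples.append(graph)
    return samples

def toy_log_posterior(graph):
    # Hypothetical unnormalized log posterior that favours the edge A -> B.
    return 3.0 if ("A", "B") in graph else 0.0

nodes = ["A", "B"]
samples = mc3(nodes, toy_log_posterior, 5000, random.Random(0))
edge_freq = sum(("A", "B") in g for g in samples) / len(samples)
print(edge_freq)
```

The fraction of sampled graphs containing an edge estimates the feature posterior P(f|D). For this toy target the exact value is e^3 / (e^3 + 2), roughly 0.91, since only three DAGs exist on two nodes.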


Us, Po, Fr (1)

• Partially observable
– Computation of marginal likelihood: intractable

$$P(V \mid G) = \sum_Z \int P(V, Z \mid G, \theta)\,P(\theta \mid G)\,d\theta$$

where Z denotes the hidden variables.

– Not decomposable into a product of local terms
– Solutions
• Approximating the marginal likelihood
• Structural EM


Us, Po, Fr (2)

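• Approximating the marginal likelihood
– Candidate's method

$$P(D \mid G) = \frac{P(D \mid G, \theta^*_G)\,P(\theta^*_G \mid G)}{P(\theta^*_G \mid D, G)}$$

• $\theta^*_G$: MLE of params.
• $P(D \mid G, \theta^*_G)$: from BN's inference algorithm
• $P(\theta^*_G \mid G)$: trivial
• $P(\theta^*_G \mid D, G)$: from Gibbs sampling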
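The identity behind Candidate's method can be checked numerically on a toy case where every factor has a closed form. The sketch below is an assumption-laden illustration, not the partially observed setting of the slides: it uses a single fully observed Bernoulli variable with a Beta(a, b) prior, so the posterior density at θ* is exact rather than estimated by Gibbs sampling. All function names are invented for the example.

```python
from math import lgamma, log

def log_beta_pdf(theta, a, b):
    """Log density of Beta(a, b) at theta, for 0 < theta < 1."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(theta) + (b - 1) * log(1 - theta))

def candidate_log_marginal(heads, tails, a=1.0, b=1.0):
    """log P(D) = log P(D | theta*) + log P(theta*) - log P(theta* | D)."""
    theta_star = heads / (heads + tails)      # MLE; needs at least one head and one tail
    log_lik = heads * log(theta_star) + tails * log(1 - theta_star)
    log_prior = log_beta_pdf(theta_star, a, b)
    log_post = log_beta_pdf(theta_star, a + heads, b + tails)   # conjugate posterior
    return log_lik + log_prior - log_post

def exact_log_marginal(heads, tails, a=1.0, b=1.0):
    """log [B(a + heads, b + tails) / B(a, b)], the closed-form marginal likelihood."""
    log_b_post = lgamma(a + heads) + lgamma(b + tails) - lgamma(a + b + heads + tails)
    log_b_prior = lgamma(a) + lgamma(b) - lgamma(a + b)
    return log_b_post - log_b_prior

print(candidate_log_marginal(7, 3), exact_log_marginal(7, 3))
```

The two values agree for any θ* strictly between 0 and 1. In the partially observed case the numerator is still cheap, and the approximation error comes entirely from estimating the posterior density in the denominator.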


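Us, Po, Fr (3)

• Structural EM
– Idea: decomposition of expected complete-data log-likelihood (BIC-score)
– Search inside EM
• (EM inside Search is high cost process)

$$\mathrm{BICscore}(G) = \sum_i \Big[ \sum_{jk} N_{ijk} \log \hat{\theta}_{ijk} - \frac{d_i}{2} \log M \Big]$$

$$\mathrm{EBICscore}(G) = \sum_i \Big[ \sum_{jk} \langle N_{ijk} \rangle \log \hat{\theta}_{ijk} - \frac{d_i}{2} \log M \Big]$$

where $\hat{\theta}_{ijk}$ is the MLE of params. and the expected counts are

$$\langle N_{ijk} \rangle = \sum_m P(X_i = k, Pa(X_i) = j \mid D_m, \theta)$$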
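The expected counts can be illustrated on the smallest possible case. The sketch below assumes a two-node network X → Y with binary variables, where Y is always observed and X is sometimes missing (coded as `None`). The current parameters `p_x` and `p_y_given_x` are made-up numbers, and the function name is hypothetical. Each case contributes P(X = j, Y = k | D_m, θ) to the cell (j, k).

```python
def expected_counts(rows, p_x, p_y_given_x):
    """Expected counts for the family (parent X = j, child Y = k); x may be None."""
    counts = [[0.0, 0.0], [0.0, 0.0]]     # counts[j][k]
    for x, y in rows:
        if x is not None:
            counts[x][y] += 1.0           # fully observed case: an ordinary count
        else:
            # Posterior over the hidden parent: P(X = j | Y = y) is proportional
            # to P(X = j) * P(Y = y | X = j).
            weights = [p_x[j] * p_y_given_x[j][y] for j in (0, 1)]
            z = sum(weights)
            for j in (0, 1):
                counts[j][y] += weights[j] / z
    return counts

p_x = [0.5, 0.5]                          # assumed current P(X)
p_y_given_x = [[0.9, 0.1], [0.2, 0.8]]    # assumed current P(Y | X), indexed [x][y]
rows = [(0, 0), (1, 1), (None, 1)]

counts = expected_counts(rows, p_x, p_y_given_x)
print(counts)
```

The incomplete case is split between X = 0 and X = 1 in proportion 1 : 8, so the expected counts still sum to the number of cases. Because they play the role of ordinary counts in the score, each candidate family can be scored without rerunning EM.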


Us, Po, Ba (1)

• Combined MCMC
– MCMC for Bayesian model averaging
– MCMC over the values of the unobserved nodes.


Conclusion

• Is learning the structure important?
– In papers, yes.
– In engineering, no.
• What can AI do for humans?
• What can humans do for machine learning algorithms?