SURVEY: Foundations of Bayesian Networks
O, Jangmin
2002/10/29
Last modified 2002/10/29
Copyright (c) 2002 by SNU CSE Biointelligence Lab.
Contents
• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks
Typical Example of DAG
[Figure: a simple DAG on the vertices A, B, C, D, F, G.]
1. Topological Sort
Algorithm 4.1 [Topological sort]
• Begin with all vertices unnumbered.
• Set counter i = 1.
• While any vertices remain:
– Select any vertex that has no parents;
– number the selected vertex as i;
– delete the numbered vertex and all its adjacent edges from the graph;
– increment i by 1.
Objective: obtain a well-ordering.
Well-ordering: the predecessors of any node have lower numbers than the node itself.
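A minimal Python sketch of Algorithm 4.1 follows. The edge list in `EXAMPLE_PARENTS` is an assumption made for illustration (the slides show the DAG only as a figure), and ties between parentless vertices are broken alphabetically, which is one arbitrary choice among the valid ones.

```python
def topological_sort(parents):
    """Number the vertices of a DAG as in Algorithm 4.1.

    `parents` maps each vertex to the set of its parents.
    Returns a dict vertex -> number (starting at 1).
    """
    remaining = {v: set(ps) for v, ps in parents.items()}  # working copy
    number = {}
    i = 1
    while remaining:
        # Select any vertex that has no parents left (alphabetical tie-break).
        roots = sorted(v for v, ps in remaining.items() if not ps)
        if not roots:
            raise ValueError("graph contains a directed cycle")
        v = roots[0]
        number[v] = i
        # Delete the numbered vertex and its edges from the graph.
        del remaining[v]
        for ps in remaining.values():
            ps.discard(v)
        i += 1
    return number


# Hypothetical edges for the six-vertex example; not taken from the slides.
EXAMPLE_PARENTS = {
    "A": set(), "B": {"A"}, "C": {"A"},
    "D": {"A", "B"}, "F": {"B", "C"}, "G": {"F"},
}
numbering = topological_sort(EXAMPLE_PARENTS)
```

With these assumed edges every parent receives a lower number than its child, which is exactly the well-ordering property.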
1. Topological Sort (1) to (6)
[Figure: the simple DAG, with its vertices numbered 1 through 6 one step at a time by Algorithm 4.1.]
2. Moral Graph
• Making the moral graph of a DAG
– Add an undirected edge between every pair of nodes that have a common child.
– Remove the edge directions.
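The two moralization steps can be sketched in Python as follows. The DAG in `EXAMPLE_PARENTS` is again an assumed edge list for the six-vertex example, and the adjacency-set representation is just one convenient choice.

```python
from itertools import combinations


def moralize(parents):
    """Return the moral graph of a DAG as a dict vertex -> set of neighbours."""
    nbrs = {v: set() for v in parents}
    for child, ps in parents.items():
        # Remove directions: every parent-child arc becomes an undirected edge.
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        # "Marry" every pair of parents that share this child.
        for p, q in combinations(sorted(ps), 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    return nbrs


# Hypothetical edges for the six-vertex example; not taken from the slides.
EXAMPLE_PARENTS = {
    "A": set(), "B": {"A"}, "C": {"A"},
    "D": {"A", "B"}, "F": {"B", "C"}, "G": {"F"},
}
moral = moralize(EXAMPLE_PARENTS)
```

Under this assumption the only edge added by marrying parents is B - C (through the common child F); A - B is already present as an arc.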
2. Moral Graph (1) and (2)
[Figure: the simple DAG with the numbering 1 to 6, shown during moralization.]
Junction tree
• Definition
– A tree whose nodes are sets C1, C2, ...
– For any two nodes C1 and C2, the intersection C1 ∩ C2 is contained in every node on the path between C1 and C2.
• Corollaries
– For an undirected graph the following are equivalent: the graph is decomposable; it is chordal; its cliques form a junction tree; it admits a perfect numbering.
Perfect numbering: for every j, ne(vj) ∩ {v1, ..., vj-1} induces a complete subgraph.
3. Maximum Cardinality Search (1)
Algorithm 4.9 [Maximum Cardinality Search]
• Set Output := 'G is chordal'.
• Set counter i := 1.
• Set L := ∅.
• For all v ∈ V, set c(v) := 0.
• While L ≠ V:
– Set U := V \ L.
– Select any vertex v maximizing c(v) over v ∈ U and label it i, so that v becomes vi.
– If Π(vi) := ne(vi) ∩ L is not complete in G: set Output := 'G is not chordal'.
– Otherwise, set c(w) := c(w) + 1 for each vertex w ∈ ne(vi) ∩ U.
– Set L := L ∪ {vi}.
– Increment i by 1.
• Report Output.
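The search can be sketched in Python as below. The adjacency sets in `MORAL` are an assumed encoding of the example's moral graph, and ties between vertices with equal counts are broken alphabetically, which the algorithm leaves open.

```python
from itertools import combinations


def maximum_cardinality_search(nbrs):
    """Sketch of Algorithm 4.9.

    `nbrs` maps each vertex to its neighbour set.
    Returns (order, pi_sets, is_chordal), where pi_sets[i] = ne(v) ∩ L
    at the moment the (i+1)-th vertex v was labelled.
    """
    order, labelled = [], set()          # the list L, as a sequence and as a set
    count = {v: 0 for v in nbrs}         # c(v)
    pi_sets, chordal = [], True
    while len(order) < len(nbrs):
        unlabelled = [v for v in sorted(nbrs) if v not in labelled]   # U
        v = max(unlabelled, key=lambda u: count[u])  # first maximum wins ties
        pi = nbrs[v] & labelled
        if any(b not in nbrs[a] for a, b in combinations(sorted(pi), 2)):
            chordal = False              # ne(v) ∩ L is not complete
        else:
            for w in nbrs[v] - labelled:
                count[w] += 1
        order.append(v)
        labelled.add(v)
        pi_sets.append(pi)
    return order, pi_sets, chordal


# Assumed moral graph of the six-vertex example.
MORAL = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D", "F"}, "C": {"A", "B", "F"},
    "D": {"A", "B"}, "F": {"B", "C", "G"}, "G": {"F"},
}
order, pi_sets, chordal = maximum_cardinality_search(MORAL)
```

On a chordless 4-cycle the same function reports a non-chordal graph, since the last vertex sees two labelled neighbours that are not adjacent.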
3. Maximum Cardinality Search (2) to (8)
[Figure: maximum cardinality search on the example graph, one vertex labelled per step.]
1, Π = {}
2, Π = {A}
3, Π = {A, B}
4, Π = {A, B}
5, Π = {B, C}
6, Π = {F}
Output = "G is chordal"
4. Cliques of Chordal Graph (1)
Algorithm 4.11 [Finding the Cliques of a Chordal Graph]
• From the numbering (v1,..., vk) obtained by maximum cardinality search, let π(i) = cardinality of Π(vi).
• Mark the ladder nodes: i is a ladder node if i = k, or if i < k and π(i+1) < 1 + π(i).
• Define the cliques
– Cj = {λj} ∪ Π(λj), where λj is the j-th ladder node.
C1, C2, ... possess the RIP (running intersection property).
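A small Python sketch of the ladder-node rule, assuming the MCS order A, B, C, D, F, G together with the sets Π listed on the MCS slides (the vertex order is inferred from the cliques shown on the next slide):

```python
def cliques_from_mcs(order, pi_sets):
    """Sketch of Algorithm 4.11: cliques of a chordal graph from an MCS numbering.

    order[i] is vertex v_(i+1); pi_sets[i] is its set Π of earlier neighbours.
    """
    k = len(order)
    sizes = [len(p) for p in pi_sets]            # π(i) = |Π(v_i)|
    cliques = []
    for i in range(k):
        # Ladder node: the last vertex, or one whose successor's Π does not grow.
        is_ladder = (i == k - 1) or (sizes[i + 1] < 1 + sizes[i])
        if is_ladder:
            cliques.append({order[i]} | set(pi_sets[i]))
    return cliques


ORDER = ["A", "B", "C", "D", "F", "G"]           # assumed MCS order
PI_SETS = [set(), {"A"}, {"A", "B"}, {"A", "B"}, {"B", "C"}, {"F"}]
cliques = cliques_from_mcs(ORDER, PI_SETS)
```

Here the ladder nodes are the vertices numbered 3, 4, 5 and 6, which yields four cliques.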
4. Cliques of Chordal Graph (2)
[Figure: the example graph annotated with the MCS labels 1 to 6 and their sets Π.]
C1 = {A, B, C}
C2 = {A, B, D}
C3 = {B, C, F}
C4 = {F, G}
Running Intersection Property
• RIP: definition
– Given an ordered sequence (C1, C2, ..., Ck),
– for all 1 < j ≤ k, there is an i < j such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
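The definition translates almost literally into a check. The sketch below is illustrative only; the function name is hypothetical and the cliques are those of the running example.

```python
def has_running_intersection(cliques):
    """Return True if the ordered sequence of sets has the RIP."""
    seen = set()                          # C1 ∪ ... ∪ C(j-1)
    for j, cj in enumerate(cliques):
        if j > 0:
            sep = cj & seen               # Cj ∩ (C1 ∪ ... ∪ C(j-1))
            # Some earlier Ci must contain the whole intersection.
            if not any(sep <= ci for ci in cliques[:j]):
                return False
        seen |= cj
    return True


ABC, ABD, BCF, FG = {"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "F"}, {"F", "G"}
good = has_running_intersection([ABC, ABD, BCF, FG])
bad = has_running_intersection([ABC, FG, BCF, ABD])
```

The second ordering fails because BCF meets the earlier sets in {B, C, F}, which no single earlier set contains; the property depends on the order, not only on the sets.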
5. Junction Tree Construction (1)
Algorithm 4.8 [Junction Tree Construction]
• Start from the cliques (C1, ..., Cp) of a chordal graph, ordered so that they have the RIP.
• Associate a node of the tree with each clique Cj.
• For j = 2, ..., p, add an edge between Cj and Ci, where i is any one value in {1, ..., j-1} such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
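A possible Python rendering of Algorithm 4.8 is sketched below. Where several values of i qualify, this sketch takes the smallest, which is one arbitrary way to implement "any one value"; indices are 0-based in the code.

```python
def build_junction_tree(cliques):
    """Sketch of Algorithm 4.8.

    `cliques` must be ordered so that they have the RIP.
    Returns the tree edges as (i, j) pairs of 0-based clique indices.
    """
    edges = []
    seen = set(cliques[0])
    for j in range(1, len(cliques)):
        sep = cliques[j] & seen
        # First earlier clique that contains the separator; raises
        # StopIteration if the ordering does not have the RIP.
        i = next(i for i in range(j) if sep <= cliques[i])
        edges.append((i, j))
        seen |= cliques[j]
    return edges


CLIQUES = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "F"}, {"F", "G"}]
tree_edges = build_junction_tree(CLIQUES)
```

For the example cliques this connects ABD and BCF to ABC, and FG to BCF.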
5. Junction Tree Construction (2) to (5)
[Figure: the junction tree built one edge at a time from the cliques of the example graph.]
C1 = {A, B, C}
C2 = {A, B, D}
C3 = {B, C, F}
C4 = {F, G}
Tree nodes: ABC (C1), ABD (C2), BCF (C3), FG (C4).
Applying Algorithm 4.8 to these cliques gives the edges ABC - ABD, ABC - BCF and BCF - FG.
From Elimination Tree to Junction Tree
Triangulation (1)
• When is triangulation needed?
– When MCS (Maximum Cardinality Search) reports that the graph is not chordal.
• Triangulation
– introduces fill-in edges.
– produces a perfect numbering.
• Optimal triangulation: NP-hard
– The size of each clique matters...
Triangulation (2)
Algorithm 4.13 [One-step Look Ahead Triangulation]
• Start with all vertices unnumbered, set counter i := k.
• While there are still some unnumbered vertices:
– Select an unnumbered vertex v to optimize the criterion c(v), or
– select v = σ(i) [σ is a given order].
– Label it with the number i, so that v becomes vi.
– Form the set Ci consisting of vi and its unnumbered neighbours.
– Fill in edges where none exist between all pairs of vertices in Ci.
– Eliminate vi and decrement i by 1.
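The fixed-order variant (v = σ(i)) can be sketched in Python as follows. The graph `MORAL` is an assumed encoding of the example's moral graph; the criterion-based variant would replace the fixed `order` by a choice made inside the loop.

```python
from itertools import combinations


def eliminate(nbrs, order):
    """Sketch of Algorithm 4.13 with a fixed order σ.

    Vertices are numbered k, k-1, ..., 1, so `order` is processed from its
    last element to its first. Returns (sets, fill_ins) where sets[i] is the
    elimination set C_(i+1) and fill_ins lists the added edges.
    """
    graph = {v: set(ns) for v, ns in nbrs.items()}   # working copy
    sets = [None] * len(order)
    fill_ins = []
    for i in range(len(order) - 1, -1, -1):
        v = order[i]
        c = {v} | graph[v]                  # v and its unnumbered neighbours
        for a, b in combinations(sorted(c), 2):
            if b not in graph[a]:           # fill in the missing edge
                graph[a].add(b)
                graph[b].add(a)
                fill_ins.append((a, b))
        for w in graph[v]:                  # eliminate v from the graph
            graph[w].discard(v)
        del graph[v]
        sets[i] = c
    return sets, fill_ins


# Assumed moral graph of the six-vertex example.
MORAL = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D", "F"}, "C": {"A", "B", "F"},
    "D": {"A", "B"}, "F": {"B", "C", "G"}, "G": {"F"},
}
elim_sets, fills = eliminate(MORAL, ["A", "B", "C", "D", "F", "G"])
square_fills = eliminate({1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}, [1, 2, 3, 4])[1]
```

The example graph is already chordal, so no fill-in is produced; the 4-cycle needs one chord.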
Triangulation (3) to (8)
[Figure: elimination on the example graph with the order σ = (A, B, C, D, F, G), numbering the vertices from 6 down to 1.]
6, C6 = {F, G}
5, C5 = {B, C, F}
4, C4 = {A, B, D}
3, C3 = {A, B, C}
2, C2 = {A, B}
1, C1 = {A}
Elimination set
• Cj contains vj.
• vj ∉ Cl for all l < j.
• (C1,..., Ck) has the RIP.
• The cliques of the triangulated graph G' are contained in (C1,..., Ck).
Elimination Tree Construction (1)
Algorithm 4.14 [Elimination Tree Construction]
• Associate a node of the tree with each set Ci.
• For j = 1, ..., k, if Cj contains more than one vertex, add an edge between Cj and Ci, where i is the largest index of a vertex in Cj \ {vj}.
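As an illustration, Algorithm 4.14 can be written in a few lines of Python. The inputs below are the elimination order and elimination sets of the running example; edges are reported as 1-based index pairs (j, i) to match the slide notation.

```python
def build_elimination_tree(order, sets):
    """Sketch of Algorithm 4.14.

    order[j-1] is vertex v_j and sets[j-1] is its elimination set C_j.
    Returns edges (j, i) meaning "C_j is joined to C_i".
    """
    index = {v: n for n, v in enumerate(order, start=1)}
    edges = []
    for j, cj in enumerate(sets, start=1):
        rest = cj - {order[j - 1]}           # C_j \ {v_j}
        if rest:                             # C_j has more than one vertex
            i = max(index[v] for v in rest)  # largest index among the rest
            edges.append((j, i))
    return edges


ORDER = ["A", "B", "C", "D", "F", "G"]
ELIM_SETS = [{"A"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "D"},
             {"B", "C", "F"}, {"F", "G"}]
etree_edges = build_elimination_tree(ORDER, ELIM_SETS)
```

C1 = {A} contains a single vertex and therefore adds no edge; it becomes the root.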
Elimination Tree Construction (2) to (7)
[Figure: the elimination tree built one edge at a time. Each node is written as "vertex : remaining members of its elimination set".]
C1 = A:
C2 = B:A
C3 = C:AB
C4 = D:AB
C5 = F:BC
C6 = G:F
From etree to jtree (1)
Lemma 4.16
– Let C1,..., Ck be a sequence of sets with the RIP.
– Assume that Ct ⊆ Cp for some t ≠ p and that p is minimal with this property for fixed t. Then:
(i) If t > p, then C1, ..., Ct-1, Ct+1, ..., Ck has the RIP.
(ii) If t < p, then C1,..., Ct-1, Cp, Ct+1, ..., Cp-1, Cp+1,..., Ck has the RIP.
Simply deleting a redundant elimination set may destroy the RIP.
From etree to jtree (2)
[Figure: the elimination tree over C1, ..., C6 before and after a redundant set is removed.]
Condition (ii): t = 1, p = 2
C1 (A:) is contained in C2 (B:A) and is removed, leaving C2, ..., C6.
From etree to jtree (3)
Condition (ii): t = 2, p = 3
C2 (B:A) is contained in C3 (C:AB) and is removed, leaving C3, ..., C6.
MST for making jtree (1)
Algorithm
• Start from the elimination sets (C1, ..., Ck).
• Remove the redundant Ci's.
• Make the junction graph.
– If |Ci ∩ Cj| > 0, add an edge between Ci and Cj.
– Set the weight of the edge to |Ci ∩ Cj|.
• Construct the MST (Maximum Weight Spanning Tree).
The resulting tree is a junction tree, and the clique set has the RIP.
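One way to realise this is Kruskal's algorithm run on edges sorted by decreasing weight; the slides do not say which spanning-tree algorithm was used, so that choice is an assumption. The sketch expects the redundant elimination sets to have been removed already.

```python
from itertools import combinations


def junction_tree_by_mst(cliques):
    """Maximum-weight spanning tree of the junction graph (Kruskal's algorithm).

    Returns tree edges as (i, j, weight) with 0-based clique indices.
    """
    # Junction graph: an edge wherever two cliques intersect, weight |Ci ∩ Cj|.
    candidates = [
        (len(cliques[i] & cliques[j]), i, j)
        for i, j in combinations(range(len(cliques)), 2)
        if cliques[i] & cliques[j]
    ]
    candidates.sort(key=lambda e: -e[0])     # heaviest first, stable on ties
    parent = list(range(len(cliques)))       # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                         # edge joins two components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


CLIQUES = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "F"}, {"F", "G"}]
mst = junction_tree_by_mst(CLIQUES)
```

For the example the junction graph has four edges with weights 2, 2, 1, 1, and the spanning tree keeps the three with weights 2, 2, 1.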
MST for making jtree (2)
[Figure: the junction graph and its MST over the cliques C1 = ABC, C2 = ABD, C3 = BCF, C4 = FG.]
Junction graph: edge weights 2, 2, 1, 1.
MST: edge weights 2, 2, 1.
MST for making jtree (3)
• Optimal jtree (for a fixed elimination ordering)
– Cost of edge e = (v, w):
\[ \chi(e) = q_v + q_w - q_{v \cap w}, \qquad q_v = \prod_{i \in v} q_i \]
where q_i is the number of discrete values X_i can take on.
– Use the cost of the edge to break ties when constructing the MST (minimum cost preferred).
Junction Tree Algorithms
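Collect phase
• From leaf to root
[Figure: neighbouring cliques Ck, Cj, Ci and Ci' joined through separators.]
– Each clique Cj starts from its initial potential and multiplies in the messages μ received from its children, which gives its updated potential.
– The message that Cj passes on is the projection of its updated potential onto the separator S.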
Distribute phase
• From root to leaf
– After this phase the potential of clique j contains the marginal distribution of clique j.
[Figure and equations: messages μ sent across the separators S between the cliques Ck, Cj, Ci and Ci'.]
Learning Bayesian Networks
Learning Paradigm
• Known structure (Ks) or unknown structure (Us)
• Full observability (Fo) or partial observability (Po)
• Frequentist (Fr) or Bayesian (Ba)
Ks, Fo, Fr (1)
• Given training set D = {D1, ..., DM}
• MLE of the parameters of each CPD
– MLE (Maximum Likelihood Estimate)
– CPD (Conditional Probability Distribution)
\[ L = \log \prod_{m=1}^{M} \Pr(D_m \mid G) = \sum_{i=1}^{n} \sum_{m=1}^{M} \log P(X_i \mid Pa(X_i), D_m) \]
Here n is the # of nodes and M is the # of data; the log-likelihood decomposes into one term for each node.
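Ks, Fo, Fr (2)
• Multinomial distributions
– Parameters, for tabular CPD
\[ \theta_{ijk} \overset{\mathrm{def}}{=} P(X_i = k \mid Pa(X_i) = j) \]
– Log-likelihood
\[ L = \sum_{i} \sum_{m} \log \prod_{j,k} \theta_{ijk}^{I_{ijkm}} = \sum_{i} \sum_{m} \sum_{j,k} I_{ijkm} \log \theta_{ijk} = \sum_{ijk} N_{ijk} \log \theta_{ijk} \]
\[ I_{ijkm} \overset{\mathrm{def}}{=} I(X_i = k, Pa(X_i) = j \mid D_m), \qquad N_{ijk} \overset{\mathrm{def}}{=} \sum_{m} I(X_i = k, Pa(X_i) = j \mid D_m) \]
– MLE
\[ \hat{\theta}_{ijk} = \frac{N_{ijk}}{\sum_{k'} N_{ijk'}}, \qquad \text{constraint: } \sum_{k} \theta_{ijk} = 1 \text{ for all } i, j \]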
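Ks, Fo, Fr (3)
• MLE of multinomial distribution
– Constrained optimization
\[ O = \sum_{ijk} N_{ijk} \log \theta_{ijk} + \sum_{ij} \lambda_{ij} \Bigl( 1 - \sum_{k} \theta_{ijk} \Bigr) \]
Derivative with respect to θ_ijk:
\[ \frac{dO}{d\theta_{ijk}} = \frac{N_{ijk}}{\theta_{ijk}} - \lambda_{ij} \]
Setting the derivative with respect to θ_ijk to zero:
\[ N_{ijk} = \lambda_{ij}\, \theta_{ijk} \;\Rightarrow\; \sum_{k} N_{ijk} = \lambda_{ij} \sum_{k} \theta_{ijk} = \lambda_{ij} \;\Rightarrow\; \hat{\theta}_{ijk} = \frac{N_{ijk}}{\sum_{k'} N_{ijk'}} \]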
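In code, the multinomial MLE is just normalised counting. The sketch below estimates one tabular CPD from fully observed cases; the data set and the variable names A and B are made up for illustration.

```python
from collections import Counter


def mle_cpt(cases, child, parents):
    """MLE of a tabular CPD from fully observed cases.

    Returns {(j, k): theta_hat}, where j is the tuple of parent values
    and k is the child value; theta_hat = N_ijk / sum_k' N_ijk'.
    """
    # N_ijk: how often the child takes value k under parent configuration j.
    counts = Counter((tuple(c[p] for p in parents), c[child]) for c in cases)
    totals = Counter()
    for (j, _k), n in counts.items():
        totals[j] += n
    return {(j, k): n / totals[j] for (j, k), n in counts.items()}


# Made-up data for a two-node network A -> B.
cases = [{"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 0, "B": 1}, {"A": 1, "B": 1}]
theta = mle_cpt(cases, "B", ["A"])
```

Only configurations that occur in the data get an entry; unseen (j, k) pairs are simply absent, which is one of the reasons a Bayesian treatment with pseudo counts is attractive.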
Ks, Fo, Fr (4)
• Conditional linear Gaussian distributions
Ks, Fo, Ba (1)
• Frequentist: point estimation
• Bayesian: distributional estimation
Ks, Fo, Ba (2)
• Multinomial distributions
– Two assumptions on the prior
• Global independence:
\[ P(\theta) = \prod_{i=1}^{n} P(\theta_i), \qquad \theta_i = \{\theta_{ijk},\ j = 1, \ldots, q_i,\ k = 1, \ldots, r_i\} \]
• Local independence:
\[ P(\theta_i) = \prod_{j=1}^{q_i} P(\theta_{ij}), \qquad \theta_{ij} = \{\theta_{ijk},\ k = 1, \ldots, r_i\} \]
– Global independence + likelihood equivalence leads to the Dirichlet prior, the conjugate prior for the multinomial.
Ks, Fo, Ba (3)
• Remark on Bayesian
– P(θ|D) ∝ P(D|θ) P(θ), i.e. posterior ∝ likelihood × prior
– Conjugate priors
• The posterior has the same form as the prior distribution.
• Many exponential-family distributions have conjugate priors.
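Ks, Fo, Ba (4)
• Multinomial distributions
– Dirichlet prior on tabular CPDs
\[ \theta_{ij} = P(X_i \mid Pa(X_i) = j) \]
θ_ij: parameters of a multinomial r.v. with r_i possible values
\[ \theta_{ij} \sim \mathrm{Dirichlet}(\alpha_{ij1}, \ldots, \alpha_{ijr_i}) \]
\[ P(\theta_{ij} \mid \alpha_{ij}) = \frac{1}{B(\alpha_{ij1}, \ldots, \alpha_{ijr_i})} \prod_{k=1}^{r_i} \theta_{ijk}^{\alpha_{ijk} - 1} \]
\[ B(\alpha_1, \ldots, \alpha_r) = \frac{\prod_{k} \Gamma(\alpha_k)}{\Gamma\bigl(\sum_{k} \alpha_k\bigr)}, \qquad \Gamma(n) = (n - 1)! \]
• Posterior distribution
\[ \theta_{ij} \mid D \sim \mathrm{Dirichlet}(\alpha_{ij1} + N_{ij1}, \ldots, \alpha_{ijr_i} + N_{ijr_i}) \]
• Posterior mean
\[ E[\theta_{ijk} \mid D] = \frac{\alpha_{ijk} + N_{ijk}}{\sum_{l=1}^{r_i} (\alpha_{ijl} + N_{ijl})} \]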
Ks, Fo, Ba (5)
• Dirichlet distribution
– Hyperparameter α_ijk
• Positive number
• Pseudo count
• # of imaginary cases: α_ijk − 1
– Posterior distribution
• Combines the pseudo counts with the # of observed data
• Simple sum
\[ \theta_{ij} \sim \mathrm{Dirichlet}(\alpha_{ij1}, \ldots, \alpha_{ijr_i}) \]
\[ \theta_{ij} \mid D \sim \mathrm{Dirichlet}(\alpha_{ij1} + N_{ij1}, \ldots, \alpha_{ijr_i} + N_{ijr_i}) \]
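The "simple sum" update is easy to see numerically. The following sketch computes the posterior mean for one parent configuration; the hyperparameters and counts are made-up values.

```python
def dirichlet_posterior_mean(alpha, counts):
    """Posterior mean E[theta_k | D] = (alpha_k + N_k) / sum_l (alpha_l + N_l)."""
    posterior = [a + n for a, n in zip(alpha, counts)]   # pseudo + observed counts
    total = sum(posterior)
    return [p / total for p in posterior]


# Uniform prior (all hyperparameters 1) and made-up counts N = (3, 0, 1).
mean = dirichlet_posterior_mean([1.0, 1.0, 1.0], [3, 0, 1])
```

Unlike the MLE, the value that was never observed still receives positive probability, here 1/7.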
Ks, Fo, Ba (6)
• Gaussian distributions
Ks, Po, Fr (1)
• Log likelihood
\[ L = \sum_{m} \log P(D_m) = \sum_{m} \log \sum_{h} P(H = h, V = D_m) \]
where H denotes the hidden variables and V the visible (observed) variables.
• Not decomposable into a sum of local terms, one per node
– EM algorithm
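Ks, Po, Fr (2)
• EM algorithm
– From Jensen's inequality
\[ \log \sum_{j} \lambda_j y_j \ge \sum_{j} \lambda_j \log y_j, \qquad \sum_{j} \lambda_j = 1 \]
\[ \begin{aligned}
L &= \sum_{m} \log \sum_{h} P(H = h, V_m) \\
  &= \sum_{m} \log \sum_{h} q(h \mid V_m)\, \frac{P(H = h, V_m)}{q(h \mid V_m)} \\
  &\ge \sum_{m} \sum_{h} q(h \mid V_m) \log \frac{P(H = h, V_m)}{q(h \mid V_m)} \\
  &= \sum_{m} \sum_{h} q(h \mid V_m) \log P(H = h, V_m) - \sum_{m} \sum_{h} q(h \mid V_m) \log q(h \mid V_m)
\end{aligned} \]
\[ \text{constraint: } \sum_{h} q(h \mid V_m) = 1 \]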
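Ks, Po, Fr (3)
– Maximizing w.r.t. q (E-step)
\[ O = \sum_{m} \sum_{h} q(h \mid V_m) \log P(H = h, V_m) - \sum_{m} \sum_{h} q(h \mid V_m) \log q(h \mid V_m) + \sum_{m} \lambda_m \Bigl( 1 - \sum_{h} q(h \mid V_m) \Bigr) \]
\[ \frac{dO}{dq(h \mid V_m)} = \log P(H = h, V_m) - \log q(h \mid V_m) - 1 - \lambda_m = 0 \]
\[ q(h \mid V_m) = \frac{P(H = h, V_m)}{e^{1 + \lambda_m}} \]
\[ \sum_{h} q(h \mid V_m) = \frac{1}{e^{1 + \lambda_m}} \sum_{h} P(H = h, V_m) = 1 \;\Rightarrow\; e^{1 + \lambda_m} = \sum_{h} P(H = h, V_m) = P(V_m) \]
\[ q(h \mid V_m) = P(h \mid V_m) \]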
Ks, Po, Fr (4)
– Maximizing w.r.t. θ (M-step)
• After q is maximized to p(h|Vm)
• Maximize the expected complete-data log-likelihood
\[ Q(\theta' \mid \theta) = \sum_{m} \sum_{h} p(h \mid V_m, \theta) \log P(H = h, V_m \mid \theta') \]
\[ \theta^{*} = \arg\max_{\theta'} Q(\theta' \mid \theta) \]
• Iterate until convergence
– E-step
• Calculate the expected complete-data log-likelihood
– M-step
• Get θ* maximizing the expected complete-data log-likelihood
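Ks, Po, Fr (5)
• Multinomial distribution
\[ L = \sum_{ijk} N_{ijk} \log \theta_{ijk}, \qquad I_{ijkm} \overset{\mathrm{def}}{=} I(X_i = k, Pa(X_i) = j \mid D_m), \qquad N_{ijk} \overset{\mathrm{def}}{=} \sum_{m} I(X_i = k, Pa(X_i) = j \mid D_m) \]
– E-step
\[ Q(\theta' \mid \theta) = \sum_{ijk} E[N_{ijk}] \log \theta'_{ijk}, \qquad E[N_{ijk}] = \sum_{m} P(X_i = k, Pa(X_i) = j \mid D_m, \theta) \]
– M-step
\[ \theta^{*} = \arg\max_{\theta'} Q(\theta' \mid \theta), \qquad \hat{\theta}_{ijk} = \frac{E[N_{ijk}]}{\sum_{k'} E[N_{ijk'}]} \]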
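To make the two steps concrete, here is a small EM sketch for a toy network that is not in the slides: a hidden binary node H with two observed binary children X1 and X2. The data, the starting parameters and all names are assumptions chosen only for illustration.

```python
from math import log


def em_step(data, p_h, p_x):
    """One EM iteration for H -> X1, H -> X2 with H hidden (all binary).

    p_h = P(H = 1); p_x[i][h] = P(X_i = 1 | H = h).
    Returns (new_p_h, new_p_x, log-likelihood under the input parameters).
    """
    def joint(h, x):
        p = p_h if h == 1 else 1 - p_h
        for i, xi in enumerate(x):
            p *= p_x[i][h] if xi == 1 else 1 - p_x[i][h]
        return p

    # E-step: q(h | V_m) = P(h | V_m), accumulated into expected counts.
    n_h = [0.0, 0.0]
    n_xh = [[0.0, 0.0] for _ in p_x]
    loglik = 0.0
    for x in data:
        j = [joint(0, x), joint(1, x)]
        z = j[0] + j[1]                    # P(V_m)
        loglik += log(z)
        for h in (0, 1):
            r = j[h] / z                   # posterior responsibility of h
            n_h[h] += r
            for i, xi in enumerate(x):
                n_xh[i][h] += r * xi       # expected count of X_i = 1 with H = h
    # M-step: normalise the expected counts.
    new_p_h = n_h[1] / len(data)
    new_p_x = [[n_xh[i][h] / n_h[h] for h in (0, 1)] for i in range(len(p_x))]
    return new_p_h, new_p_x, loglik


# Made-up observations (x1, x2) and arbitrary starting parameters.
data = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (1, 1), (0, 0), (0, 1)]
p_h, p_x = 0.4, [[0.3, 0.6], [0.2, 0.7]]
logliks = []
for _ in range(20):
    p_h, p_x, ll = em_step(data, p_h, p_x)
    logliks.append(ll)
```

The recorded log-likelihoods never decrease from one iteration to the next, which is the guarantee that the Jensen bound provides.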
Ks, Po, Ba (1)
• Gibbs sampling: stochastic version of EM
• Variational Bayes: P(θ, H|V) ≈ q(θ|V) q(H|V)
Us, Fo, Fr (1)
• Issues
– Hypothesis space
– Evaluation function
– Search algorithm
Us, Fo, Fr (2)
• Search space
– DAG
• # of DAGs ~ O(2^(n^2))
• 10 nodes ~ O(10^18) DAGs
• Finding the optimal DAG by exhaustive search is infeasible.
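The order of magnitude quoted for 10 nodes can be checked with Robinson's recurrence for the number of labelled DAGs, a(m) = Σ_{k=1..m} (-1)^(k+1) C(m, k) 2^(k(m-k)) a(m-k). The recurrence is not in the slides; it is brought in here only to verify the figure.

```python
from math import comb


def count_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]                                   # a(0) = 1
    for m in range(1, n + 1):
        a.append(sum(
            (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
            for k in range(1, m + 1)
        ))
    return a[n]


dags_3 = count_dags(3)
dags_10 = count_dags(10)
```

Three nodes already allow 25 DAGs, and for ten nodes the count is a little above 4 × 10^18.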
Us, Fo, Fr (3)
• Search algorithm
– Local search
• Operators: adding, deleting, or reversing a single arc
Choose G somehow
While not converged
    For each G' in nbd(G)
        Compute score(G')
    G* := arg max_G' score(G')
    If score(G*) > score(G)
        then G := G*
        else converged := true
Pseudo-code for hill-climbing. nbd(G) is the neighborhood of G, i.e., the models that can be reached by applying a single local change operator.
Us, Fo, Fr (4)
• Search algorithm
– PC algorithm
• Starts with a fully connected undirected graph
• CI (conditional independence) test
– If X ⊥ Y | S, the arc between X and Y is removed.
Us, Fo, Fr (5)
• Scoring function
– MLE selects the fully connected graph.
– score(G) ∝ P(D|G) P(G)
\[ \text{MAP model:} \quad P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)} \]
– The marginal likelihood automatically penalizes complex models.
• A complex model has more parameters.
• It therefore places less probability mass on the region where the data actually lie.
\[ \mathrm{score}(G) = P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta \]
Us, Fo, Fr (6)
• Scoring function
– Under global independence and conjugate priors
\[ P(D \mid G) = \prod_{i=1}^{n} \int P(X_i \mid Pa(X_i), \theta_i)\, P(\theta_i)\, d\theta_i \overset{\mathrm{def}}{=} \prod_{i=1}^{n} \mathrm{score}(X_i, Pa(X_i)) \]
– The integral is available in closed form.
– The score decomposes into a product of per-node factors.
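Us, Fo, Fr (7)
• Scoring function
– Without conjugate priors: approximation is needed
– Laplace approximation: BIC (Bayesian Information Criterion)
\[ \log P(D \mid G) \approx \log P(D \mid G, \hat{\theta}_G) - \frac{d}{2} \log M \]
where d is the dimension of the model and θ̂_G is the ML estimate of the parameters.
– Case of multinomial distribution
\[ \mathrm{score}_{BIC}(G) = \sum_{i} \Bigl[ \sum_{m} \log P(X_i \mid Pa(X_i), \hat{\theta}_i, D_m) - \frac{d_i}{2} \log M \Bigr] = \sum_{i} \Bigl[ \sum_{jk} N_{ijk} \log \hat{\theta}_{ijk} - \frac{d_i}{2} \log M \Bigr] \]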
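For tabular CPDs the per-node BIC term can be computed directly from the count table. In the sketch below the number of free parameters is taken as d_i = q_i (r_i − 1), which is the usual count for a tabular CPD but is an assumption here, since the slides only call d the dimension of the model.

```python
from math import log


def bic_family_score(counts, num_cases):
    """BIC contribution of one node: sum_jk N_ijk log(theta_hat_ijk) - (d_i / 2) log M.

    counts[j][k] = N_ijk for parent configuration j and child value k.
    """
    loglik = 0.0
    for row in counts:
        total = sum(row)
        for n in row:
            if n > 0:                       # terms with N_ijk = 0 contribute nothing
                loglik += n * log(n / total)
    d_i = len(counts) * (len(counts[0]) - 1)   # assumed: q_i * (r_i - 1) free parameters
    return loglik - 0.5 * d_i * log(num_cases)


# Made-up counts: a binary node without parents, observed 2 + 2 times (M = 4).
score = bic_family_score([[2, 2]], 4)
```

In this toy case the fit term is 4 log 0.5 and the penalty is 0.5 log 4, so the score equals −5 log 2.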
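Us, Fo, Fr (8)
• Scoring function
– Advantage of the decomposed score
– The marginal likelihoods of two graphs that differ in a single link differ in at most two terms.
• Ex) G1: X1 → X2 → X3 → X4, G2: X1 → X2 ← X3 → X4
\[ \frac{P(D \mid G_1)}{P(D \mid G_2)} = \frac{\mathrm{score}(X_1)\, \mathrm{score}(X_2, X_1)\, \mathrm{score}(X_3, X_2)\, \mathrm{score}(X_4, X_3)}{\mathrm{score}(X_1)\, \mathrm{score}(X_2, X_1, X_3)\, \mathrm{score}(X_3)\, \mathrm{score}(X_4, X_3)} \]
where the first argument of each score is the node and the remaining arguments are its parents.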
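Us, Fo, Fr (9)
• Scoring function
– Marginal likelihood for the multinomial distribution with Dirichlet prior
– Bayesian Dirichlet (BD) score
\[ P(D \mid \theta, G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}} \]
\[ P(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{B(\alpha_{ij1} + N_{ij1}, \ldots, \alpha_{ijr_i} + N_{ijr_i})}{B(\alpha_{ij1}, \ldots, \alpha_{ijr_i})} = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})} \]
with α_ij = Σ_k α_ijk and N_ij = Σ_k N_ijk.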
posterior mean
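Because the Gamma functions overflow quickly, the BD score is normally evaluated in log space. The sketch below does this for one node with `math.lgamma`; the count and hyperparameter tables are made-up values.

```python
from math import exp, lgamma


def log_bd_family_score(counts, alpha):
    """Log BD score of one node.

    counts[j][k] = N_ijk and alpha[j][k] = alpha_ijk. Returns the log of
    prod_j [G(a_ij) / G(a_ij + N_ij)] * prod_k [G(a_ijk + N_ijk) / G(a_ijk)],
    where G is the Gamma function.
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        for n, a in zip(n_row, a_row):
            total += lgamma(a + n) - lgamma(a)
    return total


# Made-up example: binary node without parents, one observation of each value,
# uniform Dirichlet(1, 1) prior.
bd = exp(log_bd_family_score([[1, 1]], [[1.0, 1.0]]))
```

For this example the result is 1/6, which agrees with integrating θ(1 − θ) over the unit interval.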
Us, Fo, Ba (1)
• The posterior over all models is intractable
– Focus on some features
• Bayesian model averaging
\[ P(f \mid D) = \sum_{G} P(G \mid D)\, f(G) \]
where f(G) = 1 if G contains a certain edge.
• Needs to calculate P(G|D)
\[ P(G \mid D) = \frac{P(D \mid G)\, P(G)}{\sum_{G'} P(D \mid G')\, P(G')} \]
The normalizing sum over all G' is intractable.
– Solution: MCMC (Metropolis-Hastings algorithm)
• Only the ratio R is needed, so the intractable normalization is avoided.
\[ \frac{P(G_1 \mid D)}{P(G_2 \mid D)} = \frac{P(G_1)}{P(G_2)} \cdot \frac{P(D \mid G_1)}{P(D \mid G_2)} \]
Us, Fo, Ba (2)
• Calculation of P(G|D)
– Sampling G
Choose G somehow
While not converged
    Pick a G' u.a.r. from nbd(G)
    Compute R = [P(G'|D) q(G|G')] / [P(G|D) q(G'|G)]
    Sample u ~ uniform(0,1)
    If u < min{1, R}
        then G := G'
Pseudo-code for the MC3 algorithm. u.a.r. means uniformly at random.
Us, Po, Fr (1)
• Partially observable
– Computation of the marginal likelihood: intractable
\[ P(V \mid G) = \sum_{Z} \int P(V, Z \mid G, \theta)\, P(\theta \mid G)\, d\theta \]
where Z denotes the hidden variables.
– Not decomposable into a product of local terms
– Solutions
• Approximating the marginal likelihood
• Structural EM
Us, Po, Fr (2)
• Approximating the marginal likelihood
– Candidate's method
\[ P(D \mid G) = \frac{P(D \mid G, \theta^{*}_G)\, P(\theta^{*}_G \mid G)}{P(\theta^{*}_G \mid D, G)} \]
• θ*_G: MLE of the parameters
• P(D | G, θ*_G): from the BN's inference algorithm
• P(θ*_G | G): trivial
• P(θ*_G | D, G): from Gibbs sampling
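Us, Po, Fr (3)
• Structural EM
– Idea: decomposition of the expected complete-data log-likelihood (BIC score)
– Search inside EM
• (EM inside search is a high-cost process)
\[ \mathrm{score}_{BIC}(G) = \sum_{i} \Bigl[ \sum_{jk} N_{ijk} \log \hat{\theta}_{ijk} - \frac{d_i}{2} \log M \Bigr] \]
\[ \mathrm{score}_{EBIC}(G) = \sum_{i} \Bigl[ \sum_{jk} E[N_{ijk}] \log \hat{\theta}_{ijk} - \frac{d_i}{2} \log M \Bigr] \]
where θ̂ is the MLE of the parameters and the expected counts are
\[ E[N_{ijk}] = \sum_{m} P(X_i = k, Pa(X_i) = j \mid D_m, \theta) \]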
Us, Po, Ba (1)
• Combined MCMC
– MCMC for Bayesian model averaging
– MCMC over the values of the unobserved nodes
Conclusion
• Is learning the structure really important?
– In papers, yes.
– In engineering, no.
• What can AI do for humans?
• What can humans do for machine learning algorithms?