Exact Inference in Bayesian Networks (Lecture 9)
Dec 14, 2015
Queries
There are many types of queries; most involve evidence.
Evidence e is an assignment of values to a set E of variables in the domain. For example:
P(Dyspnea = Yes | Visit_to_Asia = Yes, Smoking = Yes)
P(Smoking = Yes | Dyspnea = Yes)

[Figure: the Asia Bayesian network, with nodes V, S, T, L, A, B, X, D.]
Computing A Posteriori Belief in Bayesian Networks

Input: a Bayesian network, a set of nodes E with evidence E = e, and an ordering x1,…,xm of all variables not in E.
Output: P(x1, e) for every value x1 of X1 {from which P(x1 | e) is available}.

The query:

P(x1, e) = Σ_{x2} ··· Σ_{xm} Π_{i=1}^{m} P(xi | pai), with the evidence e set in the tables.

Set the evidence in all local probability tables that are defined over some variables from E. Then, iteratively:
• Move all irrelevant terms outside of the innermost sum.
• Perform the innermost sum, obtaining a new term.
• Insert the new term into the product.
Belief Update I

[Figure: a network with nodes V, S, T, L, A, G, X, D.]

Suppose we get evidence V = v0, S = s0, D = d0. We wish to compute P(l, v0, s0, d0) for every value l of L.

P(l, v0, s0, d0) = Σ_{t,g,a,x} P(v0, s0, l, t, g, a, x, d0)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) Σ_g p(g|s0) Σ_a p(a|t,l) p(d0|a,g) Σ_x p(x|a)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) Σ_g p(g|s0) Σ_a p(a|t,l) p(d0|a,g) b_x(a)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) Σ_g p(g|s0) b_a(t, l, g)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) b_g(t, l)
= p(v0) p(s0) p(l|s0) b_t(l)

To obtain the posterior belief in L given the evidence, we normalize the result to 1.
Belief Update II

[Figure: a network in which A is the parent of T, L, X and D.]

Suppose we get evidence D = d0. We wish to compute P(l, d0) for every value l of L.

Good summation order (variable A is summed last):
P(l, d0) = Σ_{a,t,x} P(a, t, x, l, d0) = Σ_a p(a) p(l|a) p(d0|a) Σ_t p(t|a) Σ_x p(x|a)

Bad summation order (variable A is summed first):
P(l, d0) = Σ_{a,t,x} P(a, t, x, l, d0) = Σ_x Σ_t Σ_a p(a) p(l|a) p(d0|a) p(t|a) p(x|a)

This yields a three-dimensional temporary table.

How do we choose a reasonable order?
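The effect of the summation order can be seen in a small sketch (binary variables and all CPT values are made up; the network A → {L, T, X, D} mirrors the example above):

```python
import itertools

# Hypothetical CPTs for a network in which A is the parent of L, T, X, D.
pA = [0.6, 0.4]                  # P(A=a)
pL = [[0.7, 0.3], [0.2, 0.8]]    # pL[a][l] = p(l|a)
pT = [[0.9, 0.1], [0.4, 0.6]]    # pT[a][t] = p(t|a)
pX = [[0.8, 0.2], [0.3, 0.7]]    # pX[a][x] = p(x|a)
pD = [[0.95, 0.05], [0.1, 0.9]]  # pD[a][d] = p(d|a)
d0 = 1                           # evidence D = d0

def good(l):
    """Sum T and X innermost; every temporary is a scalar per value of a."""
    return sum(pA[a] * pL[a][l] * pD[a][d0]
               * sum(pT[a][t] for t in (0, 1))    # b_t(a) = 1
               * sum(pX[a][x] for x in (0, 1))    # b_x(a) = 1
               for a in (0, 1))

def bad(l):
    """Sum A first; this builds a 3-dimensional temporary table over (l,t,x)."""
    tmp = {}
    for lp, t, x in itertools.product((0, 1), repeat=3):
        tmp[lp, t, x] = sum(pA[a] * pL[a][lp] * pD[a][d0] * pT[a][t] * pX[a][x]
                            for a in (0, 1))
    return sum(tmp[l, t, x] for t, x in itertools.product((0, 1), repeat=2))

print(good(0), bad(0))  # the two orders agree; only the amount of work differs
```

Both orders compute the same marginal; the bad order just materializes an exponentially larger intermediate table.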
The algorithm to compute P(x1, e)

Initialization:
• Set the evidence in all (local probability) tables that are defined over some variables from E.
• Fix an order of all variables not in E.
• Partition all tables into buckets such that bucket_p contains all tables whose highest-indexed variable is X_p.

For p = m downto 1 do:
• Suppose λ1,…,λj are the tables in bucket_p being processed, and S1,…,Sj are the respective sets of variables in these tables.
• Let U_p be the union of S1,…,Sj with X_p excluded.
• Let max be the largest-indexed variable in U_p.
• For every assignment U_p = u compute:

  λ_p(u) = Σ_{x_p} Π_{i=1}^{j} λ_i(u^{S_i}, x_p)

  {Def: u^{S_i} is the value of u projected on S_i.}

• Add λ_p(u) into bucket_max.

Return the vector Π_{i=1}^{j} λ_i(x1) for every value x1 of X1.
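The bucket-processing step above can be sketched as follows (a minimal illustration, not SUPERLINK's implementation; factor representation and variable names are made up):

```python
from itertools import product

# A "factor" is a pair (vars, table): an ordered tuple of variable names and
# a dict mapping assignment tuples to numbers.

def sum_out(factors, var, domains):
    """Multiply the factors that mention `var`, then sum `var` out:
    lambda(u) = sum_x prod_i lambda_i(u^{S_i}, x)."""
    keep = tuple(sorted({v for f in factors for v in f[0] if v != var}))
    table = {}
    for u in product(*(domains[v] for v in keep)):
        assign = dict(zip(keep, u))
        s = 0.0
        for x in domains[var]:
            assign[var] = x
            p = 1.0
            for fvars, ftab in factors:
                p *= ftab[tuple(assign[v] for v in fvars)]  # project onto S_i
            s += p
        table[u] = s
    return (keep, table)

def eliminate(factors, order, domains):
    """Process variables in `order`: each step empties that variable's
    bucket and drops the resulting new factor back into the pool."""
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        factors.append(sum_out(bucket, var, domains))
    # Multiply whatever remains (factors over the query variable only).
    keep = tuple(sorted({v for f in factors for v in f[0]}))
    table = {}
    for u in product(*(domains[v] for v in keep)):
        assign = dict(zip(keep, u))
        p = 1.0
        for fvars, ftab in factors:
            p *= ftab[tuple(assign[v] for v in fvars)]
        table[u] = p
    return (keep, table)

# Tiny example: V -> T with made-up CPTs; query P(T) by eliminating V.
domains = {'V': (0, 1), 'T': (0, 1)}
fV = (('V',), {(0,): 0.99, (1,): 0.01})
fTV = (('T', 'V'), {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.95, (1, 1): 0.05})
_, pT = eliminate([fV, fTV], ['V'], domains)
print(pT)  # P(T=0) ≈ 0.9896, P(T=1) ≈ 0.0104
```

Each call to `sum_out` corresponds to emptying one bucket and producing the new table λ_p.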
The computational task at hand

P(data) = Σ_{x1} Σ_{x3} ··· Σ_{xk} Π_{i=1}^{n} P(xi | pai)

Multidimensional multiplication/summation, e.g.:

Y_{ij} = Σ_{l,m,n,k} A_{ikl} B_{lmn} C_{jmk}

Example — matrix multiplication: C_{ij} = Σ_k A_{ik} B_{kj}

(A_{50×50} B_{50×1}) C_{1×50}  versus  A_{50×50} (B_{50×1} C_{1×50})
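The gap between the two parenthesizations can be checked by counting scalar multiplications (standard matrix-chain arithmetic, using the dimensions from the slide):

```python
def matmul_cost(p, q, r):
    """Scalar multiplications to multiply a (p x q) matrix by a (q x r) one."""
    return p * q * r

# A is 50x50, B is 50x1, C is 1x50.
ab_then_c = matmul_cost(50, 50, 1) + matmul_cost(50, 1, 50)   # (A B) C
a_then_bc = matmul_cost(50, 1, 50) + matmul_cost(50, 50, 50)  # A (B C)
print(ab_then_c, a_then_bc)  # 5000 127500
```

The same product costs 5,000 multiplications one way and 127,500 the other; choosing a summation order in variable elimination is the same kind of decision.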
Complexity of variable elimination

Space and time complexity are at least exponential in the number of variables in the largest intermediate factor, and can be as large as the sum of the sizes of all intermediate factors (taking into account the number of values of each variable).
A Graph-Theoretic View

N_G(v) is the set of vertices that are adjacent to v in G.

Eliminating vertex v from a (weighted) undirected graph G is the process of making N_G(v) a clique and then removing v and its incident edges from G.
Example

Weights of vertices (number of states): yellow nodes w = 2, blue nodes w = 4.

[Figure: the original Bayes network over V, S, T, L, A, B, X, D, and its undirected graph representation.]
Elimination Sequence

An elimination sequence of G is an order of the vertices of G, written Xα = (Xα(1),…,Xα(n)), where α is a permutation on {1,…,n}.

• The residual graph Gi is the graph obtained from Gi-1 by eliminating vertex Xα(i-1) (G1 ≡ G).
• The cost of eliminating vertex v from a graph Gi is the product of the weights of the vertices in N_{Gi}(v).
• The cost of an elimination sequence Xα is the sum of the costs of eliminating Xα(i) from Gi, over all i:

C(Xα) = Σ_{i=1}^{n} C_{Gi}(Xα(i))
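The cost of an elimination sequence can be computed directly from these definitions (a sketch; the toy graph, vertex names and weights are illustrative):

```python
def sequence_cost(adj, weight, order):
    """C(X_alpha) = sum_i C_{G_i}(X_alpha(i)): eliminating v costs the product
    of the weights of its current neighbours; elimination makes N(v) a clique
    and removes v. `adj` is a dict of neighbour sets."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    total = 0
    for v in order:
        nbrs = adj.pop(v)
        cost = 1
        for u in nbrs:
            cost *= weight[u]
        total += cost
        for u in nbrs:                 # make N(v) a clique
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return total

# Toy chain V - T - L with weights 2, 4, 2:
adj = {'V': {'T'}, 'T': {'V', 'L'}, 'L': {'T'}}
weight = {'V': 2, 'T': 4, 'L': 2}
print(sequence_cost(adj, weight, ['V', 'T', 'L']))  # 4 + 2 + 1 = 7
```

Different orders over the same graph give different totals, which is exactly what the next example illustrates.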
Example

Suppose the elimination sequence is Xα = (V, B, S, …):

[Figure: residual graphs G1 (over V, S, T, L, A, B, X, D), G2 (after eliminating V), and G3 (after eliminating B).]

C_{G1}(V) = 2·4 = 8
C_{G2}(B) = 4·2·2·2 = 32
C(Xα) = C_{G1}(V) + C_{G2}(B) + C_{G3}(S) + … = 8 + 32 + 4 + …
• Optimal elimination sequence: one with minimal cost.
Several Greedy Algorithms
1. In each step a variable with minimal elimination cost is selected.
2. In each step a variable is selected that adds the smallest number of edges.
3. In each step a variable is selected that adds the edges whose sum of weights is minimal.
Since these algorithms are very fast compared to the actual likelihood computation, all three options can be tried and the best resulting order selected.
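Heuristic 1 can be sketched as follows (heuristics 2 and 3 would only change the selection key; the toy graph and weights are illustrative):

```python
def greedy_order(adj, weight):
    """Repeatedly eliminate the vertex whose elimination cost (product of
    its current neighbours' weights) is minimal."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        def cost(v):
            c = 1
            for u in adj[v]:
                c *= weight[u]
            return c
        v = min(adj, key=cost)      # heuristic 1: minimal elimination cost
        order.append(v)
        nbrs = adj.pop(v)
        for u in nbrs:              # make N(v) a clique, then drop v
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return order

adj = {'V': {'T'}, 'T': {'V', 'L'}, 'L': {'T'}}
weight = {'V': 2, 'T': 4, 'L': 2}
print(greedy_order(adj, weight))
```

Swapping the `cost` key for "number of fill-in edges added" or "sum of fill-in edge weights" gives heuristics 2 and 3.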
Stochastic Greedy Algorithm

Iteration i: the three (say) variables with minimal elimination cost are found, and a coin is flipped to choose among them.

Repeat many times (say, 100), stopping early if the cost becomes low enough.

The coin can be weighted according to the elimination costs of the vertices, or a function of these costs, e.g.
p1 = log2(cost1) / {log2(cost1) + log2(cost2)}, p2 = 1 − p1.
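A sketch of the stochastic variant (the restart count, candidate-set size, and the uniform coin are illustrative choices; a weighted coin as on the slide would replace `rng.choice`):

```python
import random

def elimination_cost(adj, weight, order):
    """Total cost of an order under the product-of-neighbour-weights rule."""
    adj = {v: set(ns) for v, ns in adj.items()}
    total = 0
    for v in order:
        nbrs = adj.pop(v)
        c = 1
        for u in nbrs:
            c *= weight[u]
        total += c
        for u in nbrs:
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return total

def stochastic_greedy(adj, weight, restarts=100, k=3, seed=0):
    """Each step flips a coin among the k cheapest vertices; the best of
    `restarts` complete orders is kept."""
    rng = random.Random(seed)
    best = (float('inf'), None)
    for _ in range(restarts):
        g = {v: set(ns) for v, ns in adj.items()}
        order = []
        while g:
            def cost(v):
                c = 1
                for u in g[v]:
                    c *= weight[u]
                return c
            v = rng.choice(sorted(g, key=cost)[:k])  # coin flip among k best
            order.append(v)
            nbrs = g.pop(v)
            for u in nbrs:
                g[u] |= nbrs - {u}
                g[u].discard(v)
        best = min(best, (elimination_cost(adj, weight, order), order))
    return best

adj = {'V': {'T'}, 'T': {'V', 'L'}, 'L': {'T'}}
weight = {'V': 2, 'T': 4, 'L': 2}
cost, order = stochastic_greedy(adj, weight)
print(cost, order)
```

Randomizing among near-ties lets repeated restarts escape the deterministic greedy choice when it happens to be poor.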
CASE STUDY: Genetic Linkage Analysis via Bayesian Networks

We hypothesize a disease locus with alleles H (Healthy) / D (affected).

If the expected number of recombinants is low (close to zero), then the hypothesized locus and the marker are tentatively physically close.
[Figure: a five-person pedigree. Individuals are labeled with disease status (H or D) and marker genotype (A1/A1, A2/A2, or A1/A2); haplotype phase is inferred for one child, and one transmission is marked as a recombinant between the disease locus and the marker.]
The Variables Involved

Lijm = maternal allele at locus i of person j. The values of this variable are the possible alleles li at locus i.
Lijf = paternal allele at locus i of person j. The values are the possible alleles li at locus i (same as for Lijm).
Xij = unordered allele pair at locus i of person j. The values are pairs of ith-locus alleles (li, l'i): "the genotype".
Yj = person j is affected/not affected: "the phenotype".
Sijm = a binary variable {0,1} that determines which allele is received from the mother. Similarly, Sijf = a binary variable {0,1} that determines which allele is received from the father.

It remains to specify the joint distribution that governs these variables; Bayesian networks turn out to be a perfect choice.
The Bayesian network for Linkage

[Figure: the linkage Bayesian network over loci 1–4 (locus 2 is the disease locus), with variables Lijm, Lijf, Xij, Sijm, Sijf and phenotype variables Y1, Y2, Y3.]

This network depicts the qualitative relations between the variables. We have already specified the local conditional probability tables.
Details regarding recombination

[Figure: the network fragment for loci 1 and 2, with the selector variables S13m, S13f, S23m, S23f linking the loci.]

P(s23t | s13t) = θ if s23t ≠ s13t, and 1 − θ otherwise, for t ∈ {m, f},

where θ is the recombination fraction between loci 2 & 1.
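The selector transition above in code form (a sketch; the function name is made up):

```python
def p_selector(s_next, s_prev, theta):
    """P(S_{2,3,t} = s_next | S_{1,3,t} = s_prev) for t in {m, f}:
    theta when the selector flips (a recombination), 1 - theta otherwise."""
    return theta if s_next != s_prev else 1.0 - theta

# With recombination fraction theta = 0.01 between loci 1 and 2:
print(p_selector(0, 0, 0.01), p_selector(1, 0, 0.01))  # 0.99 0.01
```

These binary selector CPTs are the only place the recombination fraction enters the network.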
Details regarding the Loci

The phenotype variables Yj are 0 or 1 (e.g., affected or not affected) and are connected to the Xij variables (only at the disease locus). For example, a model of a perfect recessive disease yields the penetrance probabilities:

P(y11 = sick | X11 = (a,a)) = 1
P(y11 = sick | X11 = (A,a)) = 0
P(y11 = sick | X11 = (A,A)) = 0

P(L11m = a) is the frequency of allele a. X11 is an unordered allele pair at locus 1 of person 1: "the data". P(x11 | l11m, l11f) = 0 or 1, depending on consistency.
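These deterministic tables are easy to sketch (function names are illustrative; 'a' is the recessive disease allele):

```python
def p_sick(genotype):
    """Penetrance P(Y = sick | X = genotype) under a perfect recessive model."""
    return 1.0 if genotype == ('a', 'a') else 0.0

def p_genotype(x, l_m, l_f):
    """P(x | l_m, l_f): 1 when the unordered pair x matches the ordered
    (maternal, paternal) allele pair, else 0."""
    return 1.0 if sorted(x) == sorted((l_m, l_f)) else 0.0

print(p_sick(('a', 'a')), p_sick(('A', 'a')))  # 1.0 0.0
print(p_genotype(('A', 'a'), 'a', 'A'))        # 1.0
```

Because these CPTs are 0/1-valued, many joint assignments are impossible, which is what SUPERLINK's value-elimination stage exploits.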
SUPERLINK

Stage 1: each pedigree is translated into a Bayesian network.
Stage 2: value elimination is performed on each pedigree (i.e., some of the impossible values of the variables of the network are eliminated).
Stage 3: an elimination order for the variables is determined, according to some heuristic.
Stage 4: the likelihood of the pedigrees given the values is calculated using variable elimination, according to the elimination order determined in Stage 3. Allele recoding and special matrix multiplication are used.
Comparing to the HMM model

[Figure: an HMM-style chain with hidden inheritance vectors S1, S2, S3, …, Si, … and observed data X1, X2, X3, …, Xi, … per locus.]

The compounded variable Si = (Si,1,m,…,Si,2n,f) is called the inheritance vector. It has 2^{2n} states, where n is the number of persons that have parents in the pedigree (non-founders). The compounded variable Xi = (Xi,1,m,…,Xi,2n,f) is the data regarding locus i. Similarly, for the disease locus we use Yi.

REMARK: the HMM approach is equivalent to the Bayesian network approach, provided we sum variables locus after locus, say from left to right.
Experiment A (V1.0)

• Same topology (57 people, no loops)
• Increasing number of loci (each with 4–5 alleles)
• Run times in seconds.

Files  No. of Loci  Superlink       Fastlink  Vitesse  Genehunter
A0     2            0.03            0.12      0.27     –
A1     5            0.1             3.77      0.31     –
A2     6            0.14            79.32     0.39     –
A3     7            0.42            –         0.69     –
A4     8            0.36            –         2.81     –
A5     10           1.19            –         84.66    –
A6     12           4.65            –         –        –
A7     14           3.01            –         –        –
A8     18           20.98           –         –        –
A9     37           8510.15         –         –        –
A10    38           10446.27        –         –        –
A11    40           over 100 hours  –         –        –

Missing entries: out-of-memory; the pedigree size is too big for Genehunter.
Elimination orders: general; person-by-person; locus-by-locus (HMM).
Experiment C (V1.0)

• Same topology (5 people, no loops)
• Increasing number of loci (each with 3–6 alleles)
• Run times in seconds.

Files  No. of Loci  Superlink (Bayes nets)  Genehunter (HMM)
D0     100          0.16 (2 l.e.)           0.41 (99 l.e.)
D1     110          0.2 (2 l.e.)            0.45 (109 l.e.)
D2     120          0.21 (2 l.e.)           0.48 (119 l.e.)
D3     130          0.22 (2 l.e.)           0.49 (129 l.e.)
D4     140          0.24 (2 l.e.)           0.51 (139 l.e.)
D5     150          0.25 (2 l.e.)           0.53 (149 l.e.)
D6     160          0.27 (2 l.e.)           0.54 (159 l.e.)
D7     170          0.3 (2 l.e.)            0.6 (169 l.e.)
D8     180          0.3 (2 l.e.)            0.59 (179 l.e.)
D9     190          0.32 (2 l.e.)           0.61 (189 l.e.)
D10    200          0.34 (2 l.e.)           0.66 (199 l.e.)
D11    210          0.37 (2 l.e.)           0.67 (209 l.e.)

Order types (by software): Superlink — Bayes nets; Fastlink, Vitesse — trees; Genehunter — HMM.
The remaining programs failed: out-of-memory; bus error.
Some options for improving efficiency

1. Multiplying special probability tables efficiently.
2. Grouping alleles together and removing inconsistent alleles.
3. Optimizing the elimination order of variables in a Bayesian network.
4. Performing approximate calculations of the likelihood.

P(data) = Σ_{x1} Σ_{x3} ··· Σ_{xk} Π_{i=1}^{n} P(xi | pai)
Standard usage of linkage

There are usually 5–15 markers. 20–30% of the persons in large pedigrees are genotyped (namely, their xij is measured). For each genotyped person, about 90% of the loci are measured correctly. The recombination fraction between every two loci is known from previous studies (available genetic maps).

The user adds a locus called the "disease locus" and places it between two markers i and i+1. The recombination fractions θ' between the disease locus and marker i, and θ'' between the disease locus and marker i+1, are the unknown parameters estimated using the likelihood function.

This computation is done for every gap between the given markers on the map. The MLE hints at the whereabouts of a single gene causing the disease (if a single one exists).
Relation to Treewidth

The unconstrained elimination problem reduces to finding treewidth if:
• the weight of each vertex is constant, and
• the cost function is C(Xα) = max_i C_{Gi}(Xα(i)).

• Finding the treewidth of a graph is known to be NP-complete (Arnborg et al., 1987).
• When no edges are added, the elimination sequence is perfect and the graph is chordal.