Exact Inference in Bayesian Networks (Lecture 9)
Dec 14, 2015
Queries
There are many types of queries; most involve evidence.
Evidence e is an assignment of values to a set E of variables in the domain. For example:
P(Dyspnea = Yes | Visit_to_Asia = Yes, Smoking = Yes)
P(Smoking = Yes | Dyspnea = Yes)

[Figure: the Asia Bayesian network, with nodes V, S, T, L, A, B, X, D.]
Computing A Posteriori Belief in Bayesian Networks

Input: a Bayesian network, a set of nodes E with evidence E = e, and an ordering x1,…,xm of all variables not in E.
Output: P(x1, e) for every value x1 of X1 {from which P(x1 | e) is available}.

The query:

P(x1, e) = Σ_{x2} ··· Σ_{xm} Π_{i=1}^{m} P(xi | pai), with the evidence e set in the tables.

Set the evidence in all local probability tables that are defined over some variables from E. Then, iteratively:
• Move all irrelevant terms outside of the innermost sum.
• Perform the innermost sum, obtaining a new term.
• Insert the new term into the product.
Belief Update I

[Figure: a network with nodes V, S, T, L, A, G, X, D.]

Suppose we get evidence V = v0, S = s0, D = d0. We wish to compute P(l, v0, s0, d0) for every value l of L.

P(l, v0, s0, d0) = Σ_{t,g,a,x} P(v0, s0, l, t, g, a, x, d0)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) Σ_g p(g|s0) Σ_a p(a|t,l) p(d0|a,g) Σ_x p(x|a)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) Σ_g p(g|s0) Σ_a p(a|t,l) p(d0|a,g) b_x(a)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) Σ_g p(g|s0) b_a(t, l, g)
= p(v0) p(s0) p(l|s0) Σ_t p(t|v0) b_g(t, l)
= p(v0) p(s0) p(l|s0) b_t(l)

To obtain the posterior belief in L given the evidence, we normalize the result to 1.
Belief Update II

[Figure: a network in which A is the parent of T, L, X and D.]

Suppose we get evidence D = d0. We wish to compute P(l, d0) for every value l of L.

Good summation order (variable A is summed last):
P(l, d0) = Σ_{a,t,x} P(a, t, x, l, d0) = Σ_a p(a) p(l|a) p(d0|a) Σ_t p(t|a) Σ_x p(x|a)

Bad summation order (variable A is summed first):
P(l, d0) = Σ_{a,t,x} P(a, t, x, l, d0) = Σ_x Σ_t Σ_a p(a) p(l|a) p(d0|a) p(t|a) p(x|a)

This yields a three-dimensional temporary table.

How do we choose a reasonable order?
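The effect of the summation order can be seen in a small sketch (binary variables and all CPT values are made up; the network A → {L, T, X, D} mirrors the example above):

```python
import itertools

# Hypothetical CPTs for a network in which A is the parent of L, T, X, D.
pA = [0.6, 0.4]                  # P(A=a)
pL = [[0.7, 0.3], [0.2, 0.8]]    # pL[a][l] = p(l|a)
pT = [[0.9, 0.1], [0.4, 0.6]]    # pT[a][t] = p(t|a)
pX = [[0.8, 0.2], [0.3, 0.7]]    # pX[a][x] = p(x|a)
pD = [[0.95, 0.05], [0.1, 0.9]]  # pD[a][d] = p(d|a)
d0 = 1                           # evidence D = d0

def good(l):
    """Sum T and X innermost; every temporary is a scalar per value of a."""
    return sum(pA[a] * pL[a][l] * pD[a][d0]
               * sum(pT[a][t] for t in (0, 1))    # b_t(a) = 1
               * sum(pX[a][x] for x in (0, 1))    # b_x(a) = 1
               for a in (0, 1))

def bad(l):
    """Sum A first; this builds a 3-dimensional temporary table over (l,t,x)."""
    tmp = {}
    for lp, t, x in itertools.product((0, 1), repeat=3):
        tmp[lp, t, x] = sum(pA[a] * pL[a][lp] * pD[a][d0] * pT[a][t] * pX[a][x]
                            for a in (0, 1))
    return sum(tmp[l, t, x] for t, x in itertools.product((0, 1), repeat=2))

print(good(0), bad(0))  # the two orders agree; only the amount of work differs
```

Both orders compute the same marginal; the bad order just materializes an exponentially larger intermediate table.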
The algorithm to compute P(x1, e)

Initialization:
• Set the evidence in all (local probability) tables that are defined over some variables from E.
• Fix an order of all variables not in E.
• Partition all tables into buckets such that bucket_p contains all tables whose highest-indexed variable is X_p.

For p = m downto 1 do:
• Suppose λ1,…,λj are the tables in bucket_p being processed, and S1,…,Sj are the respective sets of variables in these tables.
• Let U_p be the union of S1,…,Sj with X_p excluded.
• Let max be the largest-indexed variable in U_p.
• For every assignment U_p = u compute:

  λ_p(u) = Σ_{x_p} Π_{i=1}^{j} λ_i(u^{S_i}, x_p)

  {Def: u^{S_i} is the value of u projected on S_i.}

• Add λ_p(u) into bucket_max.

Return the vector Π_{i=1}^{j} λ_i(x1) for every value x1 of X1.
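The bucket-processing step above can be sketched as follows (a minimal illustration, not SUPERLINK's implementation; factor representation and variable names are made up):

```python
from itertools import product

# A "factor" is a pair (vars, table): an ordered tuple of variable names and
# a dict mapping assignment tuples to numbers.

def sum_out(factors, var, domains):
    """Multiply the factors that mention `var`, then sum `var` out:
    lambda(u) = sum_x prod_i lambda_i(u^{S_i}, x)."""
    keep = tuple(sorted({v for f in factors for v in f[0] if v != var}))
    table = {}
    for u in product(*(domains[v] for v in keep)):
        assign = dict(zip(keep, u))
        s = 0.0
        for x in domains[var]:
            assign[var] = x
            p = 1.0
            for fvars, ftab in factors:
                p *= ftab[tuple(assign[v] for v in fvars)]  # project onto S_i
            s += p
        table[u] = s
    return (keep, table)

def eliminate(factors, order, domains):
    """Process variables in `order`: each step empties that variable's
    bucket and drops the resulting new factor back into the pool."""
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        factors.append(sum_out(bucket, var, domains))
    # Multiply whatever remains (factors over the query variable only).
    keep = tuple(sorted({v for f in factors for v in f[0]}))
    table = {}
    for u in product(*(domains[v] for v in keep)):
        assign = dict(zip(keep, u))
        p = 1.0
        for fvars, ftab in factors:
            p *= ftab[tuple(assign[v] for v in fvars)]
        table[u] = p
    return (keep, table)

# Tiny example: V -> T with made-up CPTs; query P(T) by eliminating V.
domains = {'V': (0, 1), 'T': (0, 1)}
fV = (('V',), {(0,): 0.99, (1,): 0.01})
fTV = (('T', 'V'), {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.95, (1, 1): 0.05})
_, pT = eliminate([fV, fTV], ['V'], domains)
print(pT)  # P(T=0) ≈ 0.9896, P(T=1) ≈ 0.0104
```

Each call to `sum_out` corresponds to emptying one bucket and producing the new table λ_p.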
The computational task at hand

P(data) = Σ_{x1} Σ_{x3} ··· Σ_{xk} Π_{i=1}^{n} P(xi | pai)

Multidimensional multiplication/summation, e.g.:

Y_{ij} = Σ_{l,m,n,k} A_{ikl} B_{lmn} C_{jmk}

Example — matrix multiplication: C_{ij} = Σ_k A_{ik} B_{kj}

(A_{50×50} B_{50×1}) C_{1×50}  versus  A_{50×50} (B_{50×1} C_{1×50})
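The gap between the two parenthesizations can be checked by counting scalar multiplications (standard matrix-chain arithmetic, using the dimensions from the slide):

```python
def matmul_cost(p, q, r):
    """Scalar multiplications to multiply a (p x q) matrix by a (q x r) one."""
    return p * q * r

# A is 50x50, B is 50x1, C is 1x50.
ab_then_c = matmul_cost(50, 50, 1) + matmul_cost(50, 1, 50)   # (A B) C
a_then_bc = matmul_cost(50, 1, 50) + matmul_cost(50, 50, 50)  # A (B C)
print(ab_then_c, a_then_bc)  # 5000 127500
```

The same product costs 5,000 multiplications one way and 127,500 the other; choosing a summation order in variable elimination is the same kind of decision.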
Complexity of variable elimination

Space and time complexity are at least exponential in the number of variables in the largest intermediate factor, and can be as large as the sum of the sizes of all intermediate factors (taking into account the number of values of each variable).
A Graph-Theoretic View

N_G(v) is the set of vertices that are adjacent to v in G.

Eliminating vertex v from a (weighted) undirected graph G is the process of making N_G(v) a clique and then removing v and its incident edges from G.
Example

Weights of vertices (number of states): yellow nodes w = 2, blue nodes w = 4.

[Figure: the original Bayes network over V, S, T, L, A, B, X, D, and its undirected graph representation.]
Elimination Sequence

An elimination sequence of G is an order of the vertices of G, written Xα = (Xα(1),…,Xα(n)), where α is a permutation on {1,…,n}.

• The residual graph Gi is the graph obtained from Gi-1 by eliminating vertex Xα(i-1) (G1 ≡ G).
• The cost of eliminating vertex v from a graph Gi is the product of the weights of the vertices in N_{Gi}(v).
• The cost of an elimination sequence Xα is the sum of the costs of eliminating Xα(i) from Gi, over all i:

C(Xα) = Σ_{i=1}^{n} C_{Gi}(Xα(i))
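The cost of an elimination sequence can be computed directly from these definitions (a sketch; the toy graph, vertex names and weights are illustrative):

```python
def sequence_cost(adj, weight, order):
    """C(X_alpha) = sum_i C_{G_i}(X_alpha(i)): eliminating v costs the product
    of the weights of its current neighbours; elimination makes N(v) a clique
    and removes v. `adj` is a dict of neighbour sets."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    total = 0
    for v in order:
        nbrs = adj.pop(v)
        cost = 1
        for u in nbrs:
            cost *= weight[u]
        total += cost
        for u in nbrs:                 # make N(v) a clique
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return total

# Toy chain V - T - L with weights 2, 4, 2:
adj = {'V': {'T'}, 'T': {'V', 'L'}, 'L': {'T'}}
weight = {'V': 2, 'T': 4, 'L': 2}
print(sequence_cost(adj, weight, ['V', 'T', 'L']))  # 4 + 2 + 1 = 7
```

Different orders over the same graph give different totals, which is exactly what the next example illustrates.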
Example

Suppose the elimination sequence is Xα = (V, B, S, …):

[Figure: residual graphs G1 (over V, S, T, L, A, B, X, D), G2 (after eliminating V), and G3 (after eliminating B).]

C_{G1}(V) = 2·4 = 8
C_{G2}(B) = 4·2·2·2 = 32
C(Xα) = C_{G1}(V) + C_{G2}(B) + C_{G3}(S) + … = 8 + 32 + 4 + …
• Optimal elimination sequence: one with minimal cost.
Several Greedy Algorithms
1. In each step a variable with minimal elimination cost is selected.
2. In each step a variable is selected that adds the smallest number of edges.
3. In each step a variable is selected that adds the edges whose sum of weights is minimal.
Since these algorithms are very fast compared to the actual likelihood computation, all three options can be tried and the best resulting order selected.
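Heuristic 1 can be sketched as follows (heuristics 2 and 3 would only change the selection key; the toy graph and weights are illustrative):

```python
def greedy_order(adj, weight):
    """Repeatedly eliminate the vertex whose elimination cost (product of
    its current neighbours' weights) is minimal."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        def cost(v):
            c = 1
            for u in adj[v]:
                c *= weight[u]
            return c
        v = min(adj, key=cost)      # heuristic 1: minimal elimination cost
        order.append(v)
        nbrs = adj.pop(v)
        for u in nbrs:              # make N(v) a clique, then drop v
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return order

adj = {'V': {'T'}, 'T': {'V', 'L'}, 'L': {'T'}}
weight = {'V': 2, 'T': 4, 'L': 2}
print(greedy_order(adj, weight))
```

Swapping the `cost` key for "number of fill-in edges added" or "sum of fill-in edge weights" gives heuristics 2 and 3.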
Stochastic Greedy Algorithm

Iteration i: the three (say) variables with minimal elimination cost are found, and a coin is flipped to choose among them.

Repeat many times (say, 100), stopping early if the cost becomes low enough.

The coin can be weighted according to the elimination costs of the vertices, or a function of these costs, e.g.
p1 = log2(cost1) / {log2(cost1) + log2(cost2)}, p2 = 1 − p1.
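A sketch of the stochastic variant (the restart count, candidate-set size, and the uniform coin are illustrative choices; a weighted coin as on the slide would replace `rng.choice`):

```python
import random

def elimination_cost(adj, weight, order):
    """Total cost of an order under the product-of-neighbour-weights rule."""
    adj = {v: set(ns) for v, ns in adj.items()}
    total = 0
    for v in order:
        nbrs = adj.pop(v)
        c = 1
        for u in nbrs:
            c *= weight[u]
        total += c
        for u in nbrs:
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
    return total

def stochastic_greedy(adj, weight, restarts=100, k=3, seed=0):
    """Each step flips a coin among the k cheapest vertices; the best of
    `restarts` complete orders is kept."""
    rng = random.Random(seed)
    best = (float('inf'), None)
    for _ in range(restarts):
        g = {v: set(ns) for v, ns in adj.items()}
        order = []
        while g:
            def cost(v):
                c = 1
                for u in g[v]:
                    c *= weight[u]
                return c
            v = rng.choice(sorted(g, key=cost)[:k])  # coin flip among k best
            order.append(v)
            nbrs = g.pop(v)
            for u in nbrs:
                g[u] |= nbrs - {u}
                g[u].discard(v)
        best = min(best, (elimination_cost(adj, weight, order), order))
    return best

adj = {'V': {'T'}, 'T': {'V', 'L'}, 'L': {'T'}}
weight = {'V': 2, 'T': 4, 'L': 2}
cost, order = stochastic_greedy(adj, weight)
print(cost, order)
```

Randomizing among near-ties lets repeated restarts escape the deterministic greedy choice when it happens to be poor.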
CASE STUDY: Genetic Linkage Analysis via Bayesian Networks

We hypothesize a disease locus with alleles H (Healthy) / D (affected).

If the expected number of recombinants is low (close to zero), then the hypothesized locus and the marker are tentatively physically close.
[Figure: a five-person pedigree. Individuals are labeled with disease status (H or D) and marker genotype (A1/A1, A2/A2, or A1/A2); haplotype phase is inferred for one child, and one transmission is marked as a recombinant between the disease locus and the marker.]
The Variables Involved

Lijm = maternal allele at locus i of person j. The values of this variable are the possible alleles li at locus i.
Lijf = paternal allele at locus i of person j. The values are the possible alleles li at locus i (same as for Lijm).
Xij = unordered allele pair at locus i of person j. The values are pairs of ith-locus alleles (li, l'i): "the genotype".
Yj = person j is affected/not affected: "the phenotype".
Sijm = a binary variable {0,1} that determines which allele is received from the mother. Similarly, Sijf = a binary variable {0,1} that determines which allele is received from the father.

It remains to specify the joint distribution that governs these variables; Bayesian networks turn out to be a perfect choice.
The Bayesian network for Linkage

[Figure: the linkage Bayesian network over loci 1–4 (locus 2 is the disease locus), with variables Lijm, Lijf, Xij, Sijm, Sijf and phenotype variables Y1, Y2, Y3.]

This network depicts the qualitative relations between the variables. We have already specified the local conditional probability tables.
Details regarding recombination

[Figure: the network fragment for loci 1 and 2, with the selector variables S13m, S13f, S23m, S23f linking the loci.]

P(s23t | s13t) = θ if s23t ≠ s13t, and 1 − θ otherwise, for t ∈ {m, f},

where θ is the recombination fraction between loci 2 & 1.
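The selector transition above in code form (a sketch; the function name is made up):

```python
def p_selector(s_next, s_prev, theta):
    """P(S_{2,3,t} = s_next | S_{1,3,t} = s_prev) for t in {m, f}:
    theta when the selector flips (a recombination), 1 - theta otherwise."""
    return theta if s_next != s_prev else 1.0 - theta

# With recombination fraction theta = 0.01 between loci 1 and 2:
print(p_selector(0, 0, 0.01), p_selector(1, 0, 0.01))  # 0.99 0.01
```

These binary selector CPTs are the only place the recombination fraction enters the network.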
Details regarding the Loci

The phenotype variables Yj are 0 or 1 (e.g., affected or not affected) and are connected to the Xij variables (only at the disease locus). For example, a model of a perfect recessive disease yields the penetrance probabilities:

P(y11 = sick | X11 = (a,a)) = 1
P(y11 = sick | X11 = (A,a)) = 0
P(y11 = sick | X11 = (A,A)) = 0

P(L11m = a) is the frequency of allele a. X11 is an unordered allele pair at locus 1 of person 1: "the data". P(x11 | l11m, l11f) = 0 or 1, depending on consistency.
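These deterministic tables are easy to sketch (function names are illustrative; 'a' is the recessive disease allele):

```python
def p_sick(genotype):
    """Penetrance P(Y = sick | X = genotype) under a perfect recessive model."""
    return 1.0 if genotype == ('a', 'a') else 0.0

def p_genotype(x, l_m, l_f):
    """P(x | l_m, l_f): 1 when the unordered pair x matches the ordered
    (maternal, paternal) allele pair, else 0."""
    return 1.0 if sorted(x) == sorted((l_m, l_f)) else 0.0

print(p_sick(('a', 'a')), p_sick(('A', 'a')))  # 1.0 0.0
print(p_genotype(('A', 'a'), 'a', 'A'))        # 1.0
```

Because these CPTs are 0/1-valued, many joint assignments are impossible, which is what SUPERLINK's value-elimination stage exploits.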
SUPERLINK

Stage 1: each pedigree is translated into a Bayesian network.
Stage 2: value elimination is performed on each pedigree (i.e., some of the impossible values of the variables of the network are eliminated).
Stage 3: an elimination order for the variables is determined, according to some heuristic.
Stage 4: the likelihood of the pedigrees given the values is calculated using variable elimination, according to the elimination order determined in Stage 3. Allele recoding and special matrix multiplication are used.
Comparing to the HMM model

[Figure: an HMM-style chain with hidden inheritance vectors S1, S2, S3, …, Si, … and observed data X1, X2, X3, …, Xi, … per locus.]

The compounded variable Si = (Si,1,m,…,Si,2n,f) is called the inheritance vector. It has 2^{2n} states, where n is the number of persons that have parents in the pedigree (non-founders). The compounded variable Xi = (Xi,1,m,…,Xi,2n,f) is the data regarding locus i. Similarly, for the disease locus we use Yi.

REMARK: the HMM approach is equivalent to the Bayesian network approach, provided we sum variables locus after locus, say from left to right.
Experiment A (V1.0)

• Same topology (57 people, no loops)
• Increasing number of loci (each with 4–5 alleles)
• Run times in seconds.

Files  No. of Loci  Superlink       Fastlink  Vitesse  Genehunter
A0     2            0.03            0.12      0.27     –
A1     5            0.1             3.77      0.31     –
A2     6            0.14            79.32     0.39     –
A3     7            0.42            –         0.69     –
A4     8            0.36            –         2.81     –
A5     10           1.19            –         84.66    –
A6     12           4.65            –         –        –
A7     14           3.01            –         –        –
A8     18           20.98           –         –        –
A9     37           8510.15         –         –        –
A10    38           10446.27        –         –        –
A11    40           over 100 hours  –         –        –

Missing entries: out-of-memory; the pedigree size is too big for Genehunter.
Elimination orders: general; person-by-person; locus-by-locus (HMM).
Experiment C (V1.0)

• Same topology (5 people, no loops)
• Increasing number of loci (each with 3–6 alleles)
• Run times in seconds.

Files  No. of Loci  Superlink (Bayes nets)  Genehunter (HMM)
D0     100          0.16 (2 l.e.)           0.41 (99 l.e.)
D1     110          0.2 (2 l.e.)            0.45 (109 l.e.)
D2     120          0.21 (2 l.e.)           0.48 (119 l.e.)
D3     130          0.22 (2 l.e.)           0.49 (129 l.e.)
D4     140          0.24 (2 l.e.)           0.51 (139 l.e.)
D5     150          0.25 (2 l.e.)           0.53 (149 l.e.)
D6     160          0.27 (2 l.e.)           0.54 (159 l.e.)
D7     170          0.3 (2 l.e.)            0.6 (169 l.e.)
D8     180          0.3 (2 l.e.)            0.59 (179 l.e.)
D9     190          0.32 (2 l.e.)           0.61 (189 l.e.)
D10    200          0.34 (2 l.e.)           0.66 (199 l.e.)
D11    210          0.37 (2 l.e.)           0.67 (209 l.e.)

Order types (by software): Superlink — Bayes nets; Fastlink, Vitesse — trees; Genehunter — HMM.
The remaining programs failed: out-of-memory; bus error.
Some options for improving efficiency

1. Multiplying special probability tables efficiently.
2. Grouping alleles together and removing inconsistent alleles.
3. Optimizing the elimination order of variables in a Bayesian network.
4. Performing approximate calculations of the likelihood.

P(data) = Σ_{x1} Σ_{x3} ··· Σ_{xk} Π_{i=1}^{n} P(xi | pai)
Standard usage of linkage

There are usually 5–15 markers. 20–30% of the persons in large pedigrees are genotyped (namely, their xij is measured). For each genotyped person, about 90% of the loci are measured correctly. The recombination fraction between every two loci is known from previous studies (available genetic maps).

The user adds a locus called the "disease locus" and places it between two markers i and i+1. The recombination fractions θ' between the disease locus and marker i, and θ'' between the disease locus and marker i+1, are the unknown parameters estimated using the likelihood function.

This computation is done for every gap between the given markers on the map. The MLE hints at the whereabouts of a single gene causing the disease (if a single one exists).
Relation to Treewidth

The unconstrained elimination problem reduces to finding treewidth if:
• the weight of each vertex is constant, and
• the cost function is C(Xα) = max_i C_{Gi}(Xα(i)).

• Finding the treewidth of a graph is known to be NP-complete (Arnborg et al., 1987).
• When no edges are added, the elimination sequence is perfect and the graph is chordal.