Probabilistic Reasoning
in Bayesian Networks
KAIST AIPR Lab.
Jung-Yeol Lee
17th June 2010
Contents
• Background
• Bayesian Network
• Semantics of Bayesian Network
• D-Separation
• Conditional Independence Relations
• Probabilistic Inference in Bayesian Networks
• Summary
Background
• Bayes' rule
  From the product rule, $P(X \land Y) = P(X \mid Y)P(Y) = P(Y \mid X)P(X)$
  $P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{P(X)} = \alpha P(X \mid Y)P(Y)$, where $\alpha$ is the normalization constant
  Combining evidence $e$: $P(Y \mid X, e) = \frac{P(X \mid Y, e)\,P(Y \mid e)}{P(X \mid e)}$
• Conditional independence
  $P(X, Y \mid Z) = P(X \mid Z)P(Y \mid Z)$ when $X \perp Y \mid Z$
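A minimal numeric sketch of the normalized form $P(Y \mid X) = \alpha P(X \mid Y)P(Y)$; the prior and likelihood values below are hypothetical, chosen only for illustration:

# Bayes' rule with an explicit normalization constant alpha = 1 / P(X).
p_y = 0.01                                 # hypothetical prior P(Y = true)
p_x_given_y = {True: 0.9, False: 0.05}     # hypothetical likelihood P(X = true | Y)

# Unnormalized posterior over Y after observing X = true.
unnormalized = {y: p_x_given_y[y] * (p_y if y else 1 - p_y)
                for y in (True, False)}

alpha = 1.0 / sum(unnormalized.values())   # alpha = 1 / P(X = true)
posterior = {y: alpha * p for y, p in unnormalized.items()}
print(posterior)                           # P(Y | X = true); entries sum to 1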
Bayesian Network
• Causal relationships among random variables
• Directed acyclic graph
  Nodes: random variables $X_i$
  Directed links: probabilistic relationships between variables
  Acyclic: no directed path leads from a node back to itself
• A link from node X to node Y means X is a $Parent(Y)$
• Each node $X_i$ carries a conditional probability distribution $P(X_i \mid Parents(X_i))$
  Quantifies the effect of the parents on the node
Example of Bayesian Network
• Burglary network
  Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls
• Conditional Probability Tables

  P(B) = 0.001        P(E) = 0.002

  B E | P(A|B,E)        A | P(J|A)        A | P(M|A)
  T T |   0.95          T |  0.90         T |  0.70
  T F |   0.94          F |  0.05         F |  0.01
  F T |   0.29
  F F |   0.001

• JohnCalls is directly influenced only by Alarm: $P(J \mid M, A, E, B) = P(J \mid A)$
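A minimal sketch of this network as plain Python data; the encoding (a dict mapping each variable to its parent list and CPT) and the helper prob() are my own, and the later sketches in these slides reuse them:

# Burglary network from the slide. Each CPT maps a tuple of parent
# values to P(variable = True | parents).
burglary_bn = {
    'B': ([], {(): 0.001}),
    'E': ([], {(): 0.002}),
    'A': (['B', 'E'], {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    'J': (['A'], {(True,): 0.90, (False,): 0.05}),
    'M': (['A'], {(True,): 0.70, (False,): 0.01}),
}

# Topological order used by the inference and sampling sketches below.
TOPO_ORDER = ['B', 'E', 'A', 'J', 'M']

def prob(var, value, event):
    # P(var = value | parents(var)), reading parent values from `event`.
    parents, cpt = burglary_bn[var]
    p_true = cpt[tuple(event[p] for p in parents)]
    return p_true if value else 1.0 - p_true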
Semantics of Bayesian Network
• Full joint probability distribution
  Notation: $P(x_1, \ldots, x_n)$ abbreviates $P(X_1 = x_1 \land \cdots \land X_n = x_n)$
  $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$,
  where $parents(X_i)$ denotes the specific values of the variables in $Parents(X_i)$
• Constructing Bayesian networks
  By the chain rule, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$
  Correctness: for every variable $X_i$ in the network,
  $P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid Parents(X_i))$, provided that $Parents(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$
  • Choose parents for each node s.t. this property holds
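Using the Burglary tables from the earlier sketch, the probability of one complete assignment is just the product of the local conditionals; for example $P(j, m, a, \lnot b, \lnot e)$:

# Full joint probability of one assignment; reuses burglary_bn,
# TOPO_ORDER, and prob() from the earlier sketch.
event = {'B': False, 'E': False, 'A': True, 'J': True, 'M': True}

joint = 1.0
for var in TOPO_ORDER:
    joint *= prob(var, event[var], event)

print(joint)   # 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00063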
Semantics of Bayesian Network (cont’d)
• Compactness
  Locally structured system
  • Each component interacts directly with only a bounded number of other components
  A complete network over $n$ Boolean variables, each with at most $k$ parents, is specified by at most $n \cdot 2^k$ conditional probabilities (e.g., $n = 30$, $k = 5$: 960 numbers vs. $2^{30} - 1$ for the full joint)
• Node ordering
  Add "root causes" first
  Add the variables they influence, and so on
  Until the "leaves" are reached
  • "Leaves": variables with no direct causal influence on the others
Three Examples of 3-Node Graphs:
Tail-to-Tail Connection
• Node c is said to be tail-to-tail with respect to the path from a to b ($a \leftarrow c \rightarrow b$)
• When node c is unobserved, the path is unblocked:
  $P(a, b) = \sum_c P(a \mid c)\,P(b \mid c)\,P(c)$, which in general does not factorize, so $a \not\perp b \mid \emptyset$
• When node c is observed,
  Node c blocks the path from a to b
  Variables a and b are conditionally independent:
  $P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = P(a \mid c)\,P(b \mid c)$, i.e. $a \perp b \mid c$
Three Examples of 3-Node Graphs:
Head-to-Tail Connection
• Node c is said to be head-to-tail with respect to the path from a to b ($a \rightarrow c \rightarrow b$)
• When node c is unobserved, the path is unblocked:
  $P(a, b) = P(a) \sum_c P(c \mid a)\,P(b \mid c) = P(a)\,P(b \mid a)$, so in general $a \not\perp b \mid \emptyset$
• When node c is observed,
  Node c blocks the path from a to b
  Variables a and b are conditionally independent:
  $P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a)\,P(c \mid a)\,P(b \mid c)}{P(c)} = P(a \mid c)\,P(b \mid c)$, i.e. $a \perp b \mid c$
Three Examples of 3-Node Graphs:
Head-to-Head Connection
• Node c is said to be head-to-head with respect to the path from a to b ($a \rightarrow c \leftarrow b$)
• When node c is unobserved,
  Node c blocks the path from a to b
  Variables a and b are independent:
  $P(a, b, c) = P(a)\,P(b)\,P(c \mid a, b)$
  $P(a, b) = \sum_c P(a)\,P(b)\,P(c \mid a, b) = P(a)\,P(b)$, i.e. $a \perp b \mid \emptyset$
• When node c is observed, the path is unblocked:
  $P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a)\,P(b)\,P(c \mid a, b)}{P(c)}$, which in general does not factorize, so $a \not\perp b \mid c$
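A quick numeric check of the head-to-head behavior ("explaining away") on the Burglary fragment B → A ← E, reusing prob() from the earlier sketch:

# Marginally P(B, E) = P(B) P(E): the unobserved head-to-head node A
# blocks the path. Conditioned on A = true, B and E become dependent.
from itertools import product

def joint_bae(b, e, a):
    ev = {'B': b, 'E': e}
    return prob('B', b, ev) * prob('E', e, ev) * prob('A', a, ev)

# P(B=T, E=T) equals P(B=T) * P(E=T) exactly.
p_be = sum(joint_bae(True, True, a) for a in (True, False))
print(p_be, 0.001 * 0.002)

# P(B=T, E=T | a) differs from P(B=T | a) * P(E=T | a).
p_a = sum(joint_bae(b, e, True) for b, e in product((True, False), repeat=2))
p_be_a = joint_bae(True, True, True) / p_a
p_b_a = sum(joint_bae(True, e, True) for e in (True, False)) / p_a
p_e_a = sum(joint_bae(b, True, True) for b in (True, False)) / p_a
print(p_be_a, p_b_a * p_e_a)   # ≈ 0.00076 vs ≈ 0.086: dependent given a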
D-separation
• Let A, B, and C be arbitrary nonintersecting sets of nodes
• A path from A to B is blocked if it includes either
  a head-to-tail or tail-to-tail node that is in C, or
  a head-to-head node such that neither the node nor any of its descendants is in C
• A is d-separated from B by C if
  every path from A to B is blocked
• Example (Bishop [3], Fig. 8.22): in the graph $a \rightarrow e \leftarrow f$, $e \rightarrow c$, $f \rightarrow b$,
  $a \not\perp b \mid c$ (c is a descendant of the head-to-head node e, so observing c unblocks the path), but $a \perp b \mid f$ (f is tail-to-tail and observed)
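A brute-force sketch of this path-blocking test; the graph encoding (dict of node → children), helper names, and the Bishop-style example graph are my own, and the approach is only adequate for small graphs:

def descendants(g, n):
    # All nodes reachable from n by directed edges.
    out, stack = set(), [n]
    while stack:
        for ch in g.get(stack.pop(), ()):
            if ch not in out:
                out.add(ch)
                stack.append(ch)
    return out

def d_separated(g, a, b, C):
    # g: dict node -> list of children (a DAG). True iff every
    # undirected simple path from a to b is blocked given C.
    nodes = set(g) | {c for ch in g.values() for c in ch}
    nbrs = {n: set(g.get(n, ())) |
               {u for u in nodes if n in g.get(u, ())} for n in nodes}

    def blocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            arrows_in = (mid in g.get(prev, ())) + (mid in g.get(nxt, ()))
            if arrows_in == 2:                      # head-to-head at mid
                if not (({mid} | descendants(g, mid)) & C):
                    return True
            elif mid in C:                          # head-to-tail / tail-to-tail
                return True
        return False

    def paths(cur, path, seen):
        if cur == b:
            yield path
            return
        for n in nbrs[cur]:
            if n not in seen:
                yield from paths(n, path + [n], seen | {n})

    return all(blocked(p) for p in paths(a, [a], {a}))

# The example graph from the slide (my encoding).
g = {'a': ['e'], 'f': ['e', 'b'], 'e': ['c'], 'c': [], 'b': []}
print(d_separated(g, 'a', 'b', {'c'}))   # False: a not d-separated from b by {c}
print(d_separated(g, 'a', 'b', {'f'}))   # True:  a d-separated from b by {f}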
Conditional Independence Relations
• A node is conditionally independent of its non-descendants, given its parents
• A node is conditionally independent of all other nodes, given its Markov blanket*
  Figure: node X with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$
• In general, d-separation is used for deciding independence
* Parents, children, and children’s other parents
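A small helper that reads a node's Markov blanket directly off the graph structure (same node → children encoding as the d-separation sketch):

def markov_blanket(g, x):
    # Parents, children, and children's other parents of x.
    children = set(g.get(x, ()))
    parents = {u for u in g if x in g[u]}
    co_parents = {u for c in children for u in g if c in g[u]} - {x}
    return parents | children | co_parents

# In the Burglary network, mb(Alarm) = {B, E, J, M}.
g = {'B': ['A'], 'E': ['A'], 'A': ['J', 'M'], 'J': [], 'M': []}
print(markov_blanket(g, 'A'))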
Probabilistic Inference In Bayesian Networks
• Notation
X: the query variable
E: the set of evidence variables, E1,…,Em
e: particular observed evidences
• Compute the posterior probability distribution $P(X \mid e)$
• Exact inference
Inference by enumeration
Variable elimination algorithm
• Approximate inference
Direct sampling methods
Markov chain Monte Carlo (MCMC) algorithm
Exact Inference In Bayesian Networks
Inference By Enumeration
• $P(X \mid e) = \alpha\,P(X, e) = \alpha \sum_y P(X, e, y)$, where $y$ ranges over the hidden variables
• Recall, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$
• Computing sums of products of conditional probabilities
• In the Burglary example,
  $P(B \mid j, m) = \alpha\,P(B, j, m) = \alpha \sum_e \sum_a P(B, e, a, j, m)$
  $= \alpha \sum_e \sum_a P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
• $O(2^n)$ time complexity for $n$ Boolean variables
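A direct transcription of this sum of products into Python for $P(B \mid j, m)$, reusing burglary_bn, TOPO_ORDER, and prob() from the earlier sketch:

# Inference by enumeration: sum the full joint over the hidden
# variables E and A, then normalize over B.
from itertools import product

def enumerate_ask_B(j, m):
    dist = {}
    for b in (True, False):
        total = 0.0
        for e, a in product((True, False), repeat=2):
            ev = {'B': b, 'E': e, 'A': a, 'J': j, 'M': m}
            p = 1.0
            for var in TOPO_ORDER:
                p *= prob(var, ev[var], ev)
            total += p
        dist[b] = total
    alpha = 1.0 / sum(dist.values())
    return {b: alpha * t for b, t in dist.items()}

print(enumerate_ask_B(True, True))   # ≈ {True: 0.284, False: 0.716}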
Exact Inference In Bayesian Networks
Variable Elimination Algorithm
• Eliminates the repeated calculations of enumeration
  $P(B \mid j, m) = \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
  In plain enumeration, the products $P(j \mid a)\,P(m \mid a)$ are recomputed for every value of $e$
Exact Inference In Bayesian Networks
Variable Elimination Algorithm (cont’d)
• Evaluating in right-to-left order (bottom-up)
  $P(B \mid j, m) = \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
• Each part of the expression makes a factor, e.g.
  $f_J(A) = \langle P(j \mid a), P(j \mid \lnot a) \rangle$ and $f_M(A) = \langle P(m \mid a), P(m \mid \lnot a) \rangle$
• Pointwise product and summing out variables:
  $f_{JM}(A) = f_J(A) \times f_M(A)$
  $f_{\bar{A}JM}(B, E) = \sum_a f_A(a, B, E) \times f_{JM}(a)$
  $f_{\bar{E}\bar{A}JM}(B) = \sum_e f_E(e) \times f_{\bar{A}JM}(B, e)$
  $P(B \mid j, m) = \alpha\,f_B(B) \times f_{\bar{E}\bar{A}JM}(B)$
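A compact sketch of the two factor operations, with a factor represented as a (variable list, table from assignment tuples to numbers) pair; the representation and names are my own, and prob() is reused from the earlier sketch:

from itertools import product

def make_factor(vars_, fn):
    # Tabulate fn over all Boolean assignments of vars_.
    return (vars_, {vals: fn(dict(zip(vars_, vals)))
                    for vals in product((True, False), repeat=len(vars_))})

def pointwise_product(f1, f2):
    v1, t1 = f1; v2, t2 = f2
    vars_ = v1 + [v for v in v2 if v not in v1]
    def fn(asg):
        return t1[tuple(asg[v] for v in v1)] * t2[tuple(asg[v] for v in v2)]
    return make_factor(vars_, fn)

def sum_out(var, f):
    # Remove var from the factor by summing over its values.
    vars_, t = f
    rest = [v for v in vars_ if v != var]
    table = {}
    for vals, p in t.items():
        key = tuple(v for v, name in zip(vals, vars_) if name != var)
        table[key] = table.get(key, 0.0) + p
    return (rest, table)

# f_J(A), f_M(A), then f_JM(A) = f_J(A) x f_M(A), then f_ĀJM(B, E).
f_J = make_factor(['A'], lambda asg: prob('J', True, asg))
f_M = make_factor(['A'], lambda asg: prob('M', True, asg))
f_JM = pointwise_product(f_J, f_M)
f_A = make_factor(['A', 'B', 'E'], lambda asg: prob('A', asg['A'], asg))
f_AJM = sum_out('A', pointwise_product(f_A, f_JM))
print(f_JM)    # (['A'], {(True,): 0.63, (False,): 0.0005})
print(f_AJM)   # a factor over (B, E)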
Exact Inference In Bayesian Networks
Variable Elimination Algorithm (cont’d)
• Repeatedly remove any leaf node that is not a query variable or an evidence variable
• In the Burglary example, computing $P(J \mid B = true)$:
  $P(J \mid b) = \alpha\,P(b) \sum_e P(e) \sum_a P(a \mid b, e)\,P(J \mid a) \sum_m P(m \mid a)$
  $= \alpha\,P(b) \sum_e P(e) \sum_a P(a \mid b, e)\,P(J \mid a)$, since $\sum_m P(m \mid a) = 1$ (M is irrelevant to the query)
• Time and space complexity
  Dominated by the size of the largest factor
  In the worst case, exponential time and space complexity
Approximate Inference In Bayesian Networks
Direct Sampling Methods
• Generates samples from a known prior probability distribution
• Samples each variable in topological order

function Prior-Sample(bn) returns an event sampled from the prior specified by bn
  inputs: bn, a Bayesian network specifying joint distribution P(X1, …, Xn)
  x ← an event with n elements
  for i = 1 to n do
    xi ← a random sample from P(Xi | parents(Xi))
  return x

• $S_{PS}(x_1, \ldots, x_n)$: the probability that Prior-Sample generates the specific event
  $S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \ldots, x_n)$
  $\lim_{N \to \infty} \frac{N_{PS}(x_1, \ldots, x_n)}{N} = S_{PS}(x_1, \ldots, x_n) = P(x_1, \ldots, x_n)$  (Consistent estimate)
  where $N_{PS}(x_1, \ldots, x_n)$ is the frequency of the event $x_1, \ldots, x_n$ among $N$ samples
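A Python rendering of Prior-Sample over the Burglary tables (a sketch reusing burglary_bn, TOPO_ORDER, and prob() from the earlier sketch):

import random

def prior_sample():
    # Sample each variable in topological order given its parents.
    x = {}
    for var in TOPO_ORDER:
        x[var] = random.random() < prob(var, True, x)
    return x

# Long-run frequencies approach the joint: estimate P(j, m, a, ~b, ~e).
N = 100_000
target = {'B': False, 'E': False, 'A': True, 'J': True, 'M': True}
hits = sum(prior_sample() == target for _ in range(N))
print(hits / N)   # ≈ 0.00063, matching the exact joint computed earlier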
Approximate Inference In Bayesian Networks
Rejection Sampling Methods
• Rejects samples that are inconsistent with the evidence $e$
• Estimates $P(X \mid e)$ by counting how often $X = x$ occurs in the surviving samples
  $\hat{P}(X \mid e) = \alpha\,N_{PS}(X, e) = \frac{N_{PS}(X, e)}{N_{PS}(e)} \approx \frac{P(X, e)}{P(e)} = P(X \mid e)$  (Consistent estimate)
• Rejects an exponentially growing fraction of samples as the number of evidence variables grows
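Rejection sampling on top of prior_sample() for the query $P(B \mid j, m)$, a sketch:

def rejection_sample_B(N=200_000):
    counts = {True: 0, False: 0}
    for _ in range(N):
        s = prior_sample()
        if s['J'] and s['M']:          # keep only samples consistent with e
            counts[s['B']] += 1
    kept = counts[True] + counts[False]
    return {b: c / kept for b, c in counts.items()}

# Most samples are rejected, since P(j, m) is small; the few that
# survive give a noisy estimate of P(B | j, m) ≈ {True: 0.28, ...}.
print(rejection_sample_B())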
Approximate Inference In Bayesian Networks
Likelihood weighting
• Generates only events consistent with the evidence
  Fixes the values of the evidence variables E
  Samples only the remaining variables X and Y

function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X|e)
  local variables: W, a vector of weighted counts over X, initially zero
  for i = 1 to N do
    x, w ← Weighted-Sample(bn, e)
    W[x] ← W[x] + w where x is the value of X in x
  return Normalize(W[X])

function Weighted-Sample(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
    if Xi has a value xi in e
      then w ← w × P(Xi = xi | parents(Xi))
      else xi ← a random sample from P(Xi | parents(Xi))
  return x, w
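The same two functions in Python, specialized to the Burglary network with query variable B (a sketch reusing prob(), TOPO_ORDER, and random from the earlier sketches):

def weighted_sample(evidence):
    x, w = dict(evidence), 1.0
    for var in TOPO_ORDER:
        if var in evidence:
            w *= prob(var, evidence[var], x)   # evidence is fixed, not sampled
        else:
            x[var] = random.random() < prob(var, True, x)
    return x, w

def likelihood_weighting_B(evidence, N=100_000):
    W = {True: 0.0, False: 0.0}
    for _ in range(N):
        x, w = weighted_sample(evidence)
        W[x['B']] += w
    alpha = 1.0 / (W[True] + W[False])
    return {b: alpha * w for b, w in W.items()}

print(likelihood_weighting_B({'J': True, 'M': True}))   # ≈ {True: 0.284, ...}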
Approximate Inference In Bayesian Networks
Likelihood weighting (cont’d)
• Sampling distribution $S_{WS}$ of Weighted-Sample
  $S_{WS}(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$, where $Z = \{X\} \cup Y$
• The likelihood weight $w(z, e)$
  $w(z, e) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
• Weighted probability of a sample
  $S_{WS}(z, e)\,w(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(z, e)$
Approximate Inference In Bayesian Networks
Markov Chain Monte Carlo Algorithm
• Generates each event by making a random change to one of the nonevidence variables $Z_i$
• $Z_i$ is sampled conditioned on the current values of the variables in its Markov blanket $mb(Z_i)$
• A state specifies a value for every variable
• The long-run fraction of time spent in each state is proportional to its posterior probability, so the counts yield a consistent estimate of $P(X \mid e)$

function MCMC-Ask(X, e, bn, N) returns an estimate of P(X|e)
  local variables: N[X], a vector of counts over X, initially zero
    Z, the nonevidence variables in bn
    x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      sample the value of Zi in x from P(Zi | mb(Zi)), given the values of mb(Zi) in x
      N[x] ← N[x] + 1 where x is the value of X in x
  return Normalize(N[X])
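A Gibbs-sampling sketch: each nonevidence variable is resampled from $P(Z_i \mid mb(Z_i))$, which is proportional to $P(Z_i \mid parents(Z_i))$ times the product of $P(child \mid parents(child))$ over $Z_i$'s children (reuses burglary_bn, TOPO_ORDER, prob(), and random):

def sample_from_markov_blanket(var, x):
    # P(var | mb(var)) ∝ P(var | parents) * Π_children P(child | parents).
    children = [v for v in burglary_bn if var in burglary_bn[v][0]]
    weights = {}
    for val in (True, False):
        x[var] = val
        w = prob(var, val, x)
        for c in children:
            w *= prob(c, x[c], x)
        weights[val] = w
    x[var] = random.random() < weights[True] / (weights[True] + weights[False])

def mcmc_ask_B(evidence, N=20_000):
    nonevidence = [v for v in TOPO_ORDER if v not in evidence]
    x = dict(evidence)
    for v in nonevidence:                       # random initial state
        x[v] = random.random() < 0.5
    counts = {True: 0, False: 0}
    for _ in range(N):
        for v in nonevidence:
            sample_from_markov_blanket(v, x)
            counts[x['B']] += 1
    total = counts[True] + counts[False]
    return {b: c / total for b, c in counts.items()}

print(mcmc_ask_B({'J': True, 'M': True}))   # ≈ {True: 0.28, ...}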
Approximate Inference In Bayesian Networks
Markov Chain Monte Carlo Algorithm (cont’d)
• Markov chain on the state space
  $q(x \to x')$: the transition probability from state $x$ to state $x'$
  Let $\bar{x}_i$ be all the hidden variables other than $X_i$
  $q(x \to x') = q((x_i, \bar{x}_i) \to (x_i', \bar{x}_i)) = P(x_i' \mid \bar{x}_i, e)$, called the Gibbs sampler
• Consistency
  The Markov chain has reached its stationary distribution if it satisfies detailed balance
Summary
• Bayesian network
Directed acyclic graph expressing causal relationships
• Conditional independence
D-separation property
• Inference in Bayesian network
Enumeration: intractable
Variable elimination: efficient, but sensitive to topology
Direct sampling: estimate posterior probabilities
MCMC algorithm: powerful method for computing with
probability models
References
[1] S. Russell and P. Norvig, "Probabilistic Reasoning", Artificial Intelligence: A Modern Approach, Chapter 14, pp. 492-519.
[2] E. Charniak, "Bayesian Networks without Tears", AI Magazine, 1991.
[3] C. M. Bishop, "Graphical Models", Pattern Recognition and Machine Learning, Chapter 8, pp. 359-418.
Q&A
• Thank you
Appendix 1. Example of Bad Node Ordering
• Node ordering used: ① MaryCalls, ② JohnCalls, ③ Alarm, ④ Burglary, ⑤ Earthquake
  Figure: resulting network with MaryCalls and JohnCalls at the top, Alarm in the middle, Burglary and Earthquake at the bottom
• This ordering produces two more links and requires unnatural probability judgments
Appendix 2. Consistency of Likelihood Weighting
• $\hat{P}(x \mid e) = \alpha \sum_y N_{WS}(x, y, e)\,w(x, y, e)$  (from Likelihood-Weighting)
  $\approx \alpha' \sum_y S_{WS}(x, y, e)\,w(x, y, e)$  (for large $N$)
  $= \alpha' \sum_y P(x, y, e)$
  $= \alpha' P(x, e)$
  $= P(x \mid e)$  (Consistent estimate)
Appendix 3. State Distribution of MCMC
• Detailed balance
  Let $\pi_t(x)$ be the probability of the system being in state $x$ at time $t$
  $\pi(x)\,q(x \to x') = \pi(x')\,q(x' \to x)$ for all $x, x'$
• Detailed balance implies stationarity:
  $\pi_{t+1}(x') = \sum_x \pi_t(x)\,q(x \to x') = \sum_x \pi(x')\,q(x' \to x) = \pi(x') \sum_x q(x' \to x) = \pi(x')$
• The Gibbs sampler, $q(x \to x') = q((x_i, \bar{x}_i) \to (x_i', \bar{x}_i)) = P(x_i' \mid \bar{x}_i, e)$, satisfies detailed balance:
  $\pi(x)\,q(x \to x') = P(x \mid e)\,P(x_i' \mid \bar{x}_i, e) = P(x_i, \bar{x}_i \mid e)\,P(x_i' \mid \bar{x}_i, e)$
  $= P(x_i \mid \bar{x}_i, e)\,P(\bar{x}_i \mid e)\,P(x_i' \mid \bar{x}_i, e)$  (by the chain rule on $P(x_i, \bar{x}_i \mid e)$)
  $= P(x_i \mid \bar{x}_i, e)\,P(x_i', \bar{x}_i \mid e)$  (by the chain rule backwards)
  $= q(x' \to x)\,\pi(x')$