Probabilistic Reasoning
in Bayesian Networks
KAIST AIPR Lab.
Jung-Yeol Lee
17th June 2010
Contents
• Background
• Bayesian Network
• Semantics of Bayesian Network
• D-Separation
• Conditional Independence Relations
• Probabilistic Inference in Bayesian Networks
• Summary
Background
• Bayes' rule
  From the product rule, $P(X \land Y) = P(X \mid Y)P(Y) = P(Y \mid X)P(X)$
  $P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{P(X)} = \alpha P(X \mid Y)P(Y)$, where $\alpha$ is the normalization constant
  Combining evidence $e$: $P(Y \mid X, e) = \frac{P(X \mid Y, e)\,P(Y \mid e)}{P(X \mid e)}$
• Conditional independence
  $P(X, Y \mid Z) = P(X \mid Z)P(Y \mid Z)$ when $X \perp Y \mid Z$
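A minimal numeric sketch of the normalized form $P(Y \mid X) = \alpha P(X \mid Y)P(Y)$; the prior and likelihood values below are hypothetical, chosen only for illustration:

# Bayes' rule with an explicit normalization constant alpha = 1 / P(X).
p_y = 0.01                                 # hypothetical prior P(Y = true)
p_x_given_y = {True: 0.9, False: 0.05}     # hypothetical likelihood P(X = true | Y)

# Unnormalized posterior over Y after observing X = true.
unnormalized = {y: p_x_given_y[y] * (p_y if y else 1 - p_y)
                for y in (True, False)}

alpha = 1.0 / sum(unnormalized.values())   # alpha = 1 / P(X = true)
posterior = {y: alpha * p for y, p in unnormalized.items()}
print(posterior)                           # P(Y | X = true); entries sum to 1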
Bayesian Network
• Causal relationships among random variables
• Directed acyclic graph
  Nodes: random variables $X_i$
  Directed links: probabilistic relationships between variables
  Acyclic: no directed path leads from a node back to itself
• A link from node X to node Y means X is a $Parent(Y)$
• Each node $X_i$ carries a conditional probability distribution $P(X_i \mid Parents(X_i))$
  Quantifies the effect of the parents on the node
Example of Bayesian Network
• Burglary network
  Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls
• Conditional Probability Tables

  P(B) = 0.001        P(E) = 0.002

  B E | P(A|B,E)        A | P(J|A)        A | P(M|A)
  T T |   0.95          T |  0.90         T |  0.70
  T F |   0.94          F |  0.05         F |  0.01
  F T |   0.29
  F F |   0.001

• JohnCalls is directly influenced only by Alarm: $P(J \mid M, A, E, B) = P(J \mid A)$
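A minimal sketch of this network as plain Python data; the encoding (a dict mapping each variable to its parent list and CPT) and the helper prob() are my own, and the later sketches in these slides reuse them:

# Burglary network from the slide. Each CPT maps a tuple of parent
# values to P(variable = True | parents).
burglary_bn = {
    'B': ([], {(): 0.001}),
    'E': ([], {(): 0.002}),
    'A': (['B', 'E'], {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    'J': (['A'], {(True,): 0.90, (False,): 0.05}),
    'M': (['A'], {(True,): 0.70, (False,): 0.01}),
}

# Topological order used by the inference and sampling sketches below.
TOPO_ORDER = ['B', 'E', 'A', 'J', 'M']

def prob(var, value, event):
    # P(var = value | parents(var)), reading parent values from `event`.
    parents, cpt = burglary_bn[var]
    p_true = cpt[tuple(event[p] for p in parents)]
    return p_true if value else 1.0 - p_true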
Semantics of Bayesian Network
• Full joint probability distribution
  Notation: $P(x_1, \ldots, x_n)$ abbreviates $P(X_1 = x_1 \land \cdots \land X_n = x_n)$
  $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$,
  where $parents(X_i)$ denotes the specific values of the variables in $Parents(X_i)$
• Constructing Bayesian networks
  By the chain rule, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$
  Correctness: for every variable $X_i$ in the network,
  $P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid Parents(X_i))$, provided that $Parents(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$
  • Choose parents for each node s.t. this property holds
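Using the Burglary tables from the earlier sketch, the probability of one complete assignment is just the product of the local conditionals; for example $P(j, m, a, \lnot b, \lnot e)$:

# Full joint probability of one assignment; reuses burglary_bn,
# TOPO_ORDER, and prob() from the earlier sketch.
event = {'B': False, 'E': False, 'A': True, 'J': True, 'M': True}

joint = 1.0
for var in TOPO_ORDER:
    joint *= prob(var, event[var], event)

print(joint)   # 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00063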
Semantics of Bayesian Network (cont’d)
• Compactness
  Locally structured system
  • Each component interacts directly with only a bounded number of other components
  A complete network over $n$ Boolean variables, each with at most $k$ parents, is specified by at most $n \cdot 2^k$ conditional probabilities (e.g., $n = 30$, $k = 5$: 960 numbers vs. $2^{30} - 1$ for the full joint)
• Node ordering
  Add "root causes" first
  Add the variables they influence, and so on
  Until the "leaves" are reached
  • "Leaves": variables with no direct causal influence on the others
Three Examples of 3-Node Graphs:
Tail-to-Tail Connection
• Node c is said to be tail-to-tail with respect to the path from a to b ($a \leftarrow c \rightarrow b$)
• When node c is unobserved, the path is unblocked:
  $P(a, b) = \sum_c P(a \mid c)\,P(b \mid c)\,P(c)$, which in general does not factorize, so $a \not\perp b \mid \emptyset$
• When node c is observed,
  Node c blocks the path from a to b
  Variables a and b are conditionally independent:
  $P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = P(a \mid c)\,P(b \mid c)$, i.e. $a \perp b \mid c$
Three Examples of 3-Node Graphs:
Head-to-Tail Connection
• Node c is said to be head-to-tail with respect to the path from a to b ($a \rightarrow c \rightarrow b$)
• When node c is unobserved, the path is unblocked:
  $P(a, b) = P(a) \sum_c P(c \mid a)\,P(b \mid c) = P(a)\,P(b \mid a)$, so in general $a \not\perp b \mid \emptyset$
• When node c is observed,
  Node c blocks the path from a to b
  Variables a and b are conditionally independent:
  $P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a)\,P(c \mid a)\,P(b \mid c)}{P(c)} = P(a \mid c)\,P(b \mid c)$, i.e. $a \perp b \mid c$
Three Examples of 3-Node Graphs:
Head-to-Head Connection
• Node c is said to be head-to-head with respect to the path from a to b ($a \rightarrow c \leftarrow b$)
• When node c is unobserved,
  Node c blocks the path from a to b
  Variables a and b are independent:
  $P(a, b, c) = P(a)\,P(b)\,P(c \mid a, b)$
  $P(a, b) = \sum_c P(a)\,P(b)\,P(c \mid a, b) = P(a)\,P(b)$, i.e. $a \perp b \mid \emptyset$
• When node c is observed, the path is unblocked:
  $P(a, b \mid c) = \frac{P(a, b, c)}{P(c)} = \frac{P(a)\,P(b)\,P(c \mid a, b)}{P(c)}$, which in general does not factorize, so $a \not\perp b \mid c$
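A quick numeric check of the head-to-head behavior ("explaining away") on the Burglary fragment B → A ← E, reusing prob() from the earlier sketch:

# Marginally P(B, E) = P(B) P(E): the unobserved head-to-head node A
# blocks the path. Conditioned on A = true, B and E become dependent.
from itertools import product

def joint_bae(b, e, a):
    ev = {'B': b, 'E': e}
    return prob('B', b, ev) * prob('E', e, ev) * prob('A', a, ev)

# P(B=T, E=T) equals P(B=T) * P(E=T) exactly.
p_be = sum(joint_bae(True, True, a) for a in (True, False))
print(p_be, 0.001 * 0.002)

# P(B=T, E=T | a) differs from P(B=T | a) * P(E=T | a).
p_a = sum(joint_bae(b, e, True) for b, e in product((True, False), repeat=2))
p_be_a = joint_bae(True, True, True) / p_a
p_b_a = sum(joint_bae(True, e, True) for e in (True, False)) / p_a
p_e_a = sum(joint_bae(b, True, True) for b in (True, False)) / p_a
print(p_be_a, p_b_a * p_e_a)   # ≈ 0.00076 vs ≈ 0.086: dependent given a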
D-separation
• Let A, B, and C be arbitrary nonintersecting sets of nodes
• A path from A to B is blocked if it includes either
  a head-to-tail or tail-to-tail node that is in C, or
  a head-to-head node such that neither the node nor any of its descendants is in C
• A is d-separated from B by C if
  every path from A to B is blocked
• Example (Bishop [3], Fig. 8.22): in the graph $a \rightarrow e \leftarrow f$, $e \rightarrow c$, $f \rightarrow b$,
  $a \not\perp b \mid c$ (c is a descendant of the head-to-head node e, so observing c unblocks the path), but $a \perp b \mid f$ (f is tail-to-tail and observed)
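A brute-force sketch of this path-blocking test; the graph encoding (dict of node → children), helper names, and the Bishop-style example graph are my own, and the approach is only adequate for small graphs:

def descendants(g, n):
    # All nodes reachable from n by directed edges.
    out, stack = set(), [n]
    while stack:
        for ch in g.get(stack.pop(), ()):
            if ch not in out:
                out.add(ch)
                stack.append(ch)
    return out

def d_separated(g, a, b, C):
    # g: dict node -> list of children (a DAG). True iff every
    # undirected simple path from a to b is blocked given C.
    nodes = set(g) | {c for ch in g.values() for c in ch}
    nbrs = {n: set(g.get(n, ())) |
               {u for u in nodes if n in g.get(u, ())} for n in nodes}

    def blocked(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            arrows_in = (mid in g.get(prev, ())) + (mid in g.get(nxt, ()))
            if arrows_in == 2:                      # head-to-head at mid
                if not (({mid} | descendants(g, mid)) & C):
                    return True
            elif mid in C:                          # head-to-tail / tail-to-tail
                return True
        return False

    def paths(cur, path, seen):
        if cur == b:
            yield path
            return
        for n in nbrs[cur]:
            if n not in seen:
                yield from paths(n, path + [n], seen | {n})

    return all(blocked(p) for p in paths(a, [a], {a}))

# The example graph from the slide (my encoding).
g = {'a': ['e'], 'f': ['e', 'b'], 'e': ['c'], 'c': [], 'b': []}
print(d_separated(g, 'a', 'b', {'c'}))   # False: a not d-separated from b by {c}
print(d_separated(g, 'a', 'b', {'f'}))   # True:  a d-separated from b by {f}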
Conditional Independence Relations
• A node is conditionally independent of its non-descendants, given its parents
• A node is conditionally independent of all other nodes, given its Markov blanket*
  Figure: node X with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$
• In general, d-separation is used for deciding independence
* Parents, children, and children’s other parents
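A small helper that reads a node's Markov blanket directly off the graph structure (same node → children encoding as the d-separation sketch):

def markov_blanket(g, x):
    # Parents, children, and children's other parents of x.
    children = set(g.get(x, ()))
    parents = {u for u in g if x in g[u]}
    co_parents = {u for c in children for u in g if c in g[u]} - {x}
    return parents | children | co_parents

# In the Burglary network, mb(Alarm) = {B, E, J, M}.
g = {'B': ['A'], 'E': ['A'], 'A': ['J', 'M'], 'J': [], 'M': []}
print(markov_blanket(g, 'A'))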
Probabilistic Inference In Bayesian Networks
• Notation
X: the query variable
E: the set of evidence variables, E1,…,Em
e: particular observed evidences
• Compute the posterior probability distribution $P(X \mid e)$
• Exact inference
Inference by enumeration
Variable elimination algorithm
• Approximate inference
Direct sampling methods
Markov chain Monte Carlo (MCMC) algorithm
Exact Inference In Bayesian Networks
Inference By Enumeration
• $P(X \mid e) = \alpha\,P(X, e) = \alpha \sum_y P(X, e, y)$, where $y$ ranges over the hidden variables
• Recall, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$
• Computing sums of products of conditional probabilities
• In the Burglary example,
  $P(B \mid j, m) = \alpha\,P(B, j, m) = \alpha \sum_e \sum_a P(B, e, a, j, m)$
  $= \alpha \sum_e \sum_a P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
• $O(2^n)$ time complexity for $n$ Boolean variables
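A direct transcription of this sum of products into Python for $P(B \mid j, m)$, reusing burglary_bn, TOPO_ORDER, and prob() from the earlier sketch:

# Inference by enumeration: sum the full joint over the hidden
# variables E and A, then normalize over B.
from itertools import product

def enumerate_ask_B(j, m):
    dist = {}
    for b in (True, False):
        total = 0.0
        for e, a in product((True, False), repeat=2):
            ev = {'B': b, 'E': e, 'A': a, 'J': j, 'M': m}
            p = 1.0
            for var in TOPO_ORDER:
                p *= prob(var, ev[var], ev)
            total += p
        dist[b] = total
    alpha = 1.0 / sum(dist.values())
    return {b: alpha * t for b, t in dist.items()}

print(enumerate_ask_B(True, True))   # ≈ {True: 0.284, False: 0.716}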
Exact Inference In Bayesian Networks
Variable Elimination Algorithm
• Eliminates the repeated calculations of enumeration
  $P(B \mid j, m) = \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
  In plain enumeration, the products $P(j \mid a)\,P(m \mid a)$ are recomputed for every value of $e$
Exact Inference In Bayesian Networks
Variable Elimination Algorithm (cont’d)
• Evaluating in right-to-left order (bottom-up)
  $P(B \mid j, m) = \alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
• Each part of the expression makes a factor, e.g.
  $f_J(A) = \langle P(j \mid a), P(j \mid \lnot a) \rangle$ and $f_M(A) = \langle P(m \mid a), P(m \mid \lnot a) \rangle$
• Pointwise product and summing out variables:
  $f_{JM}(A) = f_J(A) \times f_M(A)$
  $f_{\bar{A}JM}(B, E) = \sum_a f_A(a, B, E) \times f_{JM}(a)$
  $f_{\bar{E}\bar{A}JM}(B) = \sum_e f_E(e) \times f_{\bar{A}JM}(B, e)$
  $P(B \mid j, m) = \alpha\,f_B(B) \times f_{\bar{E}\bar{A}JM}(B)$
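A compact sketch of the two factor operations, with a factor represented as a (variable list, table from assignment tuples to numbers) pair; the representation and names are my own, and prob() is reused from the earlier sketch:

from itertools import product

def make_factor(vars_, fn):
    # Tabulate fn over all Boolean assignments of vars_.
    return (vars_, {vals: fn(dict(zip(vars_, vals)))
                    for vals in product((True, False), repeat=len(vars_))})

def pointwise_product(f1, f2):
    v1, t1 = f1; v2, t2 = f2
    vars_ = v1 + [v for v in v2 if v not in v1]
    def fn(asg):
        return t1[tuple(asg[v] for v in v1)] * t2[tuple(asg[v] for v in v2)]
    return make_factor(vars_, fn)

def sum_out(var, f):
    # Remove var from the factor by summing over its values.
    vars_, t = f
    rest = [v for v in vars_ if v != var]
    table = {}
    for vals, p in t.items():
        key = tuple(v for v, name in zip(vals, vars_) if name != var)
        table[key] = table.get(key, 0.0) + p
    return (rest, table)

# f_J(A), f_M(A), then f_JM(A) = f_J(A) x f_M(A), then f_ĀJM(B, E).
f_J = make_factor(['A'], lambda asg: prob('J', True, asg))
f_M = make_factor(['A'], lambda asg: prob('M', True, asg))
f_JM = pointwise_product(f_J, f_M)
f_A = make_factor(['A', 'B', 'E'], lambda asg: prob('A', asg['A'], asg))
f_AJM = sum_out('A', pointwise_product(f_A, f_JM))
print(f_JM)    # (['A'], {(True,): 0.63, (False,): 0.0005})
print(f_AJM)   # a factor over (B, E)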
Exact Inference In Bayesian Networks
Variable Elimination Algorithm (cont’d)
• Repeatedly remove any leaf node that is not a query variable or an evidence variable
• In the Burglary example, computing $P(J \mid B = true)$:
  $P(J \mid b) = \alpha\,P(b) \sum_e P(e) \sum_a P(a \mid b, e)\,P(J \mid a) \sum_m P(m \mid a)$
  $= \alpha\,P(b) \sum_e P(e) \sum_a P(a \mid b, e)\,P(J \mid a)$, since $\sum_m P(m \mid a) = 1$ (M is irrelevant to the query)
• Time and space complexity
  Dominated by the size of the largest factor
  In the worst case, exponential time and space complexity
Approximate Inference In Bayesian Networks
Direct Sampling Methods
• Generates samples from a known prior probability distribution
• Samples each variable in topological order

function Prior-Sample(bn) returns an event sampled from the prior specified by bn
  inputs: bn, a Bayesian network specifying joint distribution P(X1, …, Xn)
  x ← an event with n elements
  for i = 1 to n do
    xi ← a random sample from P(Xi | parents(Xi))
  return x

• $S_{PS}(x_1, \ldots, x_n)$: the probability that Prior-Sample generates the specific event
  $S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \ldots, x_n)$
  $\lim_{N \to \infty} \frac{N_{PS}(x_1, \ldots, x_n)}{N} = S_{PS}(x_1, \ldots, x_n) = P(x_1, \ldots, x_n)$  (Consistent estimate)
  where $N_{PS}(x_1, \ldots, x_n)$ is the frequency of the event $x_1, \ldots, x_n$ among $N$ samples
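A Python rendering of Prior-Sample over the Burglary tables (a sketch reusing burglary_bn, TOPO_ORDER, and prob() from the earlier sketch):

import random

def prior_sample():
    # Sample each variable in topological order given its parents.
    x = {}
    for var in TOPO_ORDER:
        x[var] = random.random() < prob(var, True, x)
    return x

# Long-run frequencies approach the joint: estimate P(j, m, a, ~b, ~e).
N = 100_000
target = {'B': False, 'E': False, 'A': True, 'J': True, 'M': True}
hits = sum(prior_sample() == target for _ in range(N))
print(hits / N)   # ≈ 0.00063, matching the exact joint computed earlier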
Approximate Inference In Bayesian Networks
Rejection Sampling Methods
• Rejects samples that are inconsistent with the evidence $e$
• Estimates $P(X \mid e)$ by counting how often $X = x$ occurs in the surviving samples
  $\hat{P}(X \mid e) = \alpha\,N_{PS}(X, e) = \frac{N_{PS}(X, e)}{N_{PS}(e)} \approx \frac{P(X, e)}{P(e)} = P(X \mid e)$  (Consistent estimate)
• Rejects an exponentially growing fraction of samples as the number of evidence variables grows
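Rejection sampling on top of prior_sample() for the query $P(B \mid j, m)$, a sketch:

def rejection_sample_B(N=200_000):
    counts = {True: 0, False: 0}
    for _ in range(N):
        s = prior_sample()
        if s['J'] and s['M']:          # keep only samples consistent with e
            counts[s['B']] += 1
    kept = counts[True] + counts[False]
    return {b: c / kept for b, c in counts.items()}

# Most samples are rejected, since P(j, m) is small; the few that
# survive give a noisy estimate of P(B | j, m) ≈ {True: 0.28, ...}.
print(rejection_sample_B())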
Approximate Inference In Bayesian Networks
Likelihood weighting
• Generates only events consistent with the evidence
  Fixes the values of the evidence variables E
  Samples only the remaining variables X and Y

function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X|e)
  local variables: W, a vector of weighted counts over X, initially zero
  for i = 1 to N do
    x, w ← Weighted-Sample(bn, e)
    W[x] ← W[x] + w where x is the value of X in x
  return Normalize(W[X])

function Weighted-Sample(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
    if Xi has a value xi in e
      then w ← w × P(Xi = xi | parents(Xi))
      else xi ← a random sample from P(Xi | parents(Xi))
  return x, w
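The same two functions in Python, specialized to the Burglary network with query variable B (a sketch reusing prob(), TOPO_ORDER, and random from the earlier sketches):

def weighted_sample(evidence):
    x, w = dict(evidence), 1.0
    for var in TOPO_ORDER:
        if var in evidence:
            w *= prob(var, evidence[var], x)   # evidence is fixed, not sampled
        else:
            x[var] = random.random() < prob(var, True, x)
    return x, w

def likelihood_weighting_B(evidence, N=100_000):
    W = {True: 0.0, False: 0.0}
    for _ in range(N):
        x, w = weighted_sample(evidence)
        W[x['B']] += w
    alpha = 1.0 / (W[True] + W[False])
    return {b: alpha * w for b, w in W.items()}

print(likelihood_weighting_B({'J': True, 'M': True}))   # ≈ {True: 0.284, ...}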
Approximate Inference In Bayesian Networks
Likelihood weighting (cont’d)
• Sampling distribution $S_{WS}$ of Weighted-Sample
  $S_{WS}(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$, where $Z = \{X\} \cup Y$
• The likelihood weight $w(z, e)$
  $w(z, e) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
• Weighted probability of a sample
  $S_{WS}(z, e)\,w(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(z, e)$
Approximate Inference In Bayesian Networks
Markov Chain Monte Carlo Algorithm
• Generates each event by making a random change to one of the nonevidence variables $Z_i$
• $Z_i$ is sampled conditioned on the current values of the variables in its Markov blanket $mb(Z_i)$
• A state specifies a value for every variable
• The long-run fraction of time spent in each state is proportional to its posterior probability, so the counts yield a consistent estimate of $P(X \mid e)$

function MCMC-Ask(X, e, bn, N) returns an estimate of P(X|e)
  local variables: N[X], a vector of counts over X, initially zero
    Z, the nonevidence variables in bn
    x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      sample the value of Zi in x from P(Zi | mb(Zi)), given the values of mb(Zi) in x
      N[x] ← N[x] + 1 where x is the value of X in x
  return Normalize(N[X])
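A Gibbs-sampling sketch: each nonevidence variable is resampled from $P(Z_i \mid mb(Z_i))$, which is proportional to $P(Z_i \mid parents(Z_i))$ times the product of $P(child \mid parents(child))$ over $Z_i$'s children (reuses burglary_bn, TOPO_ORDER, prob(), and random):

def sample_from_markov_blanket(var, x):
    # P(var | mb(var)) ∝ P(var | parents) * Π_children P(child | parents).
    children = [v for v in burglary_bn if var in burglary_bn[v][0]]
    weights = {}
    for val in (True, False):
        x[var] = val
        w = prob(var, val, x)
        for c in children:
            w *= prob(c, x[c], x)
        weights[val] = w
    x[var] = random.random() < weights[True] / (weights[True] + weights[False])

def mcmc_ask_B(evidence, N=20_000):
    nonevidence = [v for v in TOPO_ORDER if v not in evidence]
    x = dict(evidence)
    for v in nonevidence:                       # random initial state
        x[v] = random.random() < 0.5
    counts = {True: 0, False: 0}
    for _ in range(N):
        for v in nonevidence:
            sample_from_markov_blanket(v, x)
            counts[x['B']] += 1
    total = counts[True] + counts[False]
    return {b: c / total for b, c in counts.items()}

print(mcmc_ask_B({'J': True, 'M': True}))   # ≈ {True: 0.28, ...}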
Approximate Inference In Bayesian Networks
Markov Chain Monte Carlo Algorithm (cont’d)
• Markov chain on the state space
  $q(x \to x')$: the transition probability from state $x$ to state $x'$
  Let $\bar{x}_i$ be all the hidden variables other than $X_i$
  $q(x \to x') = q((x_i, \bar{x}_i) \to (x_i', \bar{x}_i)) = P(x_i' \mid \bar{x}_i, e)$, called the Gibbs sampler
• Consistency
  The Markov chain has reached its stationary distribution if it satisfies detailed balance
Summary
• Bayesian network
Directed acyclic graph expressing causal relationships
• Conditional independence
D-separation property
• Inference in Bayesian network
Enumeration: intractable
Variable elimination: efficient, but sensitive to topology
Direct sampling: estimate posterior probabilities
MCMC algorithm: powerful method for computing with
probability models
References
[1] S. Russell and P. Norvig, "Probabilistic Reasoning", Artificial Intelligence: A Modern Approach, Chapter 14, pp. 492-519.
[2] E. Charniak, "Bayesian Networks without Tears", AI Magazine, 1991.
[3] C. M. Bishop, "Graphical Models", Pattern Recognition and Machine Learning, Chapter 8, pp. 359-418.
Q&A
• Thank you
Appendix 1. Example of Bad Node Ordering
• Node ordering used: ① MaryCalls, ② JohnCalls, ③ Alarm, ④ Burglary, ⑤ Earthquake
  Figure: resulting network with MaryCalls and JohnCalls at the top, Alarm in the middle, Burglary and Earthquake at the bottom
• This ordering produces two more links and requires unnatural probability judgments
Appendix 2. Consistency of Likelihood Weighting
• $\hat{P}(x \mid e) = \alpha \sum_y N_{WS}(x, y, e)\,w(x, y, e)$  (from Likelihood-Weighting)
  $\approx \alpha' \sum_y S_{WS}(x, y, e)\,w(x, y, e)$  (for large $N$)
  $= \alpha' \sum_y P(x, y, e)$
  $= \alpha' P(x, e)$
  $= P(x \mid e)$  (Consistent estimate)
Appendix 3. State Distribution of MCMC
• Detailed balance
  Let $\pi_t(x)$ be the probability of the system being in state $x$ at time $t$
  $\pi(x)\,q(x \to x') = \pi(x')\,q(x' \to x)$ for all $x, x'$
• Detailed balance implies stationarity:
  $\pi_{t+1}(x') = \sum_x \pi_t(x)\,q(x \to x') = \sum_x \pi(x')\,q(x' \to x) = \pi(x') \sum_x q(x' \to x) = \pi(x')$
• The Gibbs sampler, $q(x \to x') = q((x_i, \bar{x}_i) \to (x_i', \bar{x}_i)) = P(x_i' \mid \bar{x}_i, e)$, satisfies detailed balance:
  $\pi(x)\,q(x \to x') = P(x \mid e)\,P(x_i' \mid \bar{x}_i, e) = P(x_i, \bar{x}_i \mid e)\,P(x_i' \mid \bar{x}_i, e)$
  $= P(x_i \mid \bar{x}_i, e)\,P(\bar{x}_i \mid e)\,P(x_i' \mid \bar{x}_i, e)$  (by the chain rule on $P(x_i, \bar{x}_i \mid e)$)
  $= P(x_i \mid \bar{x}_i, e)\,P(x_i', \bar{x}_i \mid e)$  (by the chain rule backwards)
  $= q(x' \to x)\,\pi(x')$