Bayesian Networks Unit 7 Approximate Inference in Bayesian Networks Fu Jen University Department of Electrical Engineering Wang, Yuan-Kai Copyright Wang, Yuan-Kai, 王元凱 [email protected]http://www.ykwang.tw Department of Electrical Engineering, Fu Jen Univ. 輔仁大學電機工程系 2006~2011 Reference this document as: Wang, Yuan-Kai, “Approximate Inference in Bayesian Networks," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
Goal of This Unit
• P(X|e) inference for Bayesian networks
• Why approximate inference
  – Exact inference is too slow because of exponential complexity
• Approximate approaches
  – Sampling methods
    • Likelihood weighting sampling
    • Markov Chain Monte Carlo sampling
  – Loopy belief propagation
  – Variational methods
Related Units
• Background
  – Probabilistic graphical model
  – Exact inference in BN
• Next units
  – Probabilistic inference over time
Self-Study References
• Chapter 14, Artificial Intelligence: A Modern Approach, 2nd ed., S. Russell & P. Norvig, Prentice Hall, 2003.
• Inference in Bayesian networks, B. D'Ambrosio, AI Magazine, 1999.
• Probabilistic inference in graphical models, M. I. Jordan & Y. Weiss.
• An introduction to MCMC for machine learning, C. Andrieu, N. de Freitas, A. Doucet, & M. I. Jordan, Machine Learning, vol. 50, pp. 5-43, 2003.
• Computational Statistics Handbook with MATLAB, W. L. Martinez and A. R. Martinez, Chapman & Hall/CRC, 2002.
  – Chapter 3: Sampling Concepts
  – Chapter 4: Generating Random Variables
Query: What is the probability that a student studied, given that they pass the exam?
Analysis (1/3)
• Why does the algorithm work? P(X|E=e)
• Let the sampling probability for WEIGHTED-SAMPLE be S_WS
  – The evidence variables E are fixed to e
  – All the other variables are Z = {X} ∪ Y
  – The algorithm samples each variable in Z given its parent values:

  S_WS(z, e) = ∏_{i=1..l} P(zi | parents(Zi))
Analysis (2/3)
• The likelihood weight w for a given sample (z, e) = (x, y, e) is

  w(z, e) = ∏_{i=1..m} P(ei | parents(Ei))

• The weighted probability of a sample (z, e) = (x, y, e) is

  S_WS(z, e) · w(z, e) = ∏_{i=1..l} P(zi | parents(Zi)) · ∏_{i=1..m} P(ei | parents(Ei))
                       = P(x, y, e)

  because the two products together cover all the variables of the network, and by the chain rule

  P(x1, …, xn) = ∏_{i=1..n} P(xi | parents(Xi))
Analysis (3/3)

  P̂(x | e) = α Σ_y N_WS(x, y, e) w(x, y, e)
            ≈ α′ Σ_y S_WS(x, y, e) w(x, y, e)
            = α′ Σ_y P(x, y, e)
            = α′ P(x, e)
            = P(x | e)

So the algorithm works.
Discussions
• Likelihood weighting is efficient because it uses all the samples generated
• However, performance degrades as the number of evidence variables increases, because
  – Most samples will have very low weights, so
  – The weighted estimate will be dominated by the tiny fraction of samples that give the evidence more than an infinitesimal likelihood
4. Inference by MCMC
• Key idea
  – Sampling process as a Markov chain
    • Next sample depends on the previous one
  – Can approximate any posterior distribution
• "State" of network = current assignment to all variables
• Generate next state by sampling one variable given its Markov blanket
• Sample each variable in turn, keeping evidence fixed
The Markov Chain
• With Sprinkler = true, WetGrass = true, there are four states: the four assignments to (Cloudy, Rain)
Markov Blanket Sampling
• Markov blanket of Cloudy is
  – Sprinkler and Rain
• Markov blanket of Rain is
  – Cloudy, Sprinkler, and WetGrass
• Each variable is sampled from its probability given the Markov blanket
The Algorithm
Why it works• Skipped
–Details in pp. 517-518 in the AIMA 2e textbook
Sub-Sections• 4.1 Markov chain theory• 4.2 Two MCMC sampling algorithms
4.1 Markov Chain Theory
• Suppose X1, X2, … take values in some set
  – w.l.o.g. these values are 1, 2, …
• A Markov chain is a process that corresponds to the network:

  X1 → X2 → X3 → … → Xn → …

• To quantify the chain, we need to specify
  – Initial probability: P(X1)
  – Transition probability: P(Xt+1|Xt)
• A Markov chain has stationary transition probabilities: P(Xt+1|Xt) is the same for all times t
Irreducible Chains
• A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0
  – There is a positive probability of reaching j from i after some number of steps
• A chain is irreducible if every state is accessible from every state
Ergodic Chains
• A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i
  – If X has a finite number of states, it suffices that i is accessible from itself
• A chain is ergodic if it is irreducible and every state is positively recurrent
(A)periodic Chains
• A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 whenever n is not divisible by d
• Intuition: state i can recur only every d steps
• A chain is aperiodic if it contains no periodic state
Stationary Probabilities
Thm:
• If a chain is ergodic and aperiodic, then the limit

  lim_{n→∞} P(Xn = j | X1 = i)

  exists and does not depend on i
• Moreover, let

  P*(X = j) = lim_{n→∞} P(Xn = j | X1 = i)

  then P*(X) is the unique probability satisfying

  P*(X = j) = Σ_i P(Xt+1 = j | Xt = i) P*(Xt = i)
Stationary Probabilities• The probability P*(X) is the stationary
probability of the process• Regardless of the starting point, the
process will converge to this probability
• The rate of convergence depends on properties of the transition probability
Sampling from the Stationary Probability
• This theory suggests how to sample from the stationary probability:
  – Set X1 = i, for some random/arbitrary i
  – For t = 1, 2, …, n
    • Sample a value xt+1 for Xt+1 from P(Xt+1 | Xt = xt)
  – Return xn
• If n is large enough, then this is a sample from P*(X)
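The procedure above can be sketched directly. The 2-state chain and its transition matrix below are hypothetical, chosen only so the stationary distribution (1/6 for state 1) is easy to verify by hand.

```python
import random

# Hypothetical 2-state chain (states 0 and 1), used only for illustration.
# T[i][j] = P(X_{t+1} = j | X_t = i).
T = [[0.9, 0.1],
     [0.5, 0.5]]

def sample_stationary(T, n, start=0, rng=random):
    """Run the chain n steps from `start` and return the final state."""
    x = start
    for _ in range(n):
        p = T[x][1]                      # P(next state = 1 | current = x)
        x = 1 if rng.random() < p else 0
    return x

# The stationary distribution solves pi = pi T; here pi(1) = 0.1/(0.1+0.5) = 1/6.
random.seed(0)
samples = [sample_stationary(T, 100) for _ in range(20000)]
print(sum(samples) / len(samples))       # should be close to 1/6
```

Each call restarts the chain, so the returned states are independent draws (Strategy I from later in this unit).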
Designing Markov Chains
• How do we construct the right chain to sample from?
  – Ensuring aperiodicity and irreducibility is usually easy
• The hard part is ensuring the desired stationary probability
Designing Markov Chains
Key tool (detailed balance):
• If the transition probability satisfies

  Q(X = i) P(Xt+1 = j | Xt = i) = Q(X = j) P(Xt+1 = i | Xt = j)
  whenever P(Xt+1 = j | Xt = i) > 0

  then P*(X) = Q(X)
• This gives a local criterion for checking that the chain will have the right stationary distribution
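The balance condition above can be checked numerically on a small example. The 3-state target Q below is hypothetical, and the transition matrix is built Metropolis-style so the condition holds by construction.

```python
# Numerical check that the balance condition implies stationarity,
# for a hypothetical 3-state chain with target distribution Q.
Q = [0.2, 0.3, 0.5]
n = len(Q)

# Build a transition matrix satisfying Q(i) T[i][j] = Q(j) T[j][i]:
# propose j uniformly among the other states, accept with min(1, Q(j)/Q(i)).
T = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            T[i][j] = (1.0 / (n - 1)) * min(1.0, Q[j] / Q[i])
    T[i][i] = 1.0 - sum(T[i])            # stay put on rejection

# The balance condition holds for every pair of states ...
for i in range(n):
    for j in range(n):
        assert abs(Q[i] * T[i][j] - Q[j] * T[j][i]) < 1e-12

# ... and therefore Q is stationary: Q T = Q.
QT = [sum(Q[i] * T[i][j] for i in range(n)) for j in range(n)]
print(QT)                                 # equals Q up to rounding
```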
MCMC Methods• We can use these results to sample from P(X1,…,Xn|e)
Idea:• Construct an ergodic & aperiodic
Markov Chain such that P*(X1,…,Xn) = P(X1,…,Xn|e)
• Simulate the chain n steps to get a sample
MCMC Methods
Notes:
• The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence:

  Val(Y) = { (x1, …, xn) : x1, …, xn satisfy e }

• For simplicity, we will denote such a state using the vector of variables
4.2 Two MCMC Sampling Algorithms
• Gibbs Sampler• Metropolis-Hastings Sampler
Gibbs Sampler
• One of the simplest MCMC methods
• Each transition changes the state of only one Xi
• The transition probability is defined by P itself, as a stochastic procedure:
  – Input: a state x1, …, xn
  – Choose i at random (uniform probability)
  – Sample x′i from P(Xi | x1, …, xi-1, xi+1, …, xn, e)
  – Let x′j = xj for all j ≠ i
  – Return x′1, …, x′n
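A minimal sketch of this transition, assuming a hypothetical joint table over two binary variables and no evidence: repeatedly picking a variable at random and resampling it from its conditional reproduces the joint distribution as the long-run state frequencies.

```python
import random

# Hypothetical joint over two binary variables: joint[x1][x2] = P(x1, x2).
joint = [[0.3, 0.1],
         [0.2, 0.4]]

def conditional(joint, i, state):
    """P(X_i = 1 | the other variable), read off the joint table."""
    if i == 0:
        other = state[1]
        p1, p0 = joint[1][other], joint[0][other]
    else:
        p1, p0 = joint[state[0]][1], joint[state[0]][0]
    return p1 / (p0 + p1)

def gibbs_step(state, rng=random):
    i = rng.randrange(2)                     # choose a variable at random
    p = conditional(joint, i, state)
    state[i] = 1 if rng.random() < p else 0  # resample it; others unchanged
    return state

random.seed(1)
state, counts = [0, 0], {}
for t in range(50000):
    state = gibbs_step(state)
    counts[tuple(state)] = counts.get(tuple(state), 0) + 1
print({k: round(v / 50000, 3) for k, v in sorted(counts.items())})
```

The printed frequencies should approximate the entries of `joint`, illustrating that the chain's stationary distribution is the target.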
Correctness of Gibbs Sampler• How do we show correctness?
Correctness of Gibbs Sampler
• By the chain rule
  P(x1, …, xi-1, xi, xi+1, …, xn | e)
    = P(x1, …, xi-1, xi+1, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e)
• Thus, for the transition we get

  P(x1, …, xi-1, xi, xi+1, …, xn | e) / P(x1, …, xi-1, x′i, xi+1, …, xn | e)
    = P(xi | x1, …, xi-1, xi+1, …, xn, e) / P(x′i | x1, …, xi-1, xi+1, …, xn, e)

• Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion
Gibbs Sampling for Bayesian Networks
• Why is the Gibbs sampler "easy" in BNs?
• Recall that the Markov blanket of a variable separates it from the other variables in the network
  – P(Xi | X1, …, Xi-1, Xi+1, …, Xn) = P(Xi | Mbi)
• This property allows us to use local computations to perform sampling in each transition
Gibbs Sampling in Bayesian Networks
• How do we evaluate P(Xi | x1, …, xi-1, xi+1, …, xn)?
• Let Y1, …, Yk be the children of Xi
  – By definition of Mbi, the parents of Yj are in Mbi ∪ {Xi}
• It is easy to show that

  P(xi | Mbi) = P(xi | Pai) ∏_j P(yj | pa_Yj) / Σ_{x′i} P(x′i | Pai) ∏_j P(yj | pa′_Yj)

  where pa′_Yj is pa_Yj with xi replaced by x′i
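As a runnable check of this formula, the sketch below scores each value of Rain by P(r | c) · P(w | s, r) and normalizes, on the Cloudy/Sprinkler/Rain/WetGrass network. The CPT numbers are illustrative values close to the usual textbook ones, not taken from this unit's slides.

```python
# Markov-blanket conditional for Rain: its blanket is
# {Cloudy (parent), WetGrass (child), Sprinkler (child's other parent)}.
# Illustrative CPT values (assumed, close to the standard textbook example).
P_r_given_c = {True: 0.8, False: 0.2}            # P(rain | cloudy)
P_w_given_sr = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def p_rain_given_mb(cloudy, sprinkler, wetgrass):
    """P(Rain = true | Cloudy, Sprinkler, WetGrass):
    score each value r by P(r | c) * P(w | s, r), then normalize."""
    score = {}
    for r in (True, False):
        p_r = P_r_given_c[cloudy] if r else 1 - P_r_given_c[cloudy]
        p_w = P_w_given_sr[(sprinkler, r)]
        if not wetgrass:
            p_w = 1 - p_w
        score[r] = p_r * p_w
    return score[True] / (score[True] + score[False])

# With Cloudy = true, Sprinkler = true, WetGrass = true:
# 0.8 * 0.99 / (0.8 * 0.99 + 0.2 * 0.90) = 0.792 / 0.972
print(round(p_rain_given_mb(True, True, True), 3))
```

Only the CPTs of Rain and its children enter the computation, which is exactly the locality property claimed above.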
Metropolis-Hastings
• More general than Gibbs (Gibbs is a special case of M-H)
• Uses an arbitrary proposal distribution q(x′|x) that is ergodic and aperiodic (e.g., uniform)
• Transition to x′ happens with probability
  α(x′|x) = min(1, P(x′) q(x|x′) / (P(x) q(x′|x)))
• Useful when computing P(x) exactly is infeasible: only the ratio P(x′)/P(x) appears, so normalizing constants cancel
• Requires that q(x′|x) = 0 only when P(x′) = 0 or q(x|x′) = 0
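A minimal Metropolis-Hastings sketch with a uniform (hence symmetric) proposal on a hypothetical 3-state target. Because only the ratio P(x′)/P(x) enters the acceptance test, the target can be left unnormalized.

```python
import random

# Unnormalized target over states {0, 1, 2}; true P* = (1/6, 2/6, 3/6).
P = [1.0, 2.0, 3.0]

def mh_step(x, rng=random):
    x_new = rng.randrange(3)                 # uniform proposal, q(x'|x) = 1/3
    # q is symmetric, so the acceptance probability is min(1, P(x')/P(x)).
    if rng.random() < min(1.0, P[x_new] / P[x]):
        return x_new
    return x                                 # reject: stay in the old state

random.seed(2)
x, hits = 0, [0, 0, 0]
for t in range(60000):
    x = mh_step(x)
    hits[x] += 1
print([round(h / 60000, 2) for h in hits])   # close to [1/6, 1/3, 1/2]
```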
Sampling Strategy
• How do we collect the samples?
Strategy I:
• Run the chain M times, each for N steps
  – Each run starts from a different starting point
• Return the last state in each run
(figure: M independent chains)
Sampling Strategy
Strategy II:
• Run one chain for a long time
• After some "burn-in" period, sample a point every fixed number of steps
(figure: M samples taken from one chain after burn-in)
Comparing Strategies
Strategy I:
  – Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity
  – Have to perform "burn-in" steps for each chain
Strategy II:
  – Perform "burn-in" only once
  – Samples might be correlated (although only weakly)
Hybrid strategy:
  – Run several chains, sample a few times from each
  – Combines benefits of both strategies
Short Summary - Approximate Inference
• Monte Carlo (sampling) methods:
  – Pros: simplicity of implementation and theoretical guarantee of convergence
  – Cons: can be slow to converge, and convergence can be hard to diagnose
• Variational methods - your presentation
• Loopy belief propagation and generalized belief propagation - your presentation
Query: What is the probability that a student studied, given that they pass the exam?
Main Computational Problems
1. Difficult to tell if convergence has been achieved
2. Can be wasteful if the Markov blanket is large
   – P(Xi | MB(Xi)) won't change much (law of large numbers)
5. Loopy Belief Propagation• TBU
6. Variational Methods• TBU
7. Implementation by PNL

  Algorithm            | PNL            | GeNIe
  ---------------------+----------------+---------------
  Enumeration          |                | v (Naïve)
  Variable Elimination |                |
  Belief Propagation   | v (Pearl)      | v (Polytree)
  Junction Tree        | v              | v (Clustering)
  Direct Sampling      |                | v (Logic)
  Likelihood Sampling  | v (LWSampling) | v (Likelihood
Thm: Given ε, finding an ε-relative error approximation is NP-hard
Complexity: Absolute Error
• Thm: If ε < 0.5, then finding an estimate of P(X=x|e) with absolute error ε is NP-hard
Likelihood Weighting
• Can we ensure that all of our samples satisfy e?
• One simple solution:
  – When we need to sample a variable that is assigned a value by e, use the specified value
• For example, in the network X → Y, suppose we know Y = 1
  – Sample X from P(X)
  – Then take Y = 1
• Is this a sample from P(X, Y | Y = 1)?
Likelihood Weighting
• Problem: these are samples of X drawn from P(X), not from P(X | Y = 1)
• Solution:
  – Penalize samples in which P(Y=1|X) is small
• We now sample as follows (network X → Y):
  – Let x[i] be a sample from P(X)
  – Let w[i] be P(Y = 1 | X = x[i])

  P̂(X = x | Y = 1) = Σ_i w[i] 1{x[i] = x} / Σ_i w[i]
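On the two-node X → Y example, the weighted estimator can be sketched as follows; the CPT numbers are made up for illustration, so the exact posterior is easy to check by hand.

```python
import random

# Toy X -> Y network with assumed CPTs (illustration only):
# P(X=1) = 0.3, P(Y=1 | X=1) = 0.9, P(Y=1 | X=0) = 0.2.  Evidence: Y = 1.
P_x1 = 0.3
P_y1_given_x = {1: 0.9, 0: 0.2}

random.seed(3)
num = den = 0.0
for _ in range(100000):
    x = 1 if random.random() < P_x1 else 0   # sample X from the prior
    w = P_y1_given_x[x]                      # weight = P(Y=1 | X=x)
    num += w * x                             # weighted count of X = 1
    den += w                                 # total weight
print(round(num / den, 3))
# Analytically: P(X=1 | Y=1) = 0.3*0.9 / (0.3*0.9 + 0.7*0.2) = 0.27/0.41
```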
Likelihood Weighting
• Why does this make sense?
• When N is large, we expect to see N P(X = x) samples with x[i] = x
• Thus,

  Σ_{i : x[i] = x} w[i] ≈ N P(X = x) P(Y = 1 | X = x) = N P(X = x, Y = 1)

• When we normalize, we get an approximation of the conditional probability
Likelihood Weighting: Worked Example

Network: Burglary → Alarm ← Earthquake; Earthquake → Radio; Alarm → Call
Evidence variables (marked "= a" and "= r" in the slides): Alarm and Radio
Samples: B E A C R

CPTs:
  P(b) = 0.03
  P(e) = 0.001
  P(a | b, e) = 0.98    P(a | b, ¬e) = 0.4
  P(a | ¬b, e) = 0.7    P(a | ¬b, ¬e) = 0.01
  P(c | a) = 0.8        P(c | ¬a) = 0.05
  P(r | e) = 0.3        P(r | ¬e) = 0.001

(The original slides animate one run of weighted sampling: the variables are processed in the order B, E, A, C, R; each unobserved variable is sampled from its CPT given the already-sampled parents, and each evidence variable multiplies the weight by its CPT entry. The animation frames show the weight accumulating as a product of CPT entries, e.g., 0.6 * 0.3.)
Likelihood Weighting
• Let X1, …, Xn be an order of variables consistent with arc direction
• w = 1
• for i = 1, …, n do
  – if Xi = xi has been observed
    • w ← w * P(Xi = xi | pai)
  – else
    • sample xi from P(Xi | pai)
• return x1, …, xn, and w
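The loop above can be sketched on the Burglary/Earthquake network of the worked example, with the CPT values from the slides; the query P(B = true | C = true) is chosen arbitrarily for illustration.

```python
import random

# Each variable maps to (parent tuple, CPT giving P(var = True | parents)).
# CPT values follow the worked example in this unit.
cpts = {
    'B': ((),         {(): 0.03}),
    'E': ((),         {(): 0.001}),
    'A': (('B', 'E'), {(True, True): 0.98, (True, False): 0.4,
                       (False, True): 0.7, (False, False): 0.01}),
    'C': (('A',),     {(True,): 0.8,  (False,): 0.05}),
    'R': (('E',),     {(True,): 0.3,  (False,): 0.001}),
}
order = ['B', 'E', 'A', 'C', 'R']        # consistent with arc direction

def weighted_sample(evidence, rng=random):
    w, x = 1.0, {}
    for var in order:
        parents, table = cpts[var]
        p_true = table[tuple(x[p] for p in parents)]
        if var in evidence:              # observed: fix value, update weight
            x[var] = evidence[var]
            w *= p_true if x[var] else 1 - p_true
        else:                            # unobserved: sample from P(var | pa)
            x[var] = rng.random() < p_true
    return x, w

# Estimate P(B = true | C = true) by normalized weighted counts.
random.seed(4)
num = den = 0.0
for _ in range(200000):
    x, w = weighted_sample({'C': True})
    num += w * x['B']
    den += w
print(round(num / den, 3))               # analytically about 0.157
```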
Importance Sampling
• A method for evaluating the expectation of f under P(x), ⟨f⟩_P(X)
• Discrete:

  ⟨f⟩_P(X) = Σ_x f(x) P(x)

• Continuous:

  ⟨f⟩_P(X) = ∫ f(x) P(x) dx

• If we could sample from P:

  ⟨f⟩_P(X) ≈ (1/R) Σ_r f(x[r])
Importance Sampling
A general method for evaluating ⟨f⟩_P(X) when we cannot sample from P(X).
Idea: choose an approximating distribution Q(X) and sample from it.

  ⟨f⟩_P(X) = ∫ f(x) P(x) dx = ∫ f(x) (P(x)/Q(x)) Q(x) dx,   where W(X) = P(X)/Q(X)

If we could generate samples from P(X):

  ⟨f⟩_P(X) ≈ (1/M) Σ_m f(x[m])

Now that we generate the samples from Q(X):

  ⟨f⟩_P(X) ≈ (1/M) Σ_m f(x[m]) w(m)
(Unnormalized) Importance Sampling

1. For m = 1:M
   – Sample x[m] from Q(X)
   – Calculate w(m) = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using

   ⟨f⟩_P(X) ≈ (1/M) Σ_{m=1..M} f(x[m]) w(m)

Requirements:
• P(X) > 0 ⇒ Q(X) > 0 (don't ignore possible scenarios)
• It is possible to calculate P(x) and Q(x) for a specific X = x
• It is possible to sample from Q(X)
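A sketch of steps 1-2, assuming a small discrete target P and a uniform proposal Q that are both easy to evaluate pointwise.

```python
import random

# Hypothetical target P (can be evaluated pointwise) and uniform proposal Q.
P = {0: 0.1, 1: 0.6, 2: 0.3}
Q = {0: 1/3, 1: 1/3, 2: 1/3}
f = lambda x: x * x

random.seed(5)
M, total = 100000, 0.0
for _ in range(M):
    x = random.randrange(3)      # step 1: sample from Q (uniform)
    w = P[x] / Q[x]              # importance weight w(m) = P(x)/Q(x)
    total += f(x) * w            # step 2: accumulate f(x[m]) w(m)
print(round(total / M, 3))
# Exact value: E_P[f] = 0*0.1 + 1*0.6 + 4*0.3 = 1.8
```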
Normalized Importance Sampling
Assume that we cannot evaluate P(X=x) but can evaluate P′(X=x) = α P(X=x)
(e.g., we can evaluate P(X, e) but not P(X|e) in a Bayesian network).
We define w′(X) = P′(X)/Q(X). We can then evaluate α:

  Σ_x Q(x) w′(x) = Σ_x Q(x) P′(x)/Q(x) = Σ_x P′(x) = α

and then:

  ⟨f⟩_P(X) = ∫ f(x) P(x) dx
           = (1/α) ∫ f(x) P′(x) dx
           = (1/α) ∫ f(x) (P′(x)/Q(x)) Q(x) dx
           = (1/α) ⟨f(X) w′(X)⟩_Q(X)
           = ⟨f(X) w′(X)⟩_Q(X) / ⟨w′(X)⟩_Q(X)

In the last step we simply replaced α with ⟨w′(X)⟩_Q(X), using the equation above.
Normalized Importance Sampling
We can now estimate the expectation of f(X), similarly to unnormalized importance sampling, by sampling x[m] from Q(X) and then

  ⟨f⟩_P(X) ≈ Σ_{m=1..M} f(x[m]) w′(m) / Σ_{m=1..M} w′(m)

(hence the name "normalized").
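The same sketch as before, but with an unnormalized target P′ = αP, as in the Bayesian-network setting; dividing by the summed weights removes the unknown constant.

```python
import random

# Hypothetical unnormalized target P' (here P' = 20 * P, true P = (0.1, 0.6, 0.3)).
P_unnorm = {0: 2.0, 1: 12.0, 2: 6.0}
Q = {0: 1/3, 1: 1/3, 2: 1/3}
f = lambda x: x * x

random.seed(6)
num = den = 0.0
for _ in range(100000):
    x = random.randrange(3)      # sample from Q
    w = P_unnorm[x] / Q[x]       # w'(x) = P'(x)/Q(x)
    num += f(x) * w
    den += w                     # the summed weights estimate M * alpha
print(round(num / den, 3))       # close to E_P[f] = 1.8
```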
Importance Sampling Weaknesses
• Important to choose a sampling distribution with heavy tails
  – So as not to "miss" large values of f
• Many-dimensional importance sampling:
  – The "typical set" of P may take a long time to find, unless Q is a good approximation to P
  – Weights vary by factors exponential in N