Bayes Nets: Representing and Reasoning about Uncertainty (Continued)

Combining the Two Examples
• I am at work. My neighbor John calls to say that my alarm went off, but my neighbor Mary doesn't call. Sometimes the alarm is set off by a minor earthquake. Is there a burglar?
(Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls)
Earthquake Example
• I am at work. My neighbor John calls to say that my alarm went off, but my neighbor Mary doesn't call. Sometimes the alarm is set off by a minor earthquake. Is there a burglar?
(Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls)
1: Define the variables that completely describe the problem.
Earthquake Example
• I am at work. My neighbor John calls to say that my alarm went off, but my neighbor Mary doesn't call. Sometimes the alarm is set off by a minor earthquake. Is there a burglar?
(Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls)
2: Define the links between variables.
• The resulting directed graph must be acyclic.
• If node X has parents Y1,…,Yn, any variable that is not a descendant of X is conditionally independent of X given (Y1,…,Yn).
Earthquake Example
• I am at work. My neighbor John calls to say that my alarm went off, but my neighbor Mary doesn't call. Sometimes the alarm is set off by a minor earthquake. Is there a burglar?
(Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls)

P(B=True) = 0.001    P(E=True) = 0.002

B E | P(A=True | B,E)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001

A | P(J=True | A)
T | 0.90
F | 0.05

A | P(M=True | A)
T | 0.70
F | 0.01

3: Add a probability table for each node. The table for node X contains P(X | parent values) for each possible combination of parent values.
Computing a Joint Entry
• Any entry in the joint probability table can be computed. Example: the probability that both John and Mary call, the alarm goes off, but there is no earthquake or burglary.
(Network and CPTs as above.)
Computing a Joint Entry
P(J ^ M ^ A ^ ¬B ^ ¬E)
 = P(J | M ^ A ^ ¬B ^ ¬E) P(M ^ A ^ ¬B ^ ¬E)
 = P(J | A) P(M ^ A ^ ¬B ^ ¬E)
 = P(J | A) P(M | A ^ ¬B ^ ¬E) P(A ^ ¬B ^ ¬E)
 = P(J | A) P(M | A) P(A | ¬B ^ ¬E) P(¬B ^ ¬E)
 = P(J | A) P(M | A) P(A | ¬B ^ ¬E) P(¬B) P(¬E)
 = 0.90 × 0.70 × 0.001 × 0.999 × 0.998 ≈ 0.0006
We would need 2^5 = 32 entries to store the entire joint distribution table.
But we need to store only 10 values by representing the dependencies between variables.
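The arithmetic in the last line of the derivation is easy to sanity-check from the stored CPT values (a minimal sketch; the variable names are mine, the numbers are the CPT entries from the slides):

```python
# CPT values from the alarm-network slides
P_B, P_E = 0.001, 0.002          # priors P(B=True), P(E=True)
P_A_given_notB_notE = 0.001      # P(A=True | B=False, E=False)
P_J_given_A, P_M_given_A = 0.90, 0.70

# P(J ^ M ^ A ^ ~B ^ ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
p = P_J_given_A * P_M_given_A * P_A_given_notB_notE * (1 - P_B) * (1 - P_E)
print(round(p, 4))  # ≈ 0.0006
```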
Inference
• Any inference operation of the form P(values of some variables | values of the other variables) can be computed. Example: the probability that both John and Mary call given that there was a burglary.
(Network and CPTs as above.)
Inference
(Network and CPTs as above.)

P(J, M | B) = P(J, M, B) / P(B)
            = [Σ of P(X) over all joint entries X that contain J ^ M ^ B] / [Σ of P(Y) over all joint entries Y that contain B]
We know how to compute these sums because we know how to compute the joint → but, as written, we still need to compute most of the entries in the joint table.
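The two sums can be made concrete by enumerating every joint entry as a product of CPT values (an illustrative sketch, not part of the slides; all names are mine):

```python
import itertools

# CPTs from the alarm-network slides (probabilities that each variable is True)
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_J = {True: 0.90, False: 0.05}                      # keyed by a
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as a product of CPT entries."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

tf = [True, False]
# Numerator: all joint entries that contain J ^ M ^ B
num = sum(joint(True, e, a, True, True) for e in tf for a in tf)
# Denominator: all joint entries that contain B
den = sum(joint(True, e, a, j, m)
          for e in tf for a in tf for j in tf for m in tf)
p_jm_given_b = num / den
print(round(p_jm_given_b, 3))  # ≈ 0.592
```

Note that the denominator already requires summing 2^4 joint entries, which previews the cost problem discussed below.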
Bayes Net: Formal Definition
• Bayes Net = directed acyclic graph represented by:
 – Set of vertices V
 – Set of directed edges E joining vertices. No cycles are allowed.
• With each vertex is associated:
 – The name of a random variable
 – A probability distribution table indicating how the probability of the variable's values depends on all the possible combinations of values of its parents
Bayes Nets are also called Belief Networks. The tables associated with the vertices are called Conditional Probability Tables (CPTs).
All the definitions can be extended to continuous random variables instead of discrete variables.
Bayes Net Construction
• Choose a set of variables and an ordering {X1,…,Xm}
• For each variable Xi, for i = 1 to m:
 1. Add the variable Xi to the network
 2. Set Parents(Xi) to be the minimal subset of {X1,…,Xi-1} such that Xi is conditionally independent of all the other members of {X1,…,Xi-1} given Parents(Xi)
 3. Define the probability table describing P(Xi | Parents(Xi))
If Xi has k parents, we need to store 2^k entries to represent the CPT → storage is exponential in the number of parents, not in the total number of variables m. In many problems k << m.
Note: the structure of the network depends on the initial ordering of the variables.
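The 2^k storage cost is easy to see if a CPT is stored as one row per assignment of the parents (a toy sketch; the parent count k = 3 and the placeholder probability 0.5 are hypothetical):

```python
from itertools import product

k = 3  # number of binary parents of some node X
# One P(X=True | parent assignment) entry per combination of parent values;
# 0.5 is a placeholder probability for illustration only.
cpt = {assignment: 0.5 for assignment in product([True, False], repeat=k)}
print(len(cpt))  # 2**3 = 8 rows, regardless of how many other variables exist
```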
Example: Symptoms & Diagnosis
• The diagnosis problem: what is the most likely disease given the observed symptoms?
• Variables V = {Flu, Measles, Fever, Spots}
(Network diagram over Flu, Measles, Fever, and Spots.)
Try creating the network by using a different ordering of the variables…
Another Example
• The lawn may be wet because the sprinkler was on or because it was raining (or both).
(Network: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler → WetGrass ← Rain)

P(C=True) = 0.5

C | P(S=True | C)
T | 0.10
F | 0.50

C | P(R=True | C)
T | 0.80
F | 0.20

S R | P(W=True | S,R)
T T | 0.99
T F | 0.90
F T | 0.90
F F | 0.01
Computing the Joint: The General Case
• Any entry in the joint distribution table can be computed
• Consequently, any conditional probability can be computed

P(X1=x1, …, Xm=xm)
 = P(Xm=xm | X1=x1, …, Xm-1=xm-1) × P(X1=x1, …, Xm-1=xm-1)
 = P(Xm=xm | X1=x1, …, Xm-1=xm-1) × P(Xm-1=xm-1 | X1=x1, …, Xm-2=xm-2) × P(X1=x1, …, Xm-2=xm-2)
 = …
 = ∏(i=1 to m) P(Xi=xi | X1=x1, …, Xi-1=xi-1)
 = ∏(i=1 to m) P(Xi=xi | assignments to Parents(Xi))

We can do this because, by construction, Xi is independent of all the other variables given Parents(Xi).
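The product formula above can be sketched generically, here for the sprinkler network from the earlier slide (an illustrative sketch; the list-of-nodes representation is my own):

```python
# Each node: (name, parents, CPT mapping parent-value tuples -> P(node=True)),
# listed so that parents come before children. CPT values are from the slides.
NETWORK = [
    ("C", (), {(): 0.5}),
    ("S", ("C",), {(True,): 0.10, (False,): 0.50}),
    ("R", ("C",), {(True,): 0.80, (False,): 0.20}),
    ("W", ("S", "R"), {(True, True): 0.99, (True, False): 0.90,
                       (False, True): 0.90, (False, False): 0.01}),
]

def joint_prob(assignment):
    """P(assignment) = product over nodes of P(node value | parent values)."""
    p = 1.0
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(assignment[par] for par in parents)]
        p *= p_true if assignment[name] else 1 - p_true
    return p
```

The loop is linear in the number of variables, which is the "time linear in the number of variables" claim made later in the slides.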
Inference: The General Case
• Inference = computing a conditional probability:
 P(values for some variable(s) | values for other variables)
 P(E1 | E2)
 – E1: "query" variables (example: diseases)
 – E2: "evidence" variables (example: symptoms)
• Also called "belief updating"
Inference: The General Case
• Inference = computing a conditional probability:
 P(values for some variable(s) | values for other variables)

P(E1 | E2) = P(E1, E2) / P(E2)
           = [Σ of P(X) over all joint entries X that contain E1 ^ E2] / [Σ of P(Y) over all joint entries Y that contain E2]

We can compute any conditional probability, so we can solve any inference problem in principle.
So Far…
• Methodology for building Bayes nets.
• Requires storage exponential in the maximum number of parents of any node, not in the total number of nodes.
• We can compute the probability of any assignment to the variables (entry in the joint distribution) in time linear in the number of variables.
• We can compute the answer to any question (any conditional probability).
Inference: The General Case
• Problem: if the evidence E2 involves k binary variables and we have a total of m variables, what is the complexity of this computation?
Inference: The Bad News
• Computing the conditional probabilities by enumerating all relevant entries in the joint is expensive: exponential in the number of variables!
• Even worse: solving general queries in Bayes nets is NP-hard!
Possible Solutions
• Approximate methods
– Approximate the joint distributions by drawing samples
• Exact methods
– Factorization and variable elimination
– Exploit special network structure (e.g., trees)
– Transform the network structure
Approximate Method: Sampling
• Sampling = very powerful technique in many probabilistic problems (stochastic simulation)
• General idea:
 – It is often difficult to compute and represent exactly the probability distribution of a set of variables
 – But it is often easy to generate samples from the distribution
(Table of P(X1=x1, X2=x2, …, Xm=xm) for every assignment x1 x2 … xm: the number of rows is too large for the table to be computed explicitly.)
(Table of samples: each row is one sampled assignment of x1 x2 … xm.)
For a large number of samples, P(X1=x1, X2=x2, …, Xm=xm) is approximately equal to:
 (# of samples with X1=x1 and X2=x2 … and Xm=xm) / (total # of samples)
Sampling Example
• Generate a set of variable assignments with the same distribution as the joint distribution represented by the network.
(Sprinkler network and CPTs as above.)
Sampling
(Sprinkler network and CPTs as above.)
1. Randomly choose C. C = True with probability 0.5 → C = True
2. Randomly choose S. S = True with probability P(S=True | C=True) = 0.10 → S = False
3. Randomly choose R. R = True with probability P(R=True | C=True) = 0.80 → R = True
4. Randomly choose W. W = True with probability P(W=True | S=False, R=True) = 0.90 → W = True
Resulting sample: (C, S, R, W) = (T, F, T, T)
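The four steps above can be sketched as ancestral sampling code, drawing each variable given its already-sampled parents (a minimal sketch using Python's random module; the function name is mine):

```python
import random

# Ancestral sampling from the sprinkler network, using the CPTs on the slides.
W_CPT = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.01}  # keyed by (s, r)

def sample_once(rng):
    """Draw one (C, S, R, W) assignment, parents before children."""
    c = rng.random() < 0.5                       # P(C=True) = 0.5
    s = rng.random() < (0.10 if c else 0.50)     # P(S=True | C)
    r = rng.random() < (0.80 if c else 0.20)     # P(R=True | C)
    w = rng.random() < W_CPT[(s, r)]             # P(W=True | S, R)
    return c, s, r, w

rng = random.Random(0)
samples = [sample_once(rng) for _ in range(10000)]
# The empirical frequency of C=True should be close to P(C=True) = 0.5.
freq_c = sum(c for c, _, _, _ in samples) / len(samples)
```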
Sampling for Inference: Example
• Suppose that we want to compute P(W = True | C = True). (In words: how likely is it that the grass is wet given that the sky is cloudy?)
• Generate many samples of (C, S, R, W):
 – Nc = number of samples for which C = True
 – Ns = number of samples for which W = True and C = True
 – N = total number of samples
• Nc/N approximates P(C = True)
• Ns/N approximates P(W = True and C = True)
• Therefore Ns/Nc approximates:
 P(W = True and C = True) / P(C = True) = P(W = True | C = True)
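The Ns/Nc recipe can be sketched as rejection sampling: sample complete assignments and keep only those consistent with the evidence (an illustrative sketch; all names are mine):

```python
import random

# Estimate P(W=True | C=True) in the sprinkler network by sampling
# complete assignments and keeping only those with C=True.
W_CPT = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.01}  # keyed by (s, r)

def sample_once(rng):
    c = rng.random() < 0.5
    s = rng.random() < (0.10 if c else 0.50)
    r = rng.random() < (0.80 if c else 0.20)
    w = rng.random() < W_CPT[(s, r)]
    return c, s, r, w

rng = random.Random(0)
n_c = n_s = 0
for _ in range(50000):
    c, s, r, w = sample_once(rng)
    if c:
        n_c += 1        # sample consistent with the evidence C=True
        if w:
            n_s += 1    # ...and with the query W=True as well
estimate = n_s / n_c    # Ns/Nc ≈ P(W=True | C=True)
```

With the CPTs on the slides, the exact answer is Σ over (s,r) of P(s|C) P(r|C) P(W=True|s,r) = 0.747, so the estimate should land close to that.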
Sampling for Inference: General Case
• Suppose that we want to compute P(E1 | E2). (In words: how likely is it that the variable assignments in E1 are satisfied given the assignments in E2?)
• Generate many samples:
 – Nc = number of samples for which the assignments in E2 are satisfied
 – Ns = number of samples for which the assignments in both E1 and E2 are satisfied
 – N = total number of samples
• Nc/N approximates P(E2)
• Ns/N approximates P(E1 and E2)
• Therefore Ns/Nc approximates:
 P(E1 and E2) / P(E2) = P(E1 | E2)
Problem with Sampling
• The probability of some assignments of the variables is so low that they will likely never be seen in the samples (unless a very large number of samples is drawn).
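To see how rare evidence starves the estimate, consider sampling just B in the alarm network, where P(B=True) = 0.001 (a minimal sketch; exact counts will vary with the seed):

```python
import random

# In the alarm network, P(B=True) = 0.001: out of N prior samples, only
# about N/1000 will even contain B=True, so any Ns/Nc estimate conditioned
# on B=True rests on very few accepted samples.
rng = random.Random(0)
N = 50000
n_b = sum(rng.random() < 0.001 for _ in range(N))
fraction = n_b / N  # roughly 0.001: very few usable samples out of 50,000
```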