Artificial Intelligence
CS 165A
Jan 17, 2019
Instructor: Prof. Yu-Xiang Wang
Probabilistic Reasoning (Ch 14)
Announcements
• Course website: https://www.cs.ucsb.edu/~yuxiangw/classes/CS165A-2019winter/
• Homework 1 is posted in the Assignment subdirectory. Due Jan 29.
• Homework submission box: HFH-2108 (look for a label with "Wang / 165A")
Plan on "Reasoning with Uncertainty"
• Last time:
– Probability theory
– Joint probabilities, marginals, conditionals
– Independence
• Today:
– Conditional independence
– Directed graphical models --- Bayesian networks (this lecture)
• Next Tuesday:
– D-separation / Markov blankets
– Undirected graphical models --- Markov random fields
Marginalization
• Given P(X, Y), derive P(X) ("marginalize away Y"):

P(X) = Σ_Y P(X, Y)

• Equivalent: Given P(X | Y) and P(Y), derive P(X):

P(X) = Σ_Y P(X, Y) = Σ_Y P(X | Y) P(Y)

• For continuous Y, the sums become integrals:

P(X) = ∫_Y P(X, Y) dY = ∫_Y P(X | Y) P(Y) dY
Marginalization (cont.)
• Marginalization is a common procedure
– E.g., used to normalize Bayes' rule:

P(H | D) = P(D | H) P(H) / P(D)

The numerator terms P(D | H) and P(H) are known; the denominator P(D) is the unknown.

• By marginalization:
P(D) = Σ_H P(D, H) = Σ_H P(D | H) P(H)
     = P(D | H1) P(H1) + P(D | H2) P(H2) + … + P(D | HN) P(HN)
Computing probabilities
• From the joint distribution P(X1, X2, …, XN) we can compute P(some variables) by summing over all the other variables
• For example:
– P(X2, …, XN) = Σ_i P(X1 = x_i, X2, …, XN)
– P(X1) = Σ_i Σ_j … Σ_p P(X1, X2 = x_i, X3 = x_j, …, XN = x_p)
• For binary variables, from P(X, Y):
– P(Y) = Σ_x P(X = x, Y) = P(¬X, Y) + P(X, Y)
– P(X) = Σ_y P(X, Y = y) = P(X, ¬Y) + P(X, Y)
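The binary-variable marginals above can be sketched in a few lines of Python. The joint-table numbers below are made up for illustration; they only need to sum to 1:

```python
# A tiny joint distribution P(X, Y) over two binary variables,
# stored as a dict keyed by (x, y). Illustrative numbers.
P_XY = {(True, True): 0.3, (True, False): 0.4,
        (False, True): 0.2, (False, False): 0.1}

# Marginalize away Y:  P(X) = sum over Y of P(X, Y)
P_X = {x: sum(P_XY[(x, y)] for y in (True, False)) for x in (True, False)}
# Marginalize away X:  P(Y) = sum over X of P(X, Y)
P_Y = {y: sum(P_XY[(x, y)] for x in (True, False)) for y in (True, False)}

print(P_X, P_Y)
```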
Recap
• Goal: Probabilistic inference
– Calculate the conditional: P(Event | Observed Evidence)
• What do we need?
– A model of the joint distribution
– Read from the probability tables
– Apply rules of probability theory:
¨ Sum rule (disjoint events)
¨ Product rule (independent events)
¨ Chain rule
¨ Bayes rule
Independence
Absolute (or marginal) independence
• X and Y are independent iff
– P(X, Y) = P(X) P(Y) [by definition]
– P(X | Y) = P(X), since P(X | Y) = P(X, Y)/P(Y) = P(X) P(Y)/P(Y)
Conditional independence
• X and Y are conditionally independent given Z iff
– P(X | Y, Z) = P(X | Z)
– Example:
¨ P(WetGrass | Season, Rain) = P(WetGrass | Rain)
Conditional Independence
• In practice, conditional independence is more common than absolute independence.
– P(Final exam grade | Weather) ≠ P(Final exam grade)
¨ I.e., they are not independent
– P(Final exam grade | Weather, Effort) = P(Final exam grade | Effort)
¨ But they are conditionally independent given Effort
• This leads to simplified rules for updating Bayes' rule, and then onward to Bayesian networks

[Diagram: nodes W (Weather), E (Effort), F (Final exam grade), with P(F | W, E) = P(F | E)]
Quiz time: Representing a joint probability
• Joint probability: P(X1, X2, …, XN)
– Defines the probability for any possible state of the world
– Let the variables be binary. How many numbers does it take to define the joint distribution?
¨ Defined by 2^N − 1 independent numbers
• If the variables are independent, then P(X1, X2, …, XN) = P(X1) P(X2) … P(XN)
– How many numbers does it take to define the joint distribution?
¨ Just N independent numbers!
Tradeoffs in our model choices
Expressiveness
Space / computation efficiency
Fully independent:
P(X1, X2, …, XN) = P(X1) P(X2) … P(XN) — O(N) numbers
Fully general:
P(X1, X2, …, XN) — O(e^N) numbers
Idea:
1. Independent groups of variables?
2. Are all dependencies created equal?
The Chain Rule again
P(X1, X2, …, XN) = P(X1 | X2, …, XN) P(X2, …, XN)
P(X2, …, XN) = P(X2 | X3, …, XN) P(X3, …, XN)
P(X3, …, XN) = P(X3 | X4, …, XN) P(X4, …, XN)
⋮
P(XN-1, XN) = P(XN-1 | XN) P(XN)

Putting it together:
P(X1, X2, …, XN) = P(X1 | X2, …, XN) P(X2 | X3, …, XN) P(X3 | X4, …, XN) … P(XN-1 | XN) P(XN)
The Chain Rule again (cont.)
• Recursive definition:

P(X1, X2, …, XN) = P(X1 | X2, …, XN) P(X2 | X3, …, XN) … P(XN-1 | XN) P(XN)

or equivalently

= P(X1) P(X2 | X1) P(X3 | X2, X1) … P(XN | XN-1, …, X1)

How many values are needed to represent this (assuming binary variables)?
The factors contribute 1, 2, 4, …, 2^(N-1) numbers respectively, so
2^N − 1 = 1 + 2 + 4 + … + 2^(N-1)
Note on number of independent values….
• Random variables W, X, Y, and Z
– W ∈ {w1, w2, w3, w4}
– X ∈ {x1, x2}
– Y ∈ {y1, y2, y3}
– Z ∈ {z1, z2, z3, z4}
• How many (independent) numbers are needed to describe:
– P(W)? 4 − 1 = 3
– P(X, Y)? 2·3 − 1 = 5
– P(W, X, Y, Z)? 4·2·3·4 − 1 = 95
– P(X | Y)? (2 − 1)·3 = 3
– P(W | X, Y, Z)? (4 − 1)·(2·3·4) = 72
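All of these counts follow from one rule: a distribution over k outcomes needs k − 1 free numbers (they must sum to 1), once per configuration of the conditioning variables. A quick Python check, with the domain sizes taken from the slide:

```python
# Sizes of the variables' domains, from the slide.
W, X, Y, Z = 4, 2, 3, 4

# Unconditional table over k joint outcomes: k - 1 numbers.
# Conditional table: (k - 1) numbers per setting of the conditioners.
counts = {
    "P(W)":        W - 1,
    "P(X,Y)":      X * Y - 1,
    "P(W,X,Y,Z)":  W * X * Y * Z - 1,
    "P(X|Y)":      (X - 1) * Y,
    "P(W|X,Y,Z)":  (W - 1) * (X * Y * Z),
}
print(counts)
```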
Benefit of conditional independence
• If some variables are conditionally independent, the joint probability can be specified with many fewer than 2^N − 1 numbers (or 3^N − 1, or 10^N − 1, or …)
• For example (for binary variables W, X, Y, Z):
– P(W, X, Y, Z) = P(W) P(X|W) P(Y|W,X) P(Z|W,X,Y)
¨ 1 + 2 + 4 + 8 = 15 numbers to specify
– But if Y and W are independent given X, and Z is independent of W and X given Y, then
¨ P(W, X, Y, Z) = P(W) P(X|W) P(Y|X) P(Z|Y)
– 1 + 2 + 2 + 2 = 7 numbers
• This is often the case in real problems.
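The 15-versus-7 count can be checked mechanically: a CPT for a binary node with k binary parents holds one free number per parent configuration, i.e. 2^k. A minimal sketch:

```python
# Free numbers per CPT: a binary node with k binary parents needs
# one number per parent configuration, i.e. 2**k.
def cpt_size(num_parents):
    return 2 ** num_parents

# Full chain rule: P(W) P(X|W) P(Y|W,X) P(Z|W,X,Y)
full = cpt_size(0) + cpt_size(1) + cpt_size(2) + cpt_size(3)
# With the conditional independences: P(W) P(X|W) P(Y|X) P(Z|Y)
sparse = cpt_size(0) + cpt_size(1) + cpt_size(1) + cpt_size(1)

print(full, sparse)  # 15 7
```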
Graphical models
Bayesian networks / belief networks; Markov random fields
What Are Graphical Models?
© Eric Xing @ CMU, 2005-2014
Model: a graph M. Data: D ≐ {X1^(i), X2^(i), …, Xm^(i)}, i = 1, …, N
Many applications!
Reasoning under uncertainty!
Speech recognition
Information retrieval
Computer vision
Robotic control
Planning
Games
Evolution
Pedigree
(Slides from Prof. Eric Xing)
Two ways to think about Graphical Models
• A particular factorization of a joint distribution
– P(X,Y,Z) = P(X) P(Y|X) P(Z|Y)
• A collection of conditional independences
– { X ⟂ Y | Z, … }
Represented using a graph!
Belief Networks (a.k.a. Bayesian Networks)
a.k.a. probabilistic networks, belief nets, Bayes nets, etc.
• Belief network
– A data structure (depicted as a graph) that represents the dependence among variables and allows us to concisely specify the joint probability distribution
– The graph itself is known as an "influence diagram"
• A belief network is a directed acyclic graph where:
– The nodes represent the set of random variables (one node per random variable)
– Arcs between nodes represent influence, or causality
¨ A link from node X to node Y means that X "directly influences" Y
– Each node has a conditional probability table (CPT) that defines P(node | parents)
Example
• Random variables X and Y
– X: it is raining
– Y: the grass is wet
• X has a causal effect on Y; or, Y is a symptom of X
• Draw two nodes and link them: X → Y
• Define the CPT for each node: P(X) and P(Y | X)
• Typical use: we observe Y and we want to query P(X | Y)
– Y is an evidence variable
– X is a query variable
Try it
• What is P(X | Y)?
– Given that we know the CPTs of each node in the graph X → Y:

P(X | Y) = P(Y | X) P(X) / P(Y)
         = P(Y | X) P(X) / Σ_X P(X, Y)
         = P(Y | X) P(X) / Σ_X P(Y | X) P(X)
Belief nets represent the joint probability
• The joint probability function can be calculated directly from the network
– It's the product of the CPTs of all the nodes
– P(var1, …, varN) = Π_i P(var_i | Parents(var_i))

Net X → Y, with CPTs P(X) and P(Y|X): P(X, Y) = P(X) P(Y|X)
Net X → Z ← Y, with CPTs P(X), P(Y), and P(Z|X,Y): P(X, Y, Z) = P(X) P(Y) P(Z|X,Y)
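A sketch of reading the joint off the two-node net X → Y. The CPT numbers are invented for illustration (not from the slides); the structure is the point:

```python
# Hypothetical CPTs for the net X -> Y (e.g. rain -> wet grass).
P_X = {True: 0.2, False: 0.8}
P_Y_given_X = {True: 0.9, False: 0.3}  # P(Y = true | X)

# The joint is the product of the CPTs: P(X, Y) = P(X) P(Y | X)
P_XY = {(x, y): P_X[x] * (P_Y_given_X[x] if y else 1 - P_Y_given_X[x])
        for x in (True, False) for y in (True, False)}

# Sanity check: a joint distribution must sum to 1.
print(sum(P_XY.values()))
```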
Example
I’m at work and my neighbor John called to say my home alarm is ringing, but my neighbor Mary didn’t call. The alarm is sometimes triggered by minor earthquakes. Was there a burglar at my house?
• Random (Boolean) variables:
– JohnCalls, MaryCalls, Earthquake, Burglar, Alarm
• The belief net shows the causal links
• This defines the joint probability– P(JohnCalls, MaryCalls, Earthquake, Burglar, Alarm)
• What do we want to know? P(B | J, ¬M)
Why not P(B | J, A, ¬M) ?
Example
Links and CPTs?
Example
Joint probability? P(J, ¬M, A, B, ¬E)?
Calculate P(J, ¬M, A, B, ¬E)
Read the joint probability from the graph:
P(J, M, A, B, E) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
Plug in the desired values:
P(J, ¬M, A, B, ¬E) = P(B) P(¬E) P(A|B,¬E) P(J|A) P(¬M|A)
= 0.001 × 0.998 × 0.94 × 0.9 × 0.3 = 0.0002532924
How about P(B | J, ¬M) ?
Remember, this means P(B=true | J=true, M=false)
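The arithmetic above is easy to check, multiplying the five CPT entries read off the slide:

```python
# CPT entries read from the slide's alarm network.
p_B    = 0.001   # P(Burglary = true)
p_notE = 0.998   # P(Earthquake = false)
p_A    = 0.94    # P(Alarm = true | B, not E)
p_J    = 0.9     # P(JohnCalls = true | A)
p_notM = 0.3     # P(MaryCalls = false | A)

# P(J, not M, A, B, not E) = P(B) P(notE) P(A|B,notE) P(J|A) P(notM|A)
joint = p_B * p_notE * p_A * p_J * p_notM
print(joint)
```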
Calculate P(B | J, ¬M)
By marginalization:

P(B | J, ¬M) = P(B, J, ¬M) / P(J, ¬M)

= Σ_{i,j} P(J, ¬M, A_i, B, E_j) / Σ_{i,j,k} P(J, ¬M, A_i, B_k, E_j)

= [P(B) Σ_j P(E_j) Σ_i P(A_i | B, E_j) P(J | A_i) P(¬M | A_i)] /
  [Σ_k P(B_k) Σ_j P(E_j) Σ_i P(A_i | B_k, E_j) P(J | A_i) P(¬M | A_i)]

(here i, j, k range over the values of A, E, B)
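This marginalization can be carried out by brute-force enumeration. The sketch below fills in the full CPTs with the standard values from the Russell & Norvig alarm example; the slide only shows a few of them (0.001, 0.998, 0.94, 0.9, 0.3, all consistent with these tables), so treat the remaining entries as assumptions:

```python
# Full CPTs, assumed from the standard textbook alarm example;
# only some of these values appear on the slides.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                    # P(M=true | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of CPT entries."""
    pa = P_A[(b, e)]
    return (P_B[b] * P_E[e]
            * (pa if a else 1 - pa)
            * (P_J[a] if j else 1 - P_J[a])
            * (P_M[a] if m else 1 - P_M[a]))

TF = (True, False)
# Numerator: sum out A and E with B = true; denominator: sum out B as well.
num = sum(joint(True, e, a, True, False) for e in TF for a in TF)
den = sum(joint(b, e, a, True, False) for b in TF for e in TF for a in TF)
print(num / den)   # P(B = true | J = true, M = false)
```

Note how small the posterior comes out: with Mary not calling, the evidence for a burglary is weak.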
Example
• Conditional independence is seen here
– P(JohnCalls | MaryCalls, Alarm, Earthquake, Burglary) = P(JohnCalls | Alarm)
– So JohnCalls is independent of MaryCalls, Earthquake, and Burglary, given Alarm
• Does this mean that an earthquake or a burglary does not influence whether or not John calls?
– No, but the influence is already accounted for in the Alarm variable
– JohnCalls is conditionally independent of Earthquake, but not absolutely independent of it
Naive Bayes model
• A common situation is when a single cause directly influences several variables, which are all conditionally independent, given the cause.
e1
C
e2 e3
Rain
Wet grass People with umbrellas
Car accidents
P(C, e1, e2, e3) = P(C) P(e1 | C) P(e2 | C) P(e3 | C)
In general,

P(C, e1, …, en) = P(C) Π_i P(e_i | C)
Naive Bayes model
• Typical query for naive Bayes:
– Given some evidence, what's the probability of the cause?
– P(C | e1) = ?
– P(C | e1, e3) = ?
e1
C
e2 e3
Rain
Wet grass People with umbrellas
Car accidents
P(C | e1) = P(e1 | C) P(C) / P(e1)

where P(e1) = Σ_C P(e1 | C) P(C)
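Both queries can be sketched with hypothetical CPT numbers. The probabilities below are invented for illustration; only the naive Bayes structure comes from the slide:

```python
# Hypothetical naive Bayes CPTs: C = Rain, e1 = WetGrass, e3 = CarAccidents.
P_C = {True: 0.2, False: 0.8}
P_e = {"e1": {True: 0.9, False: 0.2},    # P(e1 = true | C)
       "e3": {True: 0.3, False: 0.05}}   # P(e3 = true | C)

def posterior(observed):
    """P(C | observed), where observed maps evidence name -> value."""
    score = dict(P_C)  # start from the prior P(C)
    for name, val in observed.items():
        for c in (True, False):
            p = P_e[name][c]
            score[c] *= p if val else 1 - p
    z = sum(score.values())  # P(evidence), by marginalizing over C
    return {c: score[c] / z for c in (True, False)}

print(posterior({"e1": True}))               # P(C | e1)
print(posterior({"e1": True, "e3": True}))   # P(C | e1, e3)
```

Each extra observed effect simply multiplies in another P(e_i | C) factor before normalizing, which is exactly the product formula above.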
Drawing belief nets
• What would a belief net look like if all the variables were fully dependent?
• But this isn’t the only way to draw the belief net when all the variables are fully dependent
X1 X2 X3 X4 X5
P(X1,X2,X3,X4,X5) = P(X1)P(X2|X1)P(X3|X1,X2)P(X4|X1,X2,X3)P(X5|X1,X2,X3,X4)
Fully connected belief net
• In fact, there are N! ways of connecting up a fully-connected belief net – That is, there are N! ways of ordering the nodes
X1 X2 X1 X2 P(X1,X2) = ?
For N=2
For N=5
X1 X2 X3 X4 X5 P(X1,X2,X3,X4,X5) = ?
and 119 others…
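The "119 others" is just a counting fact: every ordering of the N nodes yields one valid fully-connected factorization, so there are N! of them. A one-liner check:

```python
from itertools import permutations

# Every node ordering gives one fully-connected factorization,
# so there are N! of them; for N = 5 that is 120.
nodes = ["X1", "X2", "X3", "X4", "X5"]
orderings = list(permutations(nodes))
print(len(orderings))  # 120 = 5! (the one drawn plus 119 others)
```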
Drawing belief nets (cont.)
Fully-connected net displays the joint distribution:
P(X1, X2, X3, X4, X5) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) P(X5|X1,X2,X3,X4)
X1 X2 X3 X4 X5
But what if there are conditionally independent variables?
P(X1, X2, X3, X4, X5) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X2,X3) P(X5|X3,X4)
X1 X2 X3 X4 X5
Drawing belief nets (cont.)
What if the variables are all independent?
P(X1, X2, X3, X4, X5) = P(X1) P(X2) P(X3) P(X4) P(X5)
X1 X2 X3 X4 X5
What if the links are drawn like this:
Not allowed – not a DAG
X1 X2 X3 X4 X5
Drawing belief nets (cont.)
What if the links are drawn like this:
X1 X2 X3 X4 X5
P(X1, X2, X3, X4, X5) = P(X1) P(X2 | X3) P(X3 | X1) P(X4 | X2, X3) P(X5 | X4)
It can be redrawn like this:
X1 X3 X2 X4 X5
All arrows going left-to-right