Bayesian Networks
CVPR Winter Seminar
Jaemin Kim
Jan 15, 2016
Outline

» Concepts in Probability
  • Probability
  • Random variables
  • Basic properties (Bayes rule)
• Bayesian Networks
• Inference
• Decision making
• Learning networks from data
• Reasoning over time
• Applications
Probabilities

Probability distribution P(X|ξ)
• X is a random variable
  • Discrete
  • Continuous
• ξ is the background state of information
Discrete Random Variables

Finite set of possible outcomes: X ∈ {x₁, x₂, x₃, …, xₙ}

P(xᵢ) ≥ 0,  Σᵢ P(xᵢ) = 1

X binary: P(x) + P(x̄) = 1

[Figure: bar chart of a distribution over the values X1–X4]
Continuous Random Variables

Probability distribution (density function) over continuous values, e.g. X ∈ [0, 10]:

P(x) ≥ 0,  ∫₀¹⁰ P(x) dx = 1

P(5 ≤ x ≤ 7) = ∫₅⁷ P(x) dx

[Figure: density P(x) with the area between x = 5 and x = 7 shaded]
More Probabilities

Joint
• Probability that both X=x and Y=y: P(x, y) = P(X=x ∧ Y=y)

Conditional
• Probability that X=x given we know that Y=y: P(x|y) = P(X=x | Y=y)
Rules of Probability

Product Rule
P(X, Y) = P(X|Y) P(Y) = P(Y|X) P(X)

Marginalization
P(Y) = Σᵢ P(Y, xᵢ)

X binary: P(Y) = P(Y, x) + P(Y, x̄)
Bayes Rule

P(H, E) = P(H|E) P(E) = P(E|H) P(H)

P(H|E) = P(E|H) P(H) / P(E)
Goal: extract information about a particular variable (its probability distribution) from information about other, correlated variables.
Graph Model

Definition:
• A collection of variables (nodes) with a set of dependencies (edges) between the variables, and a set of probability distribution functions for each variable
• A Bayesian network is a special type of graph model which is a directed acyclic graph (DAG)
Bayesian Networks

A Graph
− nodes represent the random variables
− directed edges (arrows) between pairs of nodes
− it must be a Directed Acyclic Graph (DAG)
− the graph represents relationships between the variables

Conditional probability specifications
− the conditional probability distribution (CPD) of each variable given its parents
− discrete variable: table (CPT)
Bayesian Networks (Belief Networks)

A Graph
− directed edges (arrows) between pairs of nodes
− causality: A "causes" B
− used in the AI and statistics communities

Markov Random Fields (MRF)

A Graph
− undirected edges (no arrows) between pairs of nodes
− a simple definition of independence: if every path between the nodes in A and B passes through a node in a third set C, then A and B are conditionally independent given C
− used in the physics and vision communities
Bayesian Networks
Bayesian networks

Basics
• Structured representation
• Conditional independence
• Naïve Bayes model
• Independence facts
Bayesian networksBayesian networks
CancerSmoking heavylightnoS ,,
malignantbenignnoneC ,,P(S=no) 0.80P(S=light) 0.15P(S=heavy) 0.05
Smoking= no light heavyP(C=none) 0.96 0.88 0.60P(C=benign) 0.03 0.08 0.25P(C=malig) 0.01 0.04 0.15
P(C|S):
P(S):
Product Rule

P(C,S) = P(C|S) P(S)

S \ C   none   benign  malignant
no      0.768  0.024   0.008
light   0.132  0.012   0.006
heavy   0.035  0.010   0.005

P(C=none ∧ S=no) = P(C=none | S=no) P(S=no) = 0.96 × 0.8 = 0.768
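To make the arithmetic concrete, here is a minimal Python sketch (mine, not from the slides) that builds the full joint table from P(S) and P(C|S) via the product rule:

```python
# Minimal sketch (not from the slides): build the joint P(C,S) from the
# prior P(S) and the CPT P(C|S) of the smoking/cancer example.
P_S = {"no": 0.80, "light": 0.15, "heavy": 0.05}
P_C_given_S = {
    "no":    {"none": 0.96, "benign": 0.03, "malig": 0.01},
    "light": {"none": 0.88, "benign": 0.08, "malig": 0.04},
    "heavy": {"none": 0.60, "benign": 0.25, "malig": 0.15},
}

# Product rule: P(C=c, S=s) = P(C=c | S=s) * P(S=s)
P_CS = {(s, c): P_C_given_S[s][c] * P_S[s]
        for s in P_S for c in P_C_given_S[s]}

print(P_CS[("no", "none")])  # 0.768, matching the table above
```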
Marginalization

P(C=malig) = P(C=malig ∧ S=no) + P(C=malig ∧ S=light) + P(C=malig ∧ S=heavy)

S \ C   none   benign  malig  total
no      0.768  0.024   0.008  0.80
light   0.132  0.012   0.006  0.15
heavy   0.035  0.010   0.005  0.05
total   0.935  0.046   0.019

Column totals give P(Cancer); row totals give P(Smoke).

P(S=no) = P(S=no ∧ C=none) + P(S=no ∧ C=benign) + P(S=no ∧ C=malig)
Bayes Rule Revisited

P(S|C) = P(C|S) P(S) / P(C) = P(C,S) / P(C)

S \ C   none         benign       malig
no      0.768/0.935  0.024/0.046  0.008/0.019
light   0.132/0.935  0.012/0.046  0.006/0.019
heavy   0.035/0.935  0.010/0.046  0.005/0.019

P(S|C):
Cancer =    none   benign  malignant
P(S=no)     0.821  0.522   0.421
P(S=light)  0.141  0.261   0.316
P(S=heavy)  0.037  0.217   0.263
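A small Python sketch (again mine, not from the slides) reproducing the marginalization and Bayes-rule steps above; it reuses the P_CS joint from the earlier snippet:

```python
# Minimal sketch (not from the slides): marginalize the joint, then
# apply Bayes rule. Reuses P_CS from the product-rule snippet.
cancers = ["none", "benign", "malig"]
smoking = ["no", "light", "heavy"]

# Marginalization: P(C=c) = sum_s P(C=c, S=s)
P_C = {c: sum(P_CS[(s, c)] for s in smoking) for c in cancers}

# Bayes rule: P(S=s | C=c) = P(C=c, S=s) / P(C=c)
P_S_given_C = {c: {s: P_CS[(s, c)] / P_C[c] for s in smoking}
               for c in cancers}

print(round(P_C["malig"], 3))                   # 0.019
print(round(P_S_given_C["malig"]["heavy"], 3))  # 0.263
```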
A Bayesian Network

[Figure: DAG with nodes Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor; Age → Exposure to Toxics; Age, Gender → Smoking; Exposure to Toxics, Smoking → Cancer; Cancer → Serum Calcium, Lung Tumor]
Problems with Large Instances

• The joint probability distribution P(A,G,E,S,C,L,SC)
For seven binary variables there are 2⁷ = 128 values in the joint distribution (for 100 variables there are over 10³⁰ values). How are these values to be obtained?

• Inference
To obtain posterior distributions once some evidence is available requires summation over an exponential number of terms, e.g. 2² terms in the calculation of

P(s₁, f₁, x₁) = Σ_{b,l} P(s₁, b, l, f₁, x₁)

which increases to 2⁹⁷ if there are 100 variables.
Independence

[Figure: nodes Age and Gender with no edge between them]

Age and Gender are independent.

P(A|G) = P(A)    (A ⊥ G)
P(G|A) = P(G)    (G ⊥ A)

P(A,G) = P(G|A) P(A) = P(G) P(A)
P(A,G) = P(A|G) P(G) = P(A) P(G)
Conditional Independence

[Figure: Age, Gender → Smoking → Cancer]

Cancer is independent of Age and Gender given Smoking.

P(C|A,G,S) = P(C|S)    (C ⊥ A,G | S)

Conditioning on Smoking=heavy constrains the probability distributions of Age and Gender; it also constrains the distribution of Cancer; and given Smoking=heavy, Cancer is independent of Age and Gender.
More Conditional Independence: Naïve Bayes

[Figure: Cancer → Serum Calcium, Cancer → Lung Tumor]

Serum Calcium is independent of Lung Tumor, given Cancer:

P(L | SC, C) = P(L | C)

Marginally, Serum Calcium and Lung Tumor are dependent.
More Conditional Independence: Explaining Away

[Figure: Exposure to Toxics → Cancer ← Smoking]

Exposure to Toxics and Smoking are (marginally) independent: E ⊥ S

Exposure to Toxics is dependent on Smoking, given Cancer:

P(E=heavy | C=malignant) > P(E=heavy | C=malignant, S=heavy)
More Conditional Independence: Explaining Away

Exposure to Toxics is dependent on Smoking, given Cancer.

[Figure: Exposure to Toxics → Cancer ← Smoking, and the moralized version with an undirected edge between Exposure to Toxics and Smoking]

Moralize the graph.
Put it all together

[Figure: the full DAG over Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, Lung Tumor]

P(A, G, E, S, C, L, SC) = P(A) P(G) P(E|A) P(S|A,G) P(C|E,S) P(SC|C) P(L|C)
General Product (Chain) Rule for Bayesian Networks

P(X₁, X₂, …, Xₙ) = Πᵢ P(Xᵢ | Paᵢ),  where Paᵢ = parents(Xᵢ)
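The chain rule is easy to turn into code. Below is a hedged sketch (my own format, not from the slides): cpts maps each variable to its parent list and a table from parent-value tuples to distributions, and joint_prob evaluates one full assignment:

```python
# Minimal sketch (not from the slides): evaluate the BN chain rule
# P(x1,...,xn) = prod_i P(xi | parents(xi)) for one full assignment.
def joint_prob(assignment, cpts):
    p = 1.0
    for var, (parents, table) in cpts.items():
        parent_vals = tuple(assignment[q] for q in parents)
        p *= table[parent_vals][assignment[var]]
    return p

# The two-node smoking/cancer network in this (hypothetical) format:
cpts = {
    "S": ([], {(): {"no": 0.80, "light": 0.15, "heavy": 0.05}}),
    "C": (["S"], {("no",):    {"none": 0.96, "benign": 0.03, "malig": 0.01},
                  ("light",): {"none": 0.88, "benign": 0.08, "malig": 0.04},
                  ("heavy",): {"none": 0.60, "benign": 0.25, "malig": 0.15}}),
}
print(joint_prob({"S": "no", "C": "none"}, cpts))  # 0.768
```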
Conditional Independence

[Figure: the full cancer DAG with Cancer's parents (Exposure to Toxics, Smoking), its descendants (Serum Calcium, Lung Tumor), and its non-descendants (Age, Gender) marked]

Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.

A variable (node) is conditionally independent of its non-descendants given its parents.
Another non-descendant

[Figure: the full cancer DAG plus a Diet node]

Cancer is independent of Diet given Exposure to Toxics and Smoking.
Representing the Joint Distribution

In general, for a network with nodes X₁, X₂, …, Xₙ:

P(x₁, x₂, …, xₙ) = Πᵢ P(xᵢ | pa(xᵢ))

An enormous saving can be made in the number of values required for the joint distribution.

To determine the joint distribution directly for n binary variables, 2ⁿ − 1 values are required.

For a BN with n binary variables in which each node has at most k parents, fewer than 2ᵏn values are required.
An Example

[Figure: Smoking history (S) → Bronchitis (B) and Lung Cancer (L); B, L → Fatigue (F); L → X-ray (X)]

P(s1) = 0.2

P(b1|s1) = 0.25
P(b1|s2) = 0.05

P(l1|s1) = 0.003
P(l1|s2) = 0.00005

P(f1|b1,l1) = 0.75
P(f1|b1,l2) = 0.10
P(f1|b2,l1) = 0.5
P(f1|b2,l2) = 0.05

P(x1|l1) = 0.6
P(x1|l2) = 0.02

P(s1, b2, l1, f1, x1) = ?
Solution

Note that our joint distribution over 5 variables can be represented as

P(s,b,l,f,x) = P(s) P(b|s) P(l|s,b) P(f|s,b,l) P(x|s,b,l,f)

The graph's conditional independences give P(l|s,b) = P(l|s), P(f|s,b,l) = P(f|b,l), and P(x|s,b,l,f) = P(x|l). Consequently the joint probability distribution can now be expressed as

P(s,b,l,f,x) = P(s) P(b|s) P(l|s) P(f|b,l) P(x|l)

For example, the probability that someone has a smoking history, lung cancer but not bronchitis, suffers from fatigue and tests positive in an X-ray test is

P(s1, b2, l1, f1, x1) = 0.2 × 0.75 × 0.003 × 0.5 × 0.6 = 0.000135
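The arithmetic can be checked mechanically; a minimal sketch (not from the slides), with P(b2|s1) obtained as 1 − P(b1|s1):

```python
# Minimal sketch (not from the slides): check the worked example using
# the factorization P(s)P(b|s)P(l|s)P(f|b,l)P(x|l).
p = (0.2            # P(s1)
     * (1 - 0.25)   # P(b2|s1) = 1 - P(b1|s1) = 0.75
     * 0.003        # P(l1|s1)
     * 0.5          # P(f1|b2,l1)
     * 0.6)         # P(x1|l1)
print(round(p, 6))  # 0.000135
```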
Independence and Graph Separation
• Given a set of observations, is one set of variables dependent on another set?
• Observing effects can induce dependencies.
• d-separation (Pearl 1988) allows us to check conditional independence graphically.
Bayesian networks

• Additional structure
• Nodes as functions
• Causal independence
• Context specific dependencies
• Continuous variables
• Hierarchy and model construction
Nodes as functions

• A BN node is a conditional distribution function
  • its parent values are the inputs
  • its output is a distribution over its values

[Figure: node X with parents A and B; a CPT gives a distribution over X ∈ {lo, med, hi} for each of the parent configurations (a,b), (a,b̄), (ā,b), (ā,b̄), with columns (0.1, 0.3, 0.6), (0.4, 0.2, 0.4), (0.5, 0.3, 0.2), (0.7, 0.1, 0.2); for the highlighted input the output is lo: 0.7, med: 0.1, hi: 0.2]
Nodes as functions

[Figure: node X with parents A and B]

Any type of function from Val(A,B) to distributions over Val(X), e.g. input (a, b) yields output lo: 0.7, med: 0.1, hi: 0.2.
Continuous variables

[Figure: A/C Setting (hi) and Outdoor Temperature (97°) → Indoor Temperature, with a density P(x) over Indoor Temperature]

A function from Val(A,B) to density functions over Val(X).
Gaussian (normal) distributions

P(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)),  written N(μ, σ)

[Figure: Gaussian curves with different means and different variances]
Gaussian networks

[Figure: X → Y]

X ~ N(μ_X, σ²_X)
Y ~ N(ax + b, σ²_Y)

Each variable is a linear function of its parents, with Gaussian noise.

[Figure: joint probability density functions for independent X, Y vs. linearly dependent X, Y]
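A linear Gaussian network is straightforward to sample by ancestral sampling; a sketch with made-up parameter values (a, b, and the variances are illustrative, not from the slides):

```python
# Minimal sketch (not from the slides): ancestral sampling from the
# two-node Gaussian network X -> Y, with illustrative parameters.
import random

mu_X, sigma_X = 0.0, 1.0       # X ~ N(mu_X, sigma_X^2)
a, b, sigma_Y = 2.0, 1.0, 0.5  # Y ~ N(a*x + b, sigma_Y^2)

def sample():
    x = random.gauss(mu_X, sigma_X)
    y = random.gauss(a * x + b, sigma_Y)  # Y is linear in x plus noise
    return x, y

samples = [sample() for _ in range(10000)]
```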
Composing functions

• Recall: a BN node is a function
• We can compose functions to get more complex functions.
• The result: a hierarchically structured BN.
• Since functions can be called more than once, we can reuse a BN model fragment in multiple contexts.
[Figure: hierarchically structured BN for a Car: Owner (Age, Income); Car (Mileage, Maintenance, Age, Original-value, Fuel-efficiency, Braking-power); reusable fragments Brakes (Power), Tires (RF-Tire, LF-Tire, Traction, Pressure), Engine (Power)]
Bayesian Networks

• Knowledge acquisition
• Variables
• Structure
• Numbers
What is a variable?

[Figure: Risk of Smoking vs. Smoking]

Values versus probabilities.

• Collectively exhaustive, mutually exclusive values:

x₁ ∨ x₂ ∨ x₃ ∨ x₄  (collectively exhaustive)
¬(xᵢ ∧ xⱼ), i ≠ j  (mutually exclusive)

Example values: Error Occurred, No Error.
Clarity Test: Knowable in Principle

• Weather: {Sunny, Cloudy, Rain, Snow}
• Gasoline: cents per gallon
• Temperature: {≥ 100°F, < 100°F}
• User needs help on Excel Charting: {Yes, No}
• User's personality: {dominant, submissive}
Structuring

[Figure: DAG with Age, Gender, Exposure to Toxics, Smoking, Genetic Damage, Cancer, Lung Tumor]

Extending the conversation.

Network structure corresponding to "causality" is usually good.
Course Contents

• Concepts in Probability
• Bayesian Networks
» Inference
• Decision making
• Learning networks from data
• Reasoning over time
• Applications
Inference

• Patterns of reasoning
• Basic inference
• Exact inference
• Exploiting structure
• Approximate inference
Predictive Inference

How likely are elderly males to get malignant cancer?

P(C=malignant | Age>60, Gender=male)

[Figure: the full cancer DAG]
Combined

How likely is an elderly male patient with high Serum Calcium to have malignant cancer?

P(C=malignant | Age>60, Gender=male, Serum Calcium=high)

[Figure: the full cancer DAG]
Explaining away

[Figure: the full cancer DAG]

• If we see a lung tumor, the probability of heavy smoking and of exposure to toxics both go up.
• If we then observe heavy smoking, the probability of exposure to toxics goes back down.
Inference in Belief Networks

• Find P(Q=q | E=e)
• Q: the query variable
• E: the set of evidence variables

P(q | e) = P(q, e) / P(e)

X₁, …, Xₙ are the network variables except Q and E:

P(q, e) = Σ_{x₁,…,xₙ} P(q, e, x₁, …, xₙ)
Basic Inference

[Figure: A → B → C]

P(b) = Σₐ P(a, b) = Σₐ P(b | a) P(a)

P(c) = Σ_{b,a} P(a, b, c) = Σ_{b,a} P(c | b) P(b | a) P(a)

P(c) = Σ_b P(c | b) P(b)
Inference in trees

[Figure: node X with parents Y₁, Y₂]

P(x) = Σ_{y₁,y₂} P(x | y₁, y₂) P(y₁, y₂)

Because of the independence of Y₁ and Y₂:

P(x) = Σ_{y₁,y₂} P(x | y₁, y₂) P(y₁) P(y₂)
Polytrees
• A network is singly connected (a polytree) if it contains no undirected loops.
Theorem: Inference in a singly connected network can be done in linear time*.
Main idea: in variable elimination, need only maintain distributions over single nodes.
* in network size including table sizes.
The problem with loops

[Figure: Cloudy → Rain, Cloudy → Sprinkler; Rain, Sprinkler → Grass-wet]

P(c) = 0.5
P(r|c) = 0.99, P(r|c̄) = 0.01
P(s|c) = 0.01, P(s|c̄) = 0.99

Grass-wet is a deterministic OR: the grass is dry only if there is no rain and no sprinklers.

P(ḡ) = P(r̄, s̄) ≈ 0
The problem with loops contd.

P(ḡ) = P(ḡ | r, s) P(r, s) + P(ḡ | r, s̄) P(r, s̄) + P(ḡ | r̄, s) P(r̄, s) + P(ḡ | r̄, s̄) P(r̄, s̄)

The first three conditional terms are 0 and the last is 1, so P(ḡ) = P(r̄, s̄) ≈ 0.

Treating R and S as independent, however, gives P(ḡ) = P(r̄) P(s̄) ≈ 0.5 × 0.5 = 0.25. This is the problem: local propagation around the loop ignores the strong dependence between R and S.
Variable elimination

[Figure: A → B → C]

P(c) = Σ_b P(c | b) Σₐ P(b | a) P(a)

Multiply P(A) and P(B|A) to get P(B,A); sum out A to get P(B).
Multiply P(B) and P(C|B) to get P(C,B); sum out B to get P(C).
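The two elimination steps translate directly into code; a minimal sketch with made-up binary CPTs (not from the slides):

```python
# Minimal sketch (not from the slides): variable elimination on the
# chain A -> B -> C with illustrative binary CPTs.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # [a][b]
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}  # [b][c]

# Multiply P(A) into P(B|A), then sum out A: P(b) = sum_a P(b|a)P(a)
P_B = {b: sum(P_B_given_A[a][b] * P_A[a] for a in P_A) for b in (0, 1)}

# Multiply P(B) into P(C|B), then sum out B: P(c) = sum_b P(c|b)P(b)
P_C = {c: sum(P_C_given_B[b][c] * P_B[b] for b in (0, 1)) for c in (0, 1)}

print(P_B, P_C)  # each distribution sums to 1
```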
Inference as variable elimination

• A factor over X is a function from Val(X) to numbers in [0,1]:
  • a CPT is a factor
  • a joint distribution is also a factor
• BN inference:
  • factors are multiplied to give new ones
  • variables in factors are summed out
• A variable can be summed out as soon as all factors mentioning it have been multiplied.
Variable Elimination with loops

[Figure: the full cancer DAG]

Multiply P(A), P(G), P(S|A,G) into P(A,G,S); sum out G to get P(A,S).
Multiply P(A,S) and P(E|A) into P(A,E,S); sum out A to get P(E,S).
Multiply P(E,S) and P(C|E,S) into P(E,S,C); sum out E,S to get P(C).
Multiply P(C) and P(L|C) into P(C,L); sum out C to get P(L).

Complexity is exponential in the size of the factors.
Inference in BNs and Junction Trees

The main point of BNs is to enable probabilistic inference to be performed. Inference is the task of computing the probability of each value of a node in a BN when the values of the other variables are known.

The general idea is to do inference by representing the joint probability distribution on an undirected graph called the junction tree.

The junction tree has the following characteristics:
• it is an undirected tree, and its nodes are clusters of variables
• given two clusters C1 and C2, every node on the path between them contains their intersection C1 ∩ C2
• a separator, S, is associated with each edge and contains the variables in the intersection between neighbouring nodes

[Figure: junction tree with clusters ABC, BCD, CDE and separators BC, CD]

Steps:
1. Moralize the Bayesian network
2. Triangulate the moralized graph
3. Let the cliques of the triangulated graph be the nodes of a tree, and construct the junction tree
4. Do belief propagation throughout the junction tree to perform inference
Inference in BNs
Constructing the Junction Tree (1)

Step 1. Form the moral graph from the DAG.

Consider the BN in our example:

[Figure: the DAG over S, B, L, F, X and its moral graph (marry parents and remove arrows)]
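Moralization itself is a few lines of code. A sketch (not from the slides) that takes a DAG as a child → parents mapping and returns the undirected moral edges:

```python
# Minimal sketch (not from the slides): moralize a DAG given as a
# child -> parents mapping; returns a set of undirected edges.
def moralize(parents):
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((child, p)))  # keep arcs, drop direction
        for p in pars:                        # "marry" all parent pairs
            for q in pars:
                if p != q:
                    edges.add(frozenset((p, q)))
    return edges

# The example BN: S -> B, S -> L; B, L -> F; L -> X.
bn = {"B": ["S"], "L": ["S"], "F": ["B", "L"], "X": ["L"]}
print(moralize(bn))  # includes the B-L edge from marrying F's parents
```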
Constructing the Junction Tree (2)

Step 2. Triangulate the moral graph.

An undirected graph is triangulated if every cycle of length greater than 3 possesses a chord.

[Figure: the triangulated moral graph over S, B, L, F, X]
Constructing the Junction Tree (3)

Step 3. Identify the cliques.

A clique is a subset of nodes which is complete (i.e. there is an edge between every pair of nodes) and maximal.

Cliques: {B,S,L}, {B,L,F}, {L,X}

[Figure: the triangulated graph with its cliques highlighted]
Constructing the Junction Tree (4)

Step 4. Build the junction tree.

The cliques should be ordered (C1, C2, …, Ck) so they possess the running intersection property: for all 1 < j ≤ k, there is an i < j such that Cj ∩ (C1 ∪ … ∪ Cj−1) ⊆ Ci.

To build the junction tree, choose one such i for each j and add an edge between Cj and Ci.

Cliques: {B,S,L}, {B,L,F}, {L,X}

[Figure: junction tree with clusters BSL, BLF, LX; separator BL between BSL and BLF, separator L between BLF and LX]
Potentials Initialization

To initialize the potential functions:

1. set all potentials to unity
2. for each variable Xi, select one node in the junction tree (i.e. one clique) containing both that variable and its parents, pa(Xi), in the original DAG
3. multiply that potential by P(xi | pa(xi))

[Figure: the junction tree BSL, BLF, LX with potentials φ(B,S,L) = P(s) P(b|s) P(l|s), φ(B,L,F) = P(f|b,l), φ(L,X) = P(x|l)]
Potential Representation

The joint probability distribution can now be represented in terms of potential functions, φ, defined on each clique and each separator of the junction tree. The joint distribution is given by

P(x) = Π_C φ_C(x_C) / Π_S φ_S(x_S)

The idea is to transform one representation of the joint distribution into another in which, for each clique C, the potential function gives the marginal distribution for the variables in C, i.e.

φ_C(x_C) = P(x_C)

This will also apply for the separators, S.
Triangulation

Given a numbered graph, proceed from node n and decrease to 1:
• Determine the lower-numbered nodes which are adjacent to the current node, including those which may have been made adjacent to this node earlier in this algorithm
• Connect these nodes to each other

Triangulation

Numbering the nodes:
• Arbitrarily number the nodes, or
• Maximum cardinality search:
  • Give any node the number 1
  • For each subsequent number, pick a new unnumbered node that neighbors the most already-numbered nodes
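Maximum cardinality search is simple to state in code; a sketch (not from the slides) over an adjacency-dict graph:

```python
# Minimal sketch (not from the slides): maximum cardinality search.
# `adj` maps each node to the set of its neighbours.
def max_cardinality_order(adj):
    numbered, unnumbered = [], set(adj)
    while unnumbered:
        # pick the unnumbered node with the most numbered neighbours
        # (the first pick is arbitrary, since all counts are zero)
        best = max(unnumbered, key=lambda v: len(adj[v] & set(numbered)))
        numbered.append(best)
        unnumbered.remove(best)
    return numbered  # the node at index i receives number i+1
```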
Triangulation

[Figure: the BN and its moralized graph]
Triangulation

[Figure: the moralized graph with nodes numbered 1 to 8 arbitrarily, and the fill-in edges added by the triangulation algorithm]
Triangulation

[Figure: the same graph numbered 1 to 8 by maximum cardinality search, and the resulting triangulation]
Course Contents

• Concepts in Probability
• Bayesian Networks
• Inference
» Decision making
• Learning networks from data
• Reasoning over time
• Applications
Decision making

Decision: an irrevocable allocation of domain resources.

Decisions should be made so as to maximize expected utility.

View decision making in terms of:
• Beliefs/Uncertainties
• Alternatives/Decisions
• Objectives/Utilities
Course Contents

• Concepts in Probability
• Bayesian Networks
• Inference
• Decision making
» Learning networks from data
• Reasoning over time
• Applications
Learning networks from data

• The learning task
• Parameter learning
  • Fully observable
  • Partially observable
• Structure learning
• Hidden variables
The learning task

Input: training data, i.e. cases over the variables B, E, A, C, N

[Table: rows of observed values for Burglary, Earthquake, Alarm, Call, Newscast]

Output: BN modeling the data

[Figure: network over Burglary, Earthquake, Alarm, Call, Newscast]

• Input: fully or partially observable data cases?
• Output: parameters or also structure?
Parameter learning: one variable

Unfamiliar coin: let θ = bias of the coin (long-run fraction of heads).

If θ is known (given), then P(X = heads | θ) = θ.

Different coin tosses are independent given θ, so with h heads and t tails:

P(X₁, …, Xₙ | θ) = θ^h (1 − θ)^t
Maximum likelihood

Input: a set of previous coin tosses
• X₁, …, Xₙ = {H, T, H, H, H, T, T, H, …, H} with h heads and t tails

Goal: estimate θ

The likelihood is P(X₁, …, Xₙ | θ) = θ^h (1 − θ)^t

The maximum likelihood solution is θ* = h / (h + t)
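To see why (a one-line check, not on the slides): setting the derivative of the log-likelihood to zero gives d/dθ [h ln θ + t ln(1 − θ)] = h/θ − t/(1 − θ) = 0, hence h(1 − θ) = tθ and θ* = h/(h + t).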
Conditioning on data

P(θ | D) ∝ P(θ) P(D | θ) = P(θ) θ^h (1 − θ)^t

[Figure: prior P(θ) and posterior P(θ | D) after observing h heads and t tails, e.g. 1 head and 1 tail]
Conditioning on data

A good parameter distribution: the Beta distribution*.

* The Dirichlet distribution generalizes the Beta to non-binary variables.
General parameter learning

A multi-variable BN is composed of several independent parameters ("coins").

[Figure: A → B]  Three parameters: θ_A, θ_B|a, θ_B|ā

We can use the same techniques as in the one-variable case to learn each one separately. The max likelihood estimate of θ_B|a would be:

θ_B|a = (#data cases with b, a) / (#data cases with a)
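Counting estimates like this are one line each in code; a sketch with made-up fully observed cases for the A → B network (not from the slides):

```python
# Minimal sketch (not from the slides): max likelihood CPT estimation by
# counting, for A -> B with fully observed (illustrative) data cases.
data = [{"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1},
        {"A": 0, "B": 0}, {"A": 0, "B": 0}]

n_a  = sum(1 for d in data if d["A"] == 1)                  # cases with a
n_ab = sum(1 for d in data if d["A"] == 1 and d["B"] == 1)  # cases with b, a

theta_B_given_a = n_ab / n_a
print(theta_B_given_a)  # 2/3
```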
Partially observable data

[Figure: the Burglary/Earthquake network; training cases with missing values, e.g. (?, a, c, b, ?) and (b, ?, a, ?, n)]

• Fill in missing data with "expected" values
• expected = distribution over possible values
• use a "best guess" BN to estimate the distribution
Intuition

In the fully observable case:

θ_n|e = (#data cases with n, e) / (#data cases with e) = Σⱼ I(n, e | dⱼ) / Σⱼ I(e | dⱼ)

where I(e | dⱼ) = 1 if E = e in data case dⱼ, and 0 otherwise.

In the partially observable case, I is unknown. The best estimate for I is

Î(n, e | dⱼ) = P_θ*(n, e | dⱼ)

Problem: θ* is unknown.
Expectation Maximization (EM)

Repeat:

Expectation (E) step: use the current parameters θ to estimate the filled-in data:

Î(n, e | dⱼ) = P_θ(n, e | dⱼ)

Maximization (M) step: use the filled-in data to do max likelihood estimation:

θ̃_n|e = Σⱼ Î(n, e | dⱼ) / Σⱼ Î(e | dⱼ)

Set θ := θ̃ and repeat until convergence.
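The E/M alternation is easiest to see in a small self-contained example. The sketch below applies EM to a different but classic problem, a mixture of two biased coins with the coin identity hidden, rather than to the network above; the data and initial values are made up:

```python
# Minimal sketch (not from the slides): EM for a mixture of two biased
# coins. Each case is the number of heads in a group of n tosses; which
# coin produced the group is hidden. The two coins are assumed to be
# chosen with equal probability, so the mixture weights stay fixed at 0.5.
from math import comb

n = 10
heads = [5, 9, 8, 4, 7]     # illustrative data: heads per group of 10
theta = [0.4, 0.6]          # initial guesses for the two biases

for _ in range(50):
    # E step: responsibility of each coin for each group (binomial lik.)
    resp = []
    for h in heads:
        lik = [comb(n, h) * t**h * (1 - t)**(n - h) for t in theta]
        z = sum(lik)
        resp.append([l / z for l in lik])
    # M step: weighted max likelihood re-estimate of each bias
    for k in (0, 1):
        num = sum(r[k] * h for r, h in zip(resp, heads))
        den = sum(r[k] * n for r in resp)
        theta[k] = num / den

print([round(t, 3) for t in theta])
```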
Structure learning
Goal: find “good” BN structure (relative to data)
Solution: do heuristic search over space of network structures.
Search space

Space = network structures
Operators = add/reverse/delete edges
Heuristic search

Use a scoring function to do heuristic search (any algorithm). Greedy hill-climbing with randomness works pretty well.

[Figure: search moves through network structures guided by score]
Scoring

Fill in parameters using the previous techniques and score the completed networks.

One possibility for the score is the likelihood function: Score(B) = P(data | B)

Example: X, Y independent coin tosses; typical data = (27 h-h, 22 h-t, 25 t-h, 26 t-t)

Maximum likelihood network structure: X → Y

The max-likelihood network is typically fully connected. This is not surprising: maximum likelihood always overfits.
Better scoring functions

MDL formulation: balance fit to data and model complexity (# of parameters)

Score(B) = P(data | B) − model complexity

Full Bayesian formulation: a prior on network structures & parameters; more parameters mean a higher dimensional space; we get the balancing effect as a byproduct*.

* With a Dirichlet parameter prior, MDL is an approximation to the full Bayesian score.
Hidden variables

There may be interesting variables that we never get to observe:
• the topic of a document in information retrieval;
• the user's current task in an online help system.

Our learning algorithm should:
• hypothesize the existence of such variables;
• learn an appropriate state space for them.

[Figure: data over E1, E2, E3: randomly scattered data vs. the actual (clustered) data]
Bayesian clustering (Autoclass)

A (hypothetical) class variable is never observed. If we know that there are k classes, just run EM; the learned classes = clusters. Bayesian analysis allows us to choose k, trading off fit to data against model complexity.

Naïve Bayes model: Class → E1, E2, …, En

[Figure: the resulting cluster distributions over E1, E2, E3]
Detecting hidden variables

Unexpected correlations suggest hidden variables.

Hypothesized model: Cholesterolemia → Test1, Test2, Test3

Data model: Cholesterolemia → Test1, Test2, Test3, with additional dependencies among the tests

"Correct" model: Cholesterolemia and a hidden Hypothyroid variable → Test1, Test2, Test3
Course Contents

• Concepts in Probability
• Bayesian Networks
• Inference
• Decision making
• Learning networks from data
» Reasoning over time
• Applications
Reasoning over time

• Dynamic Bayesian networks
• Hidden Markov models
• Decision-theoretic planning
  • Markov decision problems
  • Structured representation of actions
  • The qualification problem & the frame problem
  • Causality (and the frame problem revisited)
Dynamic environments

[Figure: State(t) → State(t+1) → State(t+2)]

Markov property: the past is independent of the future given the current state. This is a conditional independence assumption, implied by the fact that there are no arcs from t to t+2.
Dynamic Bayesian networks

The state is described via random variables.

[Figure: time slices t, t+1, t+2, …, each containing Velocity, Position, Weather, Drunk, with arcs between consecutive slices]
Hidden Markov model

An HMM is a simple model for a partially observable stochastic domain.

[Figure: State(t) → State(t+1) (state transition model); State(t) → Obs(t), State(t+1) → Obs(t+1) (observation model)]
Hidden Markov model

Partially observable stochastic environments:

• Speech recognition: states = phonemes; observations = acoustic signal
• Biological sequencing: states = protein structure; observations = amino acids
• Mobile robots: states = location; observations = sensor input

[Figure: state transition diagram with probabilities 0.8, 0.15, 0.05]
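For concreteness, the standard forward algorithm computes the probability of an observation sequence under an HMM; a sketch with made-up two-state matrices (not from the slides):

```python
# Minimal sketch (not from the slides): the HMM forward algorithm with
# illustrative two-state transition/observation matrices.
trans = [[0.7, 0.3],   # trans[i][j] = P(state_{t+1}=j | state_t=i)
         [0.4, 0.6]]
emit  = [[0.9, 0.1],   # emit[i][k] = P(obs=k | state=i)
         [0.2, 0.8]]
prior = [0.5, 0.5]

def forward(obs):
    alpha = [prior[i] * emit[i][obs[0]] for i in range(2)]
    for o in obs[1:]:
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(2))
                 for j in range(2)]
    return sum(alpha)  # P(obs_1, ..., obs_T)

print(forward([0, 1, 0]))
```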
Acting under uncertainty

Markov Decision Problem (MDP)

[Figure: State(t) → State(t+1) → State(t+2), with Action(t), Action(t+1) (action model) and Reward(t), Reward(t+1); the agent observes the state]

Overall utility = sum of momentary rewards.

This allows a rich preference model, e.g. rewards corresponding to "get to goal asap":
+100 for goal states
−1 for other states
Partially observable MDPs

[Figure: as the MDP above, but the agent observes Obs(t), Obs(t+1), which depend on the state, rather than the state itself]

The optimal action at time t depends on the entire history of previous observations. Instead, a distribution over State(t) suffices.
Structured representation

Probabilistic action model:
• allows for exceptions & qualifications;
• persistence arcs: a solution to the frame problem.

[Figure: two action networks, Move and Turn, each mapping the preconditions Position(t), Holding(t), Direction(t) to the effects Position(t+1), Holding(t+1), Direction(t+1)]
Applications

Medical expert systems
• Pathfinder
• Parenting MSN

Fault diagnosis
• Ricoh FIXIT
• Decision-theoretic troubleshooting

Vista

Collaborative filtering