Bayesian Network
Chao Lan
Recap: Markov Model
- Assume a Markov chain on the observations (a strong assumption in reality).
Recap: Hidden Markov Model
- Assume each observation is generated from one latent discrete variable.
- Assume a Markov chain on the latent variables (a weaker assumption in reality).
Recap: Probability Factorization in HMM
- Model the joint probability of both observed and latent variables.
Recap: Applications of HMM
Today
- Introduction to Bayesian Network
- Probability Factorization based on BN
- Various Forms of BN
- Sampling based on BN
- Conditional Independence (D-separation)
Probabilistic Graphical Model
Represent variable dependency in a graph!
- each node is a variable
- each link specifies a dependency between its connected nodes
- a link can be directed or undirected
- a directed link identifies a parent node and a child node
Example: [1] c depends on (a, b); [2] a depends on nothing; [3] b depends on a.
Why Probabilistic Graphical Model?
- Visualize the structure of a complex probabilistic model.
- Assist the design of, and motivate, new models.
- Provide insights into the probabilistic model (e.g., conditional independence).
Special Probabilistic Graphical Models
- Bayesian Network (directed graphical model): all links have directions.
- Directed Acyclic Graph (DAG): a directed graph with no directed cycles.
Application of Bayesian Network
The diagnosis of a patient can be {tuberculosis, lung cancer, bronchitis}.
This example is from the lecture slides of “Some Applications of Bayesian Networks” by Jiri Vomlel.
Application of Bayesian Network
Initially we know nothing about the patient. Then we learn the patient smokes.
… and complains about dyspnoea
Application of Bayesian Network
evidence: smokes
… and his X-ray is positive
Application of Bayesian Network
evidence: smokes + dyspnoea
… and he visited Asia recently
Application of Bayesian Network
evidence: smokes + dyspnoea + positive X-ray
Factorization based on Bayesian Network
Let a, b, c be three random variables whose dependencies are specified by the graph on the right. How can we factorize p(a,b,c)?
p(a,b,c) = p(a) * p(b|a) * p(c|a,b)
A general factorization form: p(x1, …, xK) = ∏k p(xk | pak), where pak denotes the set of parents of xk in the graph.
Exercise
Factorize the following joint probability based on the Bayesian network on the right.
p(x1, x2, x3, x4, x5, x6, x7) = ?
Solution:
p(x1, x2, x3, x4, x5, x6, x7) = p(x1) * p(x2) * p(x3)
  * p(x4 | x1, x2, x3)
  * p(x5 | x1, x3)
  * p(x6 | x4)
  * p(x7 | x4, x5)
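The factorization above can be checked numerically. Below is a minimal sketch with made-up conditional probability tables (CPTs) for binary x1, …, x7 (the numbers are illustrative assumptions, not from the slides): a product of valid CPTs in the factorized form must sum to 1 over all joint assignments.

```python
import itertools

# Illustrative CPTs for binary variables, matching the factorization
# p(x1) p(x2) p(x3) p(x4|x1,x2,x3) p(x5|x1,x3) p(x6|x4) p(x7|x4,x5).
p_root = {1: 0.6, 2: 0.3, 3: 0.8}             # p(xk = 1) for the roots x1, x2, x3

def bern(p, v):                                # p(x = v) for a Bernoulli(p) variable
    return p if v == 1 else 1 - p

# Arbitrary illustrative conditionals: p(child = 1 | parents)
def p4(x1, x2, x3): return 0.1 + 0.2*x1 + 0.3*x2 + 0.3*x3
def p5(x1, x3):     return 0.2 + 0.3*x1 + 0.4*x3
def p6(x4):         return 0.7 if x4 else 0.2
def p7(x4, x5):     return 0.1 + 0.4*x4 + 0.4*x5

def joint(x1, x2, x3, x4, x5, x6, x7):
    return (bern(p_root[1], x1) * bern(p_root[2], x2) * bern(p_root[3], x3)
            * bern(p4(x1, x2, x3), x4)
            * bern(p5(x1, x3), x5)
            * bern(p6(x4), x6)
            * bern(p7(x4, x5), x7))

# A valid factorization must sum to 1 over all 2^7 joint assignments.
total = sum(joint(*xs) for xs in itertools.product([0, 1], repeat=7))
print(round(total, 10))  # → 1.0
```

Any choice of valid CPTs would pass this check; the point is that the factorized product is itself a proper joint distribution.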
Bayesian Networks: Various Forms
Consider a simple regression model:
- a set of observations x1, x2, …, xn
- a model with an unknown parameter w
- the model input is an observed xi; its output is the unknown label ti
- assume ti | xi, w, σ ~ N(w·xi, σ²)
- assume w ~ N(0, α⁻¹)
[graph: w → ti ← xi]
Example:
- xi is the sensor signal of individual i
- ti is the activity of individual i
- the model predicts activity from the sensor signal
We are mainly interested in the joint distribution of the unknown variables t and w (where t = [t1, …, tn]). By the network's factorization,
p(t, w) = p(w) * ∏i p(ti | xi, w, σ)
We can use a more compact graph (plate notation) instead of drawing every xi and ti separately.
Now add the other (deterministic) parameters σ and α to the graph.
Since t is observed during learning, shade its node (draw it solid).
Given a new observation, predict its label.
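The generative story of this regression BN can be sketched as ancestral sampling: first draw the parent w from its prior, then draw each child ti given its parents xi and w. The hyperparameter values below are assumptions for illustration.

```python
import random

# Sketch of ancestral sampling from the regression BN above
# (alpha and sigma are assumed values): first draw w ~ N(0, 1/alpha),
# then draw each ti ~ N(w * xi, sigma^2).
random.seed(0)
alpha, sigma = 2.0, 0.5                        # deterministic parameters
xs = [0.1 * i for i in range(10)]              # observed inputs x1..xn

w = random.gauss(0.0, (1.0 / alpha) ** 0.5)    # sample the parent node w
ts = [random.gauss(w * x, sigma) for x in xs]  # then sample each child ti

print(len(ts))  # one sampled label per observation
```

Sampling order matters: w must be drawn before any ti, because the ti nodes are its children in the graph.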
Sampling based on Bayesian Network
Sampling is the process of drawing examples from a probability distribution.
Suppose the outcome of a coin flip is X ~ Bernoulli(0.4), i.e., H occurs with probability 0.4 and T with probability 0.6. We can sample X by actually flipping the coin:
- x1 = H (1st example)
- x2 = T (2nd example)
- x3 = T (3rd example)
- x4 = ...
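Instead of flipping a physical coin, the same sampling can be sketched in software: draw u ~ Uniform(0, 1) and report H whenever u < 0.4.

```python
import random

# Minimal sketch of sampling X ~ Bernoulli(0.4):
# draw a uniform number in [0, 1) and report H when it falls below 0.4.
random.seed(0)

def flip(p_heads=0.4):
    return 'H' if random.random() < p_heads else 'T'

samples = [flip() for _ in range(10000)]
frac_heads = samples.count('H') / len(samples)
print(round(frac_heads, 2))  # close to 0.4 for a large number of flips
```

By the law of large numbers, the empirical fraction of H approaches 0.4 as the number of samples grows.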
Sampling based on Bayesian Network
The graph can guide the data sampling process. E.g., to sample a point from p(x4 | pa4):
- first identify parent set pa4 = {x1, x2, x3}
- x1 has no parent, sample a point s1 from p(x1)
- x2 has no parent, sample a point s2 from p(x2)
- x3 has no parent, sample a point s3 from p(x3)
- sample a point from p(x4|x1=s1,x2=s2,x3=s3)
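The four steps above can be sketched directly, using made-up CPTs for binary variables (the probabilities are illustrative assumptions): sample each parentless node first, then sample x4 from its conditional given the sampled parent values.

```python
import random

# Ancestral sampling of x4 in the network above, with assumed CPTs.
random.seed(1)

def bern(p):
    return 1 if random.random() < p else 0

def sample_x4():
    s1 = bern(0.6)                       # x1 has no parent: sample from p(x1)
    s2 = bern(0.3)                       # x2 has no parent: sample from p(x2)
    s3 = bern(0.8)                       # x3 has no parent: sample from p(x3)
    p = 0.1 + 0.2*s1 + 0.3*s2 + 0.3*s3   # p(x4 = 1 | x1=s1, x2=s2, x3=s3)
    return bern(p)                       # sample from p(x4 | x1=s1, x2=s2, x3=s3)

print(sample_x4())  # 0 or 1
```

The same recursion answers the exercise for x7: sample every ancestor of x7 in topological order, then sample x7 from its conditional.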
Exercise
How to sample a point from p(x7|pa7)?
Sampling based on Bayesian Network
How to sample a point from the marginal distribution p(x2, x4)?
- sample a full point (s1, …, s7) from the joint distribution p(x1, …, x7)
- keep the values of x2 and x4 and discard the rest
Conditional Independence in BN
The Bayesian network can tell us whether two variables are conditionally independent.
Two random variables x, y are independent conditioned on z if
p(x, y | z) = p(x | z) p(y | z)
The graph on the right instructs the factorization
p(a, b | c) = p(a | c) p(b | c)
This implies that a and b are conditionally independent given c.
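This can be verified numerically for a tail-to-tail graph a ← c → b, with made-up CPTs (the numbers are illustrative assumptions): a and b are dependent marginally, but independent once we condition on c.

```python
import itertools

# Tail-to-tail structure a <- c -> b with assumed CPTs:
# the joint factorizes as p(a,b,c) = p(c) p(a|c) p(b|c).
pc = {0: 0.4, 1: 0.6}
pa_c = {0: 0.9, 1: 0.2}   # p(a=1 | c)
pb_c = {0: 0.7, 1: 0.1}   # p(b=1 | c)

def bern(p, v): return p if v == 1 else 1 - p

def joint(a, b, c):
    return pc[c] * bern(pa_c[c], a) * bern(pb_c[c], b)

# marginals over c
p_ab = {(a, b): sum(joint(a, b, c) for c in (0, 1)) for a in (0, 1) for b in (0, 1)}
p_a = {a: p_ab[(a, 0)] + p_ab[(a, 1)] for a in (0, 1)}
p_b = {b: p_ab[(0, b)] + p_ab[(1, b)] for b in (0, 1)}

# marginally dependent: p(a,b) != p(a) p(b)
print(abs(p_ab[(1, 1)] - p_a[1] * p_b[1]) > 1e-9)  # True

# conditionally independent: p(a,b|c) == p(a|c) p(b|c) for every a, b, c
ok = all(abs(joint(a, b, c) / pc[c] - bern(pa_c[c], a) * bern(pb_c[c], b)) < 1e-12
         for a, b, c in itertools.product((0, 1), repeat=3))
print(ok)  # True
```

Intuitively, c is a common cause of a and b; once its value is known, a carries no further information about b.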
D-Separation
Sometimes it is not obvious whether two variables are conditionally independent. In such cases, we can use the d-separation criterion.
D-Separation
Rule 1 (Unconditional Separation)
- x and y are d-connected if there is an unblocked path between them
Rule 2 (Blocking by Conditioning)
- x and y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between x and y that traverses no member of Z.
Rule 3 (Conditioning on Colliders)
- If a collider is a member of the conditioning set Z, or has a descendant in Z, then it no longer blocks any path that traces this collider.
Rule 1: Unconditional Separation
Rule 1: x and y are d-connected if there is an unblocked path between them.
- a path is any consecutive sequence of links, regardless of their directions
- a path is unblocked if it contains no collider (a node where two arrows meet head-to-head)
- otherwise (if every path between x and y is blocked), we say x and y are d-separated
Three Types of Connection Node
- Serial (head-to-tail)
- Diverging (tail-to-tail)
- Converging (head-to-head; a collider)
Rule 1: Unconditional Separation
Rule 1: x and y are d-connected if there is an unblocked path between them.
- Q: which pairs of variables in the graph on the right are d-connected?
Rule 2: Blocking by Conditioning
Rule 2: x and y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between x and y that traverses no member of Z.
- let Z = {r, v}. Which pairs of variables are d-connected conditioned on Z?
Rule 3: Conditioning on Colliders
Rule 3: If a collider is a member of the conditioning set Z, or has a descendant in Z, then it no longer blocks any path that traces this collider.
- let Z = {r, p}. Which pairs of variables are d-connected conditioned on Z?
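Rule 3's effect ("explaining away") can be checked numerically on the collider a → c ← b, again with made-up CPTs as assumptions: a and b are independent marginally, yet become dependent once we condition on the collider c.

```python
import itertools

# Collider structure a -> c <- b with assumed CPTs:
# the joint factorizes as p(a,b,c) = p(a) p(b) p(c|a,b).
pa, pb = 0.3, 0.5
def pc_ab(a, b): return 0.1 + 0.4*a + 0.4*b   # p(c=1 | a, b)
def bern(p, v): return p if v == 1 else 1 - p

def joint(a, b, c):
    return bern(pa, a) * bern(pb, b) * bern(pc_ab(a, b), c)

# p(a,b | c=1), p(a | c=1), p(b | c=1), all computed from the joint
pc1 = sum(joint(a, b, 1) for a, b in itertools.product((0, 1), repeat=2))
p_ab_c1 = joint(1, 1, 1) / pc1
p_a_c1 = sum(joint(1, b, 1) for b in (0, 1)) / pc1
p_b_c1 = sum(joint(a, 1, 1) for a in (0, 1)) / pc1

print(abs(p_ab_c1 - p_a_c1 * p_b_c1) > 1e-9)  # True: dependent given c
```

This is why conditioning on a collider (or its descendant) opens, rather than blocks, a path: observing a common effect makes its independent causes informative about each other.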
Today
- Introduction to Bayesian Network
- Probability Factorization based on BN
- Various Forms of BN
- Sampling based on BN
- Conditional Independence (D-separation)