Bayesian Networks

• A causal probabilistic network, or Bayesian network, is a directed acyclic graph (DAG) where nodes represent variables and links represent dependency relations, e.g. of the cause-effect type, between variables, quantified by (conditional) probabilities
• Qualitative component + quantitative component
[Figure: example DAG over nodes A, B, C, D, E, F, G, H]
Bayesian Networks
• Qualitative component: relations of conditional dependence / independence
I(A, B | C): A and B are independent given C
I(A, B) = I(A, B | Ø): A and B are a priori independent
• Formal study of the properties of the ternary relation I
• A Bayesian network may encode three fundamental types of relations among neighbouring variables.
• Having in each node Xi the conditional probability distribution P(Xi | parents(Xi)) is enough to determine the full joint probability distribution P(X1, X2, ..., Xn)
A: visit to Asia    B: tuberculosis
F: smoke            E: lung cancer
G: bronchitis       C: B or E
D: X-ray            H: dyspnea
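The factorization above can be sketched in Python for the Asia-style network in the legend. All numeric parameters below are made up for illustration; they are not the real Asia-network values:

```python
from itertools import product

# Illustrative (made-up) parameters: each entry maps an assignment of the
# parents to P(variable = True | parents).
p_true = {
    "A": lambda v: 0.01,
    "B": lambda v: 0.05 if v["A"] else 0.01,
    "F": lambda v: 0.5,
    "E": lambda v: 0.1 if v["F"] else 0.01,
    "G": lambda v: 0.6 if v["F"] else 0.3,
    "C": lambda v: 1.0 if (v["B"] or v["E"]) else 0.0,  # deterministic: C = B or E
    "D": lambda v: 0.98 if v["C"] else 0.05,
    "H": lambda v: 0.9 if (v["C"] or v["G"]) else 0.1,
}
order = ["A", "B", "F", "E", "G", "C", "D", "H"]  # a topological order of the DAG

def joint(v):
    """P(X1,...,Xn) as the product of P(Xi | parents(Xi)) over all nodes."""
    p = 1.0
    for x in order:
        pt = p_true[x](v)
        p *= pt if v[x] else 1.0 - pt
    return p

# The local CPTs determine a proper full joint: it sums to 1 over all 2^8 assignments.
total = sum(joint(dict(zip(order, vals))) for vals in product([False, True], repeat=8))
print(round(total, 10))  # 1.0
```

The point of the example is that only the local tables P(Xi | parents(Xi)) are ever stored; the joint is reconstructed on demand by multiplication.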
D-separation relations and probabilistic independence
Goal: to determine precisely which independence relations are (graphically) defined by a DAG.
Previous definitions:
• A path is a sequence of connected nodes in the graph.
• A non-directed path is a path that does not take into account the directions of the arrows.
• A “head-to-head” link in a node is a (non-directed) path of the form x → y ← w; the node y is called a “head-to-head” node.
D-separation

• A path c is said to be activated by a set of nodes Z if the following two conditions are satisfied:
1) Every head-to-head node in c is in Z or has a descendant in Z.
2) No other node in c belongs to Z.
Otherwise, the path c is said to be blocked by Z.
Definition. If X, Y and Z are three disjoint subsets of nodes in a DAG G, then Z d-separates X from Y, or equivalently X and Y are graphically independent given Z, when all the paths between any node of X and any node of Y are blocked by Z.
D-separation

[Figure: example DAG over nodes A, B, C, D, E, G]
Theorem. Let G be a DAG and let X, Y and Z be subsets of nodes such that X and Y are d-separated by Z. Then X and Y are conditionally independent given Z for any probability P such that (G, P) is a causal network over G, that is, s.t. P(X | Y, Z) = P(X | Z) and P(Y | X, Z) = P(Y | Z).
{B} and {C} are d-separated by {A}:
Path B-E-C: E is a head-to-head node and neither E nor its descendant G belongs to {A}, so {A} blocks the path B-E-C.
Path B-A-C: A is not head-to-head and A ∈ {A}, so {A} blocks the path B-A-C.
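The blocking conditions above translate directly into code. The sketch below uses my own helper names and, as the example graph, the Asia-style network from the earlier legend (A → B, F → E, F → G, B and E → C, C → D, C and G → H):

```python
def path_blocked(path, Z, parents):
    """Is the given non-directed path blocked by the set of nodes Z?"""
    # Build the child lists once so we can enumerate descendants.
    children = {n: set() for n in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    for i in range(1, len(path) - 1):
        x, y, w = path[i - 1], path[i], path[i + 1]
        if x in parents[y] and w in parents[y]:        # y is head-to-head on this path
            # condition 1 fails: y is outside Z and has no descendant in Z
            if y not in Z and not (descendants(y) & Z):
                return True
        elif y in Z:                                   # condition 2 fails
            return True
    return False

asia = {"A": set(), "B": {"A"}, "F": set(), "E": {"F"}, "G": {"F"},
        "C": {"B", "E"}, "D": {"C"}, "H": {"C", "G"}}

print(path_blocked(["B", "C", "E"], set(), asia))   # True: C is head-to-head, nothing below it observed
print(path_blocked(["B", "C", "E"], {"D"}, asia))   # False: observing descendant D activates the path
print(path_blocked(["E", "F", "G"], {"F"}, asia))   # True: F is not head-to-head and F is in Z
```

A full d-separation test would apply this check to every non-directed path between the two node sets, as in the definition.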
Inference in Bayesian Networks

Knowledge about a domain is encoded by a Bayesian network BN = (G, P).
Inference = updating probabilities: evidence E on the values taken by some variables modifies the probabilities of the remaining variables
P(X) ---> P'(X) = P(X | E)
Direct method:

BN = ⟨ G = {A, B, C, D, E}, P(A, B, C, D, E) ⟩

Evidence: A = ai, B = bj

P(C = ck | A = ai, B = bj) = ∑_{m,p} P(ai, bj, ck, dm, ep) / ∑_{k,m,p} P(ai, bj, ck, dm, ep)
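The direct method can be sketched with an explicit joint table. The joint P(A, B, C, D, E) below is a made-up distribution over binary variables, built from arbitrary positive weights:

```python
from itertools import product

# Toy joint P(A,B,C,D,E) over binary variables: arbitrary positive weights,
# normalized so the table is a probability distribution.
states = list(product([0, 1], repeat=5))
weights = {s: 1 + sum(s) + 0.5 * s[0] * s[2] for s in states}
z = sum(weights.values())
P = {s: w / z for s, w in weights.items()}

def cond_c(ai, bj, ck):
    """P(C = ck | A = ai, B = bj) by direct marginalization of the joint."""
    num = sum(p for (a, b, c, d, e), p in P.items() if a == ai and b == bj and c == ck)
    den = sum(p for (a, b, c, d, e), p in P.items() if a == ai and b == bj)
    return num / den

# The resulting conditional distribution over C sums to 1.
print(round(cond_c(1, 0, 1) + cond_c(1, 0, 0), 10))  # 1.0
```

The cost is the problem: the table has exponentially many entries in the number of variables, which is why the local propagation methods below matter.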
Inference in Bayesian Networks

• Bayesian networks allow local computations, which exploit the independence relations among variables explicitly induced by the corresponding DAG of the network.
• They allow updating the probability of a variable using only the probabilities of its immediate predecessor nodes (parents), and in this way, step by step, updating the probabilities of all non-instantiated variables in the network ---> propagation methods
• Two main propagation methods:
• Pearl method: message passing over the DAG
• Lauritzen & Spiegelhalter method: prior transformation of the DAG into a tree of cliques
Propagation method in trees of cliques
1) transformation of the initial network into another graphical structure, a tree of cliques (subsets of nodes), with equivalent probabilistic information
BN = (G, P) ----> [Tree, P]
2) propagation algorithm over the new structure
Graphical Transformation

Definition: a “clique” in a non-directed graph is a complete and maximal subgraph.
To transform a DAG G into a tree of cliques:
1) Delete directions in edges of G: G’
2) Moralization of G’: add edges between nodes with common children in the original DAG G: G’’
3) Triangularization of G’’ : G*
4) Identification of the cliques in G*
5) Suitable enumeration of the cliques (Running Intersection Property)
6) Construction of the tree according to the enumeration
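Steps 1 and 2 are purely graphical and easy to sketch. The function below (my own naming) drops edge directions and “marries” parents that share a child, using the Asia-style network from the earlier legend as the example:

```python
def moralize(parents):
    """Steps 1-2: drop edge directions and add edges between parents of a common child."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))              # step 1: undirected version of G
        ps = list(ps)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                edges.add(frozenset((ps[i], ps[j])))      # step 2: marry co-parents
    return edges

asia_parents = {"A": [], "B": ["A"], "F": [], "E": ["F"], "G": ["F"],
                "C": ["B", "E"], "D": ["C"], "H": ["C", "G"]}
moral = moralize(asia_parents)

print(frozenset(("B", "E")) in moral)  # True: B and E share the child C
print(frozenset(("C", "G")) in moral)  # True: C and G share the child H
print(frozenset(("A", "F")) in moral)  # False: A and F have no common child
```

Triangulation (step 3) and clique identification (step 4) would then run on this moral graph.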
Enumeration of the cliques Clq1, Clq2, …, Clqn such that the following property holds:

Running Intersection Property: for all i = 1, …, n there exists j < i such that Si ⊆ Clqj, where Si = Clqi ∩ (Clq1 ∪ Clq2 ∪ ... ∪ Clqi-1).
This property is guaranteed if:
(i) the nodes of the graph are enumerated following the “maximum cardinality search” criterion;
(ii) the cliques are ordered according to the node of the clique with the highest ranking in the former enumeration.
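The maximum cardinality search criterion itself is short: repeatedly pick the node with the most already-numbered neighbours. A sketch (the alphabetical tie-breaking is my own choice, made only to keep the output deterministic; any tie-break is valid):

```python
def max_cardinality_search(adj):
    """Enumerate the nodes so that each new node has the largest number of
    already-numbered neighbours; ties are broken alphabetically here."""
    order = []
    numbered = set()
    while len(order) < len(adj):
        best = max(sorted(adj.keys() - numbered),
                   key=lambda n: len(adj[n] & numbered))
        order.append(best)
        numbered.add(best)
    return order

# Small undirected (moral) graph given as adjacency sets.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(max_cardinality_search(adj))  # ['A', 'B', 'C', 'D']
```

On a triangulated graph this enumeration, combined with ordering the cliques by their highest-ranked node, yields the Running Intersection Property stated above.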
Causal network (G, P) ---> ([Clq1, ..., Clqp], ψ) is a potential representation for P:

1) P(Clqi) = P(Ri | Si) · P(Si)

2) P(Rp | Sp) = ψ(Clqp) / ∑_{Rp} ψ(Clqp), where ∑_{Rp} ψ(Clqp) is the marginal of the function ψ(Clqp) with respect to the variables of Rp.

3) If father(Clqp) = Clqj, then ([Clq1, ..., Clqp-1], ψ') is a potential representation for the marginal distribution P(V − Rp), where:
ψ'(Clqi) = ψ(Clqi) for all i ≠ j, i < p
ψ'(Clqj) = ψ(Clqj) · ∑_{Rp} ψ(Clqp)
Propagation algorithm: step by step (2)
Goal: to compute P(Clqi) for all cliques.
Two graph traversals: one bottom-up and one top-down
BU) Start with clique Clqp. Combining properties 2 and 3 we have an iterative way of computing the conditional distributions P(Ri | Si) in each clique until reaching the root clique Clq1.
Root: P(Clq1) = P(R1 | S1).

TD) P(S2) = ∑_{Clq1 \ S2} P(Clq1), and from there P(Si) = ∑_{Clqj \ Si} P(Clqj), where Clqj = father(Clqi):
-- we can always compute in a clique Clqi the distribution P(Si) whenever we have already computed the distribution of its father clique Clqj --
Then P(Clqi) = P(Ri | Si) · P(Si) by property 1.
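A minimal numeric sketch of the two passes on a two-clique chain A → B → C, with Clq1 = {A, B}, Clq2 = {B, C}, S2 = {B}, R2 = {C}. All CPT numbers are made up:

```python
from itertools import product

# Made-up CPTs over binary variables for the chain A -> B -> C.
pA = {0: 0.6, 1: 0.4}
pB_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (b, a) -> P(B=b | A=a)
pC_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # (c, b) -> P(C=c | B=b)

# Initial potentials: each CPT is assigned to one clique containing it.
psi1 = {(a, b): pA[a] * pB_A[(b, a)] for a, b in product((0, 1), repeat=2)}  # psi(Clq1)
psi2 = {(b, c): pC_B[(c, b)] for b, c in product((0, 1), repeat=2)}          # psi(Clq2)

# Bottom-up: P(R2|S2) = psi2 / sum_{R2} psi2, and the marginal is absorbed
# into the father clique (property 3); at the root, P(Clq1) = psi1'.
marg = {b: psi2[(b, 0)] + psi2[(b, 1)] for b in (0, 1)}      # sum over R2 = {C}
pR2_S2 = {(b, c): psi2[(b, c)] / marg[b] for b, c in psi2}
pClq1 = {(a, b): psi1[(a, b)] * marg[b] for a, b in psi1}

# Top-down: P(S2) by marginalizing the father clique, then property 1.
pS2 = {b: pClq1[(0, b)] + pClq1[(1, b)] for b in (0, 1)}
pClq2 = {(b, c): pR2_S2[(b, c)] * pS2[b] for b, c in psi2}

# Check against the explicit marginal P(B, C) = sum_A P(A) P(B|A) P(C|B).
pBC = {(b, c): sum(pA[a] * pB_A[(b, a)] * pC_B[(c, b)] for a in (0, 1))
       for b, c in product((0, 1), repeat=2)}
print(all(abs(pClq2[k] - pBC[k]) < 1e-9 for k in pBC))  # True
```

The same two traversals work on any tree of cliques: the bottom-up pass turns every potential into a conditional P(Ri | Si), and the top-down pass distributes the separator marginals from the root outward.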
Given a Bayesian network BN = (G, P), we have seen how
1) To transform G into a tree of cliques and factorize P as
P(X1, ..., Xn) = ψ(Clq1) · ψ(Clq2) · ... · ψ(Clqm)

where ψ(Clqi) = ∏ { P(Xj | parents(Xj)) : Xj ∈ Clqi, parents(Xj) ⊆ Clqi }

2) To compute the probability distributions P(Clqi) with a propagation algorithm, and from there to compute the probabilities P(Xj) for Xj ∈ Clqi, by marginalization.
Probability updating
It remains to see how to perform inference,
i.e. how to update probabilities P(Xj) when some information (evidence E) is available about some variables:
P(Xj) ---> P*(Xj) = P(Xj | E)
The updating mechanism is based on a fundamental property of the potential representations, applied to P(X1, ..., Xn) and its potential representation in terms of cliques:

P(X1, ..., Xn) = ψ(Clq1) · ψ(Clq2) · ... · ψ(Clqm)
Updating mechanism

Recall:
• Let ([Clq1, ..., Clqm], ) be a potential representation for P(X1, …, Xn).
• We observe: X3 = a and X5 = b.
• Probability updating: P*(X1, X2, X4, X6, ..., Xn) = P(X1, ..., Xn | X3 = a, X5 = b)
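In terms of the full joint, this updating amounts to zeroing the entries inconsistent with the evidence and renormalizing (in the clique tree, the same restriction is applied to the potentials before propagating). A toy sketch with a made-up joint over five binary variables:

```python
from itertools import product

# Toy joint P(X1,...,X5): arbitrary positive weights normalized to a distribution.
states = list(product((0, 1), repeat=5))
w = {s: 1 + s[0] + 2 * s[2] * s[4] for s in states}
z = sum(w.values())
P = {s: wi / z for s, wi in w.items()}

# Evidence X3 = 1 and X5 = 0: zero out the inconsistent entries...
restricted = {s: (p if s[2] == 1 and s[4] == 0 else 0.0) for s, p in P.items()}
# ...and renormalize to obtain P* = P( . | X3 = 1, X5 = 0).
norm = sum(restricted.values())
P_star = {s: p / norm for s, p in restricted.items()}

p1 = sum(p for s, p in P_star.items() if s[0] == 1)  # P*(X1 = 1)
print(round(sum(P_star.values()), 10))  # 1.0: P* is again a distribution
```

The propagation algorithm achieves the same result without ever building the full table, which is the whole point of the clique-tree machinery.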