ECE521 W17 Tutorial 8
Eleni Triantafillou and Yuhuai (Tony) Wu
Some slides borrowed from last year's tutorial, Eric Xing's course, and some figures from Bishop's book and others


May 24, 2018

Transcript
Page 1: ECE521 W17 Tutorial 8

Eleni Triantafillou and Yuhuai (Tony) Wu
Some slides borrowed from last year's tutorial, Eric Xing's course, and some figures from Bishop's book and others

Page 2: Conditional independence

● We are often interested in computing joint probability distributions.
● It is desirable to decompose a joint distribution into a product of factors, each depending on a subset of the variables, for ease of computation.
● Conditional independence properties between the variables allow us to do this.
● A common example of conditional independence: Markov chains, where we assume that the future is independent of the past given the present.
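The Markov property can be checked numerically by brute-force summation. Below is a minimal sketch with a tiny binary chain x1 → x2 → x3; all probability tables are made up for illustration:

```python
import itertools

# Hypothetical binary Markov chain x1 -> x2 -> x3 (tables are made up).
p1 = [0.3, 0.7]                    # p(x1)
p2 = [[0.8, 0.2], [0.4, 0.6]]      # p2[x1][x2] = p(x2 | x1)
p3 = [[0.9, 0.1], [0.25, 0.75]]    # p3[x2][x3] = p(x3 | x2)

# Full joint distribution over (x1, x2, x3).
joint = {(a, b, c): p1[a] * p2[a][b] * p3[b][c]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def marg(**fixed):
    """Marginal probability of the fixed assignments, by brute-force summation."""
    return sum(v for (a, b, c), v in joint.items()
               if all(dict(a=a, b=b, c=c)[k] == val for k, val in fixed.items()))

# Future independent of past given present: p(x3 | x1, x2) == p(x3 | x2).
for a, b, c in itertools.product((0, 1), repeat=3):
    assert abs(marg(a=a, b=b, c=c) / marg(a=a, b=b)
               - marg(b=b, c=c) / marg(b=b)) < 1e-12

# But x3 is NOT marginally independent of x1: p(x3=1 | x1) varies with x1.
print(marg(a=0, c=1) / marg(a=0), marg(a=1, c=1) / marg(a=1))
```

Any choice of tables gives the same conditional-independence result, since it follows from the factorized form of the joint rather than from the particular numbers.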

Page 3: Graphical models

● Bayesian networks (BN, BayesNet): directed acyclic graphs (DAGs)

Page 4: Graphical models

● Bayesian networks (BN, BayesNet): directed acyclic graphs (DAGs)
● Markov random fields (MRFs): undirected graphs

Pages 5-6: (figure-only slides; figures not included in the transcript)
Page 7: Common parent

According to the graphical model, we can decompose the joint probability over the three variables as:

    p(a, b, c) = p(a|c) p(b|c) p(c)

In general, marginalizing out c, we have:

    p(a, b) = Σ_c p(a|c) p(b|c) p(c)

This does not in general decompose into p(a) p(b), so a and b are not independent.

Page 8: Common parent

… but if we observe c:

    p(a, b | c) = p(a, b, c) / p(c) = p(a|c) p(b|c)

So a and b are conditionally independent given c.
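Both claims about the common-parent structure can be checked numerically. The sketch below uses made-up tables for a binary model c → a, c → b:

```python
import itertools

# Hypothetical binary common-parent model: c -> a, c -> b (tables made up).
p_c = [0.5, 0.5]
p_a = [[0.9, 0.1], [0.2, 0.8]]     # p_a[c][a] = p(a | c)
p_b = [[0.7, 0.3], [0.1, 0.9]]     # p_b[c][b] = p(b | c)

joint = {(a, b, c): p_c[c] * p_a[c][a] * p_b[c][b]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def marg(**fixed):
    """Marginal probability of the fixed assignments, by brute-force summation."""
    return sum(v for (a, b, c), v in joint.items()
               if all(dict(a=a, b=b, c=c)[k] == val for k, val in fixed.items()))

# Marginally, a and b are dependent: p(a=1, b=1) != p(a=1) p(b=1).
assert abs(marg(a=1, b=1) - marg(a=1) * marg(b=1)) > 1e-3

# Conditioned on c, they factorize: p(a, b | c) == p(a | c) p(b | c).
for a, b, c in itertools.product((0, 1), repeat=3):
    assert abs(marg(a=a, b=b, c=c) / marg(c=c)
               - (marg(a=a, c=c) / marg(c=c)) * (marg(b=b, c=c) / marg(c=c))) < 1e-12
```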

Page 9: Cascade

According to the graphical model, we can decompose the joint as:

    p(a, b, c) = p(a) p(c|a) p(b|c)

In general, marginalizing out c, we have:

    p(a, b) = p(a) Σ_c p(c|a) p(b|c)

which does not in general factorize as p(a) p(b). So a and b are not independent.

Page 10: Cascade

But if we condition on c:

    p(a, b | c) = p(a, b, c) / p(c) = p(a) p(c|a) p(b|c) / p(c) = p(a|c) p(b|c)

So a and b are conditionally independent given c.
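The same numerical check works for the cascade: information flows along a → c → b marginally, but conditioning on c blocks the path. The tables below are made up:

```python
import itertools

# Hypothetical binary cascade a -> c -> b (tables made up).
p_a = [0.5, 0.5]
p_c = [[0.9, 0.1], [0.1, 0.9]]     # p_c[a][c] = p(c | a)
p_b = [[0.9, 0.1], [0.1, 0.9]]     # p_b[c][b] = p(b | c)

joint = {(a, b, c): p_a[a] * p_c[a][c] * p_b[c][b]
         for a, b, c in itertools.product((0, 1), repeat=3)}

def marg(**fixed):
    """Marginal probability of the fixed assignments, by brute-force summation."""
    return sum(v for (a, b, c), v in joint.items()
               if all(dict(a=a, b=b, c=c)[k] == val for k, val in fixed.items()))

# Marginally dependent: a "leaks" into b through c.
assert abs(marg(a=1, b=1) - marg(a=1) * marg(b=1)) > 1e-3

# Conditioning on c blocks the path: p(a, b | c) == p(a | c) p(b | c).
for a, b, c in itertools.product((0, 1), repeat=3):
    assert abs(marg(a=a, b=b, c=c) / marg(c=c)
               - (marg(a=a, c=c) / marg(c=c)) * (marg(b=b, c=c) / marg(c=c))) < 1e-12
```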

Page 11: V-structure

According to the graphical model, we can decompose the joint as:

    p(a, b, c) = p(a) p(b) p(c|a, b)

Marginalizing out c, we have:

    p(a, b) = p(a) p(b) Σ_c p(c|a, b) = p(a) p(b)

So a and b are independent!

Page 12: V-structure

… but if we condition on c:

    p(a, b | c) = p(a) p(b) p(c|a, b) / p(c)

which does not in general factorize into p(a|c) p(b|c). Therefore a and b are not conditionally independent given c (the "explaining away" effect).
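Explaining away can also be checked numerically. The sketch below uses a made-up noisy-XOR table for p(c | a, b), where a and b are exactly independent marginally but become coupled once c is observed:

```python
import itertools

# Hypothetical v-structure a -> c <- b with independent priors (tables made up).
p_a = [0.5, 0.5]
p_b = [0.5, 0.5]
p_c1 = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.1}  # p(c=1|a,b): noisy XOR

joint = {(a, b, c): p_a[a] * p_b[b] * (p_c1[(a, b)] if c == 1 else 1 - p_c1[(a, b)])
         for a, b, c in itertools.product((0, 1), repeat=3)}

def marg(**fixed):
    """Marginal probability of the fixed assignments, by brute-force summation."""
    return sum(v for (a, b, c), v in joint.items()
               if all(dict(a=a, b=b, c=c)[k] == val for k, val in fixed.items()))

# Marginally independent: p(a, b) == p(a) p(b) exactly.
for a, b in itertools.product((0, 1), repeat=2):
    assert abs(marg(a=a, b=b) - marg(a=a) * marg(b=b)) < 1e-12

# But conditioning on the common child couples them (explaining away).
lhs = marg(a=1, b=1, c=1) / marg(c=1)
rhs = (marg(a=1, c=1) / marg(c=1)) * (marg(b=1, c=1) / marg(c=1))
assert abs(lhs - rhs) > 1e-3
```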

Pages 13-17: (figure-only slides; figures not included in the transcript)
Page 18: Factor graphs

● Both directed and undirected graphical models express a joint probability distribution in a factorized way. For example:
● Directed: p(x) = Π_i p(x_i | pa(x_i))
● Undirected: p(x) = (1/Z) Π_C ψ_C(x_C)

Page 19: Factor graphs

Let us write the joint distribution over a set of variables as a product of factors (with x_s denoting a subset of the variables):

    p(x) = Π_s f_s(x_s)

Factor graphs have nodes for variables as before (circles) and also for factors (squares). They can be used to represent either a directed or an undirected PGM.
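This factorization can be sketched directly in code: a factor is a (scope, table) pair, and the joint is the normalized product of all factors. The two-variable example below is made up, with each factor chosen to be a conditional probability so that the normalizer Z comes out to 1:

```python
import itertools
from math import isclose

# Minimal factor-graph sketch (names and tables are illustrative).
factors = [
    (("a",), {(0,): 0.4, (1,): 0.6}),                 # f1(a)    = p(a)
    (("a", "b"), {(0, 0): 0.7, (0, 1): 0.3,
                  (1, 0): 0.2, (1, 1): 0.8}),         # f2(a, b) = p(b | a)
]
variables = ("a", "b")

def unnormalized(assignment):
    """Product of all factor values at a full assignment (a dict var -> value)."""
    prob = 1.0
    for scope, table in factors:
        prob *= table[tuple(assignment[v] for v in scope)]
    return prob

# Normalizer: sum of the factor product over all joint assignments.
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in itertools.product((0, 1), repeat=len(variables)))

# Each factor here is a conditional probability, so Z is already 1.
assert isclose(Z, 1.0)
assert isclose(unnormalized({"a": 1, "b": 1}) / Z, 0.6 * 0.8)
```

For an undirected model the same code applies, except the factors are arbitrary non-negative potentials and Z genuinely has to be computed.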

Page 20: Example factor graphs for a directed GM (figures not included in the transcript)

Page 21: Example factor graphs for an undirected GM (figures not included in the transcript)

Page 22: Conditional independence in factor graphs

● The Markov blanket for factor graphs is very similar to that of MRFs.
● The Markov blanket of a variable node in a factor graph is given by the variable's second neighbours: all variables that share a factor with it.
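The second-neighbour rule is easy to implement on a factor graph stored as a bipartite adjacency map. The factor graph below is made up for illustration:

```python
# Hypothetical factor graph: each factor name maps to the variables in its scope.
factor_scopes = {
    "f1": {"a", "c"},
    "f2": {"b", "c"},
    "f3": {"c", "d"},
}

def markov_blanket(var):
    """Second neighbours of `var`: all variables that share a factor with it."""
    blanket = set()
    for scope in factor_scopes.values():
        if var in scope:
            blanket |= scope
    blanket.discard(var)          # a variable is not in its own blanket
    return blanket

assert markov_blanket("c") == {"a", "b", "d"}
assert markov_blanket("a") == {"c"}
```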

Page 23: Conditional independence in Bayesian nets: examples (figures not included in the transcript)

Page 24: Conditional independence in Bayesian nets: examples (figures not included in the transcript)

Page 25: BNs ↔ factor graphs

● Converting a Bayesian network to a factor graph takes the following steps:

○ Consider all the parents of a child node.

○ "Pinch" all the edges from its parents to the child into one factor.

○ Create an additional edge from the factor to the child node.

○ Move on to the next child node.

○ The last step is to add all the priors as individual "dongles" to the corresponding variables.

● Let the original BN have N variables and E edges. The converted factor graph will have N + E edges in total.
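The steps above can be sketched on a BN stored as parent sets (the example network is made up); the final assertion checks the N + E edge count:

```python
# Made-up BN: each variable maps to the set of its parents.
bn_parents = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}

def bn_to_factor_graph(parents):
    """One factor per variable, over the variable and its parents: this
    "pinches" the parent edges into one factor, and the factors of the
    parentless variables are exactly the unary prior "dongles"."""
    fg_edges = []
    for child, pa in parents.items():
        factor = "f_" + child
        fg_edges.append((factor, child))      # edge from the factor to the child
        for p in pa:                          # edges from the parents to the factor
            fg_edges.append((factor, p))
    return fg_edges

edges = bn_to_factor_graph(bn_parents)
N = len(bn_parents)                           # number of variables
E = sum(len(pa) for pa in bn_parents.values())  # number of BN edges
assert len(edges) == N + E                    # the N + E edge count from the slide
```

The count works out because each BN edge contributes one parent-to-factor edge, and each of the N variables contributes exactly one factor-to-child (or prior) edge.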

Pages 26-32: (animation builds of the previous slide; text identical to page 25, figures not included in the transcript)

Page 33: BNs ↔ factor graphs

● With this approach you may get factor graphs like the following, which can be simplified (figures not included in the transcript).

Page 34: BNs ↔ factor graphs

● Convert a FG back to a BN by reversing the "pinching" on each factor node.
● Then put back the directions on the edges according to the conditional probabilities.

Page 35: BNs ↔ factor graphs

Notice that we don't get the same factor graph back after converting to a Bayes net and then back to a factor graph (figures not included in the transcript).

Page 36: MRFs ↔ factor graphs

● Converting a Markov random field to a factor graph takes the following steps:

○ Consider all the maximal cliques of the MRF.

○ Create a factor node for each maximal clique.

○ Connect all the nodes of the maximal clique to the new factor node.
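The procedure above can be sketched on a tiny made-up undirected graph, using a bare-bones Bron-Kerbosch search to enumerate the maximal cliques:

```python
# Hypothetical undirected graph: a triangle a-b-c plus a pendant edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
}

def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []
    def bron_kerbosch(R, P, X):
        if not P and not X:
            cliques.append(frozenset(R))
            return
        for v in list(P):
            bron_kerbosch(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}
    bron_kerbosch(set(), set(adj), set())
    return cliques

cliques = maximal_cliques(adj)
# One factor node per maximal clique, connected to every clique member:
factors = {f"f{i}": clique for i, clique in enumerate(cliques)}

assert frozenset({"a", "b", "c"}) in cliques
assert frozenset({"c", "d"}) in cliques
assert len(cliques) == 2
```

In practice a graph library (e.g. NetworkX's `find_cliques`) would do the clique enumeration; the hand-rolled version here just keeps the sketch self-contained.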

Pages 37-39: (animation builds of the previous slide; text identical to page 36, figures not included in the transcript)

Page 40: MRFs ↔ factor graphs

● Converting a FG back to an MRF is easy:
● For each factor, create all pairwise connections between the variables in the factor.

Page 41: BNs ↔ MRFs

Algorithm:
● Create the factor graph for the Bayesian network.
● Then remove the factors, but add edges between any two nodes that share a factor.

In other words, connect each child to its parents and "marry" the parents of each child; this construction is known as moralization. (Figures not included in the transcript.)
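The algorithm above can be sketched directly on the parent sets, skipping the explicit factor graph: the node pairs that share a factor are exactly the child-parent pairs and the parent-parent pairs. The example BN is made up:

```python
import itertools

# Made-up BN: each variable maps to the set of its parents.
bn_parents = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}

def moralize(parents):
    """Undirected edge set of the moralized graph (edges as frozensets)."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                                    # child-parent edges
            edges.add(frozenset({child, p}))
        for p, q in itertools.combinations(sorted(pa), 2):
            edges.add(frozenset({p, q}))                # "marry" the parents
    return edges

mrf = moralize(bn_parents)
assert frozenset({"a", "b"}) in mrf                     # parents of c get married
assert mrf == {frozenset(e) for e in [("a", "c"), ("b", "c"), ("a", "b"), ("c", "d")]}
```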

Page 42: BNs ↔ MRFs

We don't get the same Bayesian net back from this conversion (the transcript omits the figures, which show the MRF and the alternative Bayes nets it could convert back to).

Pages 43-48: Posterior inference example (worked example; figures not included in the transcript)