Data Analysis and Probabilistic Inferencedfg/ProbabilisticInference/ID... · 2018-07-20 · Data Analysis and Probabilistic Inference Lecture 8 Slide 32. Propagation Strategies for

Jul 31, 2020

dariahiddleston
Lecture 8

Approximate Inference

Data Analysis and Probabilistic Inference Lecture 8 Slide 1

Highly Dependent Data

Approach 1: Model all the dependencies:

Data:

a1 b3 c2 d1
a3 b2 c4 d2
a5 b1 c1 d3
.  .  .  .

[Figure: Distribution]

Propagating probabilities is difficult or infeasible!

Highly Dependent Data

Approach 2: Find the maximally weighted spanning tree:

Loops are not allowed

Now the network does not model the dependencies accurately.
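A minimal sketch of Approach 2, assuming the pairwise dependency weights have already been measured. The weights below are made up for illustration, and Kruskal's algorithm is one standard way to build a maximally weighted spanning tree; the slides do not say which algorithm the course uses.

```python
def max_spanning_tree(nodes, weights):
    """Kruskal's algorithm with arcs taken in order of decreasing weight.

    weights maps a (u, v) pair to a dependency score (hypothetical values).
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative of n's component.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for (u, v), _w in sorted(weights.items(), key=lambda kv: -kv[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # this arc does not close a loop
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical dependency weights between four variables.
example_weights = {
    ("A", "B"): 0.9, ("A", "C"): 0.2, ("A", "D"): 0.1,
    ("B", "C"): 0.7, ("B", "D"): 0.3, ("C", "D"): 0.6,
}
example_tree = max_spanning_tree("ABCD", example_weights)
```

The arcs that are rejected because they would close a loop (here B-D, A-C and A-D) are exactly the dependencies the tree no longer models.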

Exact and Approximate Inference

• If we include all dependencies then computation is exact, but can be computationally infeasible for large networks and large data sets. We will look at exact computation algorithms later in the course.

• If we choose a spanning tree then message passing terminates in one pass and is very fast, but the inference is only approximate.

Problems with Loops in Networks

Issue 1: Looping Messages.

When only Node F is instantiated there is no condition that stops the messages travelling round the loop B C D E.

Exact propagation can still be carried out if one of nodes B, C or D is instantiated.

Problems with Loops in Networks

Issue 2: Independence of Multiple Parents

When only Node A is instantiated propagation terminates. However, C and D are not independent and so the π evidence at E and F is not correct.

Exact propagation can still be carried out if one of nodes B, C or D is instantiated. This will make C and D independent.

Approximate Inference Methods

Spanning trees and Naive Bayesian networks can be considered approximate inference methods. They model the most important dependencies, though not all.

Their performance can be improved by a number of techniques including:

1. Node Deletion

2. Allowing Loopy Belief Propagation

3. Hidden (or Latent) Node Placement

Node Deletion

• Given a pair of highly dependent nodes it has been found that deleting one sometimes improves a network's predictive performance.

• This is a surprising result from which it is difficult to infer any general rule.

• Node deletion is something that can be tested experimentally.

Selective Bayesian Networks

The idea here is to use only a subset of the variables.

This can be done by starting with all the variables, then deleting any suspect variable and testing for improvement in performance. Deletion continues until no further improvement can be found.

We can find suspect variables by testing pairs of children for high conditional dependence.

Equivalently we can add variables incrementally (in dependency order) and test the performance of each network.
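The incremental version can be sketched as a greedy loop. This is only an illustration: `forward_select` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for "build the network from these variables and measure its performance".

```python
def forward_select(ordered_variables, score):
    """Add variables one at a time, keeping each only if the score improves.

    score is a hypothetical callable: it takes a list of variable names and
    returns the measured performance of the network built from them.
    """
    selected = []
    best = score(selected)
    for var in ordered_variables:
        trial = selected + [var]
        trial_score = score(trial)
        if trial_score > best:
            selected, best = trial, trial_score
    return selected, best


def toy_score(variables):
    # Made-up numbers: every variable helps a little, but B and C are
    # strongly dependent, so using both is penalised.
    gain = {"A": 0.20, "B": 0.15, "C": 0.14, "D": 0.05}
    total = 0.5 + sum(gain[v] for v in variables)
    if "B" in variables and "C" in variables:
        total -= 0.20
    return total


chosen, chosen_score = forward_select(["A", "B", "C", "D"], toy_score)
```

With these toy numbers C is rejected because it is redundant with B, which mirrors the deletion procedure run in the opposite direction.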

Why can removing variables improve the performance?

• Consider a Bayesian network where the same variable appears twice.

• Clearly conditional independence doesn’t hold.

• The network will be biased in favour of C.

• Deleting C will improve performance.

The improvement will depend on the quantity of unaccounted dependency between variables.
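The bias can be seen numerically in a small naive Bayes calculation. The sketch below assumes a two-state root A with children B and C, and all probabilities are made up; counting C's likelihood twice stands in for the same variable appearing twice in the network.

```python
def posterior(prior, likelihoods):
    """Naive Bayes posterior: prior times each child's likelihood, normalised."""
    scores = list(prior)
    for lik in likelihoods:
        scores = [s * l for s, l in zip(scores, lik)]
    total = sum(scores)
    return [s / total for s in scores]


prior = [0.5, 0.5]   # P(A), made-up values
lik_b = [0.3, 0.7]   # P(B = b | A) for the observed state of B
lik_c = [0.8, 0.2]   # P(C = c | A) for the observed state of C

correct = posterior(prior, [lik_b, lik_c])
biased = posterior(prior, [lik_b, lik_c, lik_c])  # C's evidence counted twice
```

Here the evidence from C favours the first state of A, and counting it twice pushes the posterior for that state from about 0.63 to about 0.87.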

Loopy Belief Propagation

Another approximate method is to include all arcs expressing significant dependency, and allow propagation to continue until:

• The probability distributions reach a stable state, or

• A limiting number of iterations has occurred (there may be no termination)

Loopy belief propagation has been shown to be equivalent to a multivariate optimisation problem. It will most likely find a local optimum. We cannot say anything about its accuracy.
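The two stopping conditions can be illustrated with a small sketch. Note the swap: for brevity it uses sum-product messages on an undirected pairwise network of three binary nodes in a loop, not the π and λ messages of the lecture's directed networks, and every number in it is made up. Exact marginals are computed by enumeration for comparison.

```python
import itertools


def loopy_bp(unary, coupling, edges, max_iters=50, tol=1e-6):
    """Sum-product message passing on a pairwise network that contains a loop."""
    k = len(unary[0])
    # msgs[(i, j)] is the current message from node i to node j.
    msgs = {}
    for i, j in edges:
        msgs[(i, j)] = [1.0 / k] * k
        msgs[(j, i)] = [1.0 / k] * k

    iterations = 0
    for iterations in range(1, max_iters + 1):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(k):
                total = 0.0
                for xi in range(k):
                    incoming = 1.0
                    for (src, dst), m in msgs.items():
                        if dst == i and src != j:
                            incoming *= m[xi]
                    total += unary[i][xi] * coupling[xi][xj] * incoming
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        change = max(abs(a - b) for key in msgs
                     for a, b in zip(msgs[key], new[key]))
        msgs = new
        if change < tol:  # the distributions have reached a stable state
            break

    beliefs = []
    for i in range(len(unary)):
        belief = list(unary[i])
        for (src, dst), m in msgs.items():
            if dst == i:
                belief = [b * v for b, v in zip(belief, m)]
        z = sum(belief)
        beliefs.append([b / z for b in belief])
    return beliefs, iterations


def exact_marginals(unary, coupling, edges):
    """Brute-force marginals by summing over every joint state."""
    n, k = len(unary), len(unary[0])
    marg = [[0.0] * k for _ in range(n)]
    for states in itertools.product(range(k), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][states[i]]
        for i, j in edges:
            w *= coupling[states[i]][states[j]]
        for i in range(n):
            marg[i][states[i]] += w
    return [[v / sum(row) for v in row] for row in marg]


unary = [[0.7, 0.3], [0.5, 0.5], [0.5, 0.5]]   # made-up local evidence
coupling = [[1.2, 1.0], [1.0, 1.2]]            # made-up, weak and symmetric
edges = [(0, 1), (1, 2), (2, 0)]               # a single loop

bp_beliefs, n_iters = loopy_bp(unary, coupling, edges)
exact = exact_marginals(unary, coupling, edges)
```

With weak coupling on a single loop the messages settle quickly and the beliefs land close to the exact marginals; stronger dependencies or more loops can make the result less reliable, which is the caveat stated above.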

Hidden Nodes (or Latent Variables)

If any two children of a parent node are not conditionally independent, they can be separated by a hidden node:

The new node represents a common cause that relates B and C. It is called hidden because we have no corresponding measured variable.

Now we look at how to obtain it statistically.

Switch Nodes

Adding hidden nodes which act as switches can simplify complex networks.

Example from Neapolitan:

Advantages of Adding Hidden Nodes

A network can always perform as well with a hidden node as it can without:
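One way to see this is sketched below, under the assumption that the original network is A with children B and C, and that the hidden node H is placed between A and its children. If H is given the same states as A, P(H|A) is the identity, and H's children reuse A's link matrices, the joint distribution is unchanged. All tables are made up.

```python
import itertools

# Made-up tables, indexed table[child_state][parent_state] so columns sum to 1.
p_a = [0.6, 0.4]
p_b_given_a = [[0.9, 0.2], [0.1, 0.8]]
p_c_given_a = [[0.7, 0.3], [0.3, 0.7]]


def joint_without_hidden(a, b, c):
    """P(a, b, c) in the original network A -> B, A -> C."""
    return p_a[a] * p_b_given_a[b][a] * p_c_given_a[c][a]


# Hidden node H simply copies A, and its children reuse A's tables.
p_h_given_a = [[1.0, 0.0], [0.0, 1.0]]


def joint_with_hidden(a, b, c):
    """P(a, b, c) in the network A -> H -> {B, C}, summing out H."""
    return sum(
        p_a[a] * p_h_given_a[h][a] * p_b_given_a[b][h] * p_c_given_a[c][h]
        for h in range(2)
    )


max_gap = max(
    abs(joint_without_hidden(a, b, c) - joint_with_hidden(a, b, c))
    for a, b, c in itertools.product(range(2), repeat=3)
)
```

Training the three link matrices can then only move away from this starting point if doing so reduces the error.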

Using Hidden Nodes

In order to create a hidden node we need to:

1. decide how many states the hidden node is to have;

2. identify values for the three new link matrices introduced.

It may be possible to obtain hidden node information from an expert (e.g. the eyes example from lecture 2). For example an expert may:

1. identify a variable corresponding to the hidden node;

2. provide data for training (i.e. calculating the link matrices).

In general however this is not often possible.

How Many States?

• We expect that the number of states of a hidden node will be comparable to the number of states of the nodes it is separating.

• From the previous slides we would expect the hidden node to have at least the same number of states as its parent.

• Link matrices with too many states will have very low probabilities for some states, so a possible approach is to start with a large number of states and reduce the number depending on how many low probability states we have.

Calculating the Conditional Probabilities

1. Given estimates of $P(H|A)$, $P(B|H)$, $P(C|H)$ and a set of data points $[a_i, b_j, c_k]$

2. Use each $b_j, c_k$ to compute $P'(A)$ from the network, calculate and accumulate an error:

$$E = (P'(A) - P(a_i))^2$$

3. Minimise $E$ over the data set by adjusting the elements of $P(H|A)$, $P(B|H)$, $P(C|H)$
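Steps 1 and 2 can be sketched as follows. The sketch assumes the network A -> H -> {B, C}, a known prior P(A) (the slide does not mention one), and a target P(a_i) that is 1 for the observed state of A and 0 otherwise. The function names and all numbers are hypothetical.

```python
def predict_a(p_a, p_h_a, p_b_h, p_c_h, b, c):
    """P'(A) after instantiating B = b and C = c.

    Tables are indexed table[child_state][parent_state].
    """
    scores = []
    for a in range(len(p_a)):
        # Evidence reaching A from B and C through the hidden node H.
        lam = sum(p_h_a[h][a] * p_b_h[b][h] * p_c_h[c][h]
                  for h in range(len(p_h_a)))
        scores.append(p_a[a] * lam)
    total = sum(scores)
    return [s / total for s in scores]


def total_error(p_a, p_h_a, p_b_h, p_c_h, data):
    """Accumulate the squared error between P'(A) and the observed a_i."""
    error = 0.0
    for a_i, b_j, c_k in data:
        predicted = predict_a(p_a, p_h_a, p_b_h, p_c_h, b_j, c_k)
        for a, p in enumerate(predicted):
            target = 1.0 if a == a_i else 0.0
            error += (p - target) ** 2
    return error


# Made-up starting estimates and data points [a_i, b_j, c_k].
p_a = [0.5, 0.5]
p_h_a = [[0.8, 0.2], [0.2, 0.8]]
p_b_h = [[0.9, 0.3], [0.1, 0.7]]
p_c_h = [[0.6, 0.2], [0.4, 0.8]]
data = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]

example_prediction = predict_a(p_a, p_h_a, p_b_h, p_c_h, 0, 0)
example_error = total_error(p_a, p_h_a, p_b_h, p_c_h, data)
```

Step 3 then treats `total_error` as a function of the entries of the three link matrices.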

Calculating the Conditional Probabilities

For each conditional probability $P(c_j|h_k)$ we need to find a value for:

$$\frac{\partial E}{\partial P(c_j|h_k)}$$

Then in each epoch we update the conditional probabilities using:

$$P(c_j|h_k) \Rightarrow P(c_j|h_k) - \mu \frac{\partial E}{\partial P(c_j|h_k)}$$

Gradients may be calculated analytically or numerically. A closed form equation for the gradients was developed by Chee Keong Kwoh.
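The numerical route can be sketched with central differences. This is not the closed form mentioned above; `toy_error` is a made-up quadratic standing in for the network error E, and the step size `mu` and the target table are arbitrary.

```python
def numerical_gradient(error_fn, table, eps=1e-6):
    """Estimate dE/dP for every entry of a link matrix by central differences."""
    grad = [[0.0] * len(row) for row in table]
    for j, row in enumerate(table):
        for k in range(len(row)):
            original = row[k]
            row[k] = original + eps
            up = error_fn(table)
            row[k] = original - eps
            down = error_fn(table)
            row[k] = original  # restore the entry before moving on
            grad[j][k] = (up - down) / (2 * eps)
    return grad


def gradient_step(table, grad, mu):
    """One update: P => P - mu * dE/dP, applied to every entry."""
    return [[p - mu * g for p, g in zip(row, grow)]
            for row, grow in zip(table, grad)]


# Hypothetical stand-in for E: squared distance from a made-up target table.
target = [[0.8, 0.3], [0.2, 0.7]]


def toy_error(table):
    return sum((p - t) ** 2
               for row, trow in zip(table, target)
               for p, t in zip(row, trow))


table = [[0.5, 0.5], [0.5, 0.5]]
error_before = toy_error(table)
grad = numerical_gradient(toy_error, table)
table = gradient_step(table, grad, mu=0.1)
error_after = toy_error(table)
```

Each numerical gradient costs two error evaluations per matrix entry, which is why a closed form is attractive for larger link matrices.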

Gradient Descent and Probabilities

Gradient descent has problems when applied to probability distributions. After one cycle of updating:

• Distributions will no longer sum to 1

• Individual probability values may be greater than 1 or less than 0

The conditional probability matrices must be normalised so that the columns sum to 1. This may compromise finding an optimal solution.
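A minimal sketch of the repair step follows. Clipping out-of-range values to a small positive floor before normalising is an assumption made here, since the slide only asks for the columns to sum to 1; the name `renormalise_columns` and the floor value are hypothetical.

```python
def renormalise_columns(table, floor=1e-6):
    """Clip entries into [floor, 1] and rescale each column to sum to 1.

    The table is indexed table[child_state][parent_state], so each column
    is one conditional distribution.
    """
    clipped = [[min(max(p, floor), 1.0) for p in row] for row in table]
    n_cols = len(clipped[0])
    col_sums = [sum(row[k] for row in clipped) for k in range(n_cols)]
    return [[row[k] / col_sums[k] for k in range(n_cols)] for row in clipped]


# A made-up matrix after a gradient step: one column has left [0, 1],
# the other no longer sums to 1.
raw = [[1.1, 0.4], [-0.1, 0.5]]
fixed = renormalise_columns(raw)
```

Because the repair moves the parameters off the direction chosen by the gradient, the error after normalising can be higher than the error the raw step would have given.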

Propagation Strategies for calculating errors

Strategies may be alternated during the optimisation and this produces annealing behaviour:

Hidden Nodes for Removing Loops

Suppose we build a network including all the dependencies. We can then use hidden nodes to remove any loops that were formed. In the case of the simple triple we have seen that:

The process is to remove the least dependent link of a multiple parent.

Reducing bigger loops

We can apply the same process to bigger loops. Modelling the dependency between C and D we get:

The training methods still work since for any instantiation of A or B the probability propagation will finish.

Reducing bigger loops

We can continue by modelling the dependency between B and H1:

This results in a singly connected network but with two hidden nodes.

Reducing bigger loops

We can always reduce any network to a singly connected form by this method. One possible form of the Asia network is:

However, the large number of hidden nodes makes the method look less attractive.

The performance will become increasingly dependent on the training data and training process.

Reducing bigger loops

Could we simplify things by combining our two hidden nodes into one?

The answer to this is very much data dependent. Clearly the hidden node now has to model the dependency between B and C that comes through both the common parent A and the common child D.

Limitations of the Hidden Node Method

There is clearly going to be a limit to the degree to which we can model dependencies through hidden nodes. As the dependencies become more complex, either:

1. We will need many hidden nodes, or

2. The number of states in the hidden node will become very large

In either case we may not have enough data to train the new network.

Criteria for Introducing Hidden Nodes

Given a network we can measure the conditional dependency of each pair of children given the parents.

If this is high we expect that benefits will occur from introducing a hidden node.

However, below a certain threshold we are unlikely to benefit from a hidden node and may choose to ignore the dependency.
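The slide does not name the dependency measure. As one possible choice, the sketch below estimates the conditional mutual information I(B; C | A) from counted data; the function name and the sample sets are made up for illustration.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Estimate I(B; C | A) in bits from a list of (a, b, c) tuples."""
    n = len(samples)
    n_abc = Counter(samples)
    n_a = Counter(a for a, _, _ in samples)
    n_ab = Counter((a, b) for a, b, _ in samples)
    n_ac = Counter((a, c) for a, _, c in samples)
    total = 0.0
    for (a, b, c), count in n_abc.items():
        # P(b, c | a) / (P(b | a) P(c | a)) written in terms of counts.
        ratio = count * n_a[a] / (n_ab[(a, b)] * n_ac[(a, c)])
        total += (count / n) * math.log2(ratio)
    return total


# Children independent given the parent: every (b, c) pair equally often.
independent = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
# Children that always agree given the parent.
dependent = [(0, 0, 0), (0, 0, 0), (0, 1, 1), (0, 1, 1)]

score_independent = conditional_mutual_information(independent)
score_dependent = conditional_mutual_information(dependent)
```

The first sample set scores 0 bits and the second 1 bit, so any threshold between the two would flag only the second pair of children as a candidate for a hidden node.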

Other Hidden Node Methodologies

Other more heuristic methods have been suggested for employing hidden nodes.

Starting with a naive network:

Find all significant conditional dependencies:

Model them with hidden nodes:

A similar idea can be applied starting with a spanning tree.
