
Pattern Recognition and Machine Learning : Graphical Models

Dec 02, 2014

Transcript
Page 1: Pattern Recognition and Machine Learning : Graphical Models

PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 8: GRAPHICAL MODELS

Page 2: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Networks

Directed Acyclic Graph (DAG)

Page 3: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Networks

General Factorization

Page 4: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Curve Fitting (1)

Polynomial

Page 5: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Curve Fitting (2)

Plate

Page 6: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Curve Fitting (3)

Input variables and explicit hyperparameters

Page 7: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Curve Fitting—Learning

Condition on data

Page 8: Pattern Recognition and Machine Learning : Graphical Models

Bayesian Curve Fitting—Prediction

Predictive distribution:

p(t̂ | x̂, x, t) ∝ ∫ p(t̂, t, w | x̂, x) dw

where

p(t̂, t, w | x̂, x) = p(t̂ | x̂, w) p(w) ∏_{n=1}^{N} p(t_n | x_n, w)

Page 9: Pattern Recognition and Machine Learning : Graphical Models

Generative Models

Causal process for generating images

Page 10: Pattern Recognition and Machine Learning : Graphical Models

Discrete Variables (1)

General joint distribution: K^2 - 1 parameters

Independent joint distribution: 2(K - 1) parameters

Page 11: Pattern Recognition and Machine Learning : Graphical Models

Discrete Variables (2)

General joint distribution over M variables: K^M - 1 parameters

M-node Markov chain: K - 1 + (M - 1)K(K - 1) parameters
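As a quick check on these counts, a minimal Python sketch that evaluates both formulas (the function names are illustrative):

```python
# Parameter counts for distributions over M K-state variables.

def full_joint_params(K, M):
    """General joint distribution: one probability per configuration,
    minus one for the sum-to-one constraint."""
    return K**M - 1

def markov_chain_params(K, M):
    """M-node Markov chain: K-1 parameters for the first node, plus
    K(K-1) for each of the M-1 conditional probability tables."""
    return (K - 1) + (M - 1) * K * (K - 1)

for K, M in [(2, 2), (10, 5)]:
    print(K, M, full_joint_params(K, M), markov_chain_params(K, M))
```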

Page 12: Pattern Recognition and Machine Learning : Graphical Models

Discrete Variables: Bayesian Parameters (1)

Page 13: Pattern Recognition and Machine Learning : Graphical Models

Discrete Variables: Bayesian Parameters (2)

Shared prior

Page 14: Pattern Recognition and Machine Learning : Graphical Models

Parameterized Conditional Distributions

If x1, …, xM are discrete, K-state variables, then p(y = 1 | x1, …, xM) in general has O(K^M) parameters.

The parameterized form

p(y = 1 | x1, …, xM) = σ( w0 + Σ_{i=1}^{M} w_i x_i )

requires only M + 1 parameters.
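A minimal sketch of this parameterization in Python (the weight values are arbitrary illustrative choices):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def p_y_given_x(x, w0, w):
    """p(y=1 | x1..xM) = sigma(w0 + sum_i w_i * x_i):
    M+1 parameters instead of an O(K^M) table."""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))

# Illustrative weights for M = 3 binary parents.
print(p_y_given_x([1, 0, 1], w0=-1.0, w=[0.5, 2.0, 1.0]))
```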

Page 15: Pattern Recognition and Machine Learning : Graphical Models

Linear-Gaussian Models

Directed Graph

Vector-valued Gaussian Nodes

Each node is Gaussian, with mean a linear function of its parents.
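As a rough illustration, a chain x1 → x2 → x3 of scalar Gaussian nodes, each with mean linear in its parent (the coefficients are made-up values); the implied joint over (x1, x2, x3) is itself Gaussian, which the sampled mean and covariance reflect:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# x1 ~ N(1, 1); x2 | x1 ~ N(0.5*x1, 1); x3 | x2 ~ N(-x2 + 2, 1)
x1 = rng.normal(1.0, 1.0, N)
x2 = rng.normal(0.5 * x1, 1.0)
x3 = rng.normal(-x2 + 2.0, 1.0)

# Empirical mean and covariance of the implied joint Gaussian.
X = np.stack([x1, x2, x3], axis=1)
print(X.mean(axis=0))
print(np.cov(X.T))
```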

Page 16: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence

a is independent of b given c:

p(a | b, c) = p(a | c)

Equivalently:

p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c)

Notation: a ⊥⊥ b | c

Page 17: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence: Example 1

Page 18: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence: Example 1

Page 19: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence: Example 2

Page 20: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence: Example 2

Page 21: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c unobserved.

Page 22: Pattern Recognition and Machine Learning : Graphical Models

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c observed.

Page 23: Pattern Recognition and Machine Learning : Graphical Models

“Am I out of fuel?”

B = Battery (0 = flat, 1 = fully charged)
F = Fuel Tank (0 = empty, 1 = full)
G = Fuel Gauge Reading (0 = empty, 1 = full)

and hence

Page 24: Pattern Recognition and Machine Learning : Graphical Models

“Am I out of fuel?”

Probability of an empty tank increased by observing G = 0.

Page 25: Pattern Recognition and Machine Learning : Graphical Models

“Am I out of fuel?”

Probability of an empty tank reduced by observing B = 0. This is referred to as "explaining away".
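A minimal numeric sketch of this effect (the prior and conditional probability values below are illustrative assumptions, not quoted from the slides):

```python
from itertools import product

# Assumed priors and gauge model.
pB = {1: 0.9, 0: 0.1}            # battery charged / flat
pF = {1: 0.9, 0: 0.1}            # tank full / empty
pG1 = {(1, 1): 0.8, (1, 0): 0.2, # p(G=1 | B, F)
       (0, 1): 0.2, (0, 0): 0.1}

def joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# p(F=0 | G=0): an "empty" reading raises belief in an empty tank
# above the prior p(F=0) = 0.1.
num = sum(joint(b, 0, 0) for b in (0, 1))
den = sum(joint(b, f, 0) for b, f in product((0, 1), repeat=2))
print("p(F=0 | G=0)      =", num / den)

# p(F=0 | G=0, B=0): a flat battery "explains away" the reading,
# so the empty-tank probability drops again.
print("p(F=0 | G=0, B=0) =", joint(0, 0, 0) / sum(joint(0, f, 0) for f in (0, 1)))
```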

Page 26: Pattern Recognition and Machine Learning : Graphical Models

D-separation

• A, B, and C are non-intersecting subsets of nodes in a directed graph.
• A path from A to B is blocked if it contains a node such that either
  a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
  b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set C.
• If all paths from A to B are blocked, A is said to be d-separated from B by C.
• If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥⊥ B | C (a small checker is sketched below).
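These rules are mechanical enough to check by enumeration on small graphs. A minimal sketch (the graph a → c ← b, c → d and all node names are hypothetical) that lists undirected paths and applies rules (a) and (b):

```python
# A toy d-separation checker for small DAGs; edges run parent -> child.
edges = [("a", "c"), ("b", "c"), ("c", "d")]

children, parents = {}, {}
for u, v in edges:
    children.setdefault(u, set()).add(v)
    parents.setdefault(v, set()).add(u)

def descendants(n):
    out, stack = set(), [n]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def undirected_paths(a, b, path=None):
    path = path or [a]
    if path[-1] == b:
        yield path
        return
    for nxt in children.get(path[-1], set()) | parents.get(path[-1], set()):
        if nxt not in path:
            yield from undirected_paths(a, b, path + [nxt])

def blocked(path, C):
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        head_to_head = (prev in parents.get(node, set())
                        and nxt in parents.get(node, set()))
        if head_to_head:
            if node not in C and not (descendants(node) & C):
                return True   # rule (b): unobserved collider blocks
        elif node in C:
            return True       # rule (a): observed chain/fork node blocks
    return False

def d_separated(a, b, C):
    return all(blocked(p, set(C)) for p in undirected_paths(a, b))

print(d_separated("a", "b", set()))   # True: collider c blocks a-c-b
print(d_separated("a", "b", {"d"}))   # False: observing c's descendant unblocks
```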

Page 27: Pattern Recognition and Machine Learning : Graphical Models

D-separation: Example

Page 28: Pattern Recognition and Machine Learning : Graphical Models

D-separation: I.I.D. Data

Page 29: Pattern Recognition and Machine Learning : Graphical Models

Directed Graphs as Distribution Filters

Page 30: Pattern Recognition and Machine Learning : Graphical Models

The Markov Blanket

Factors independent of xi cancel between numerator and denominator.

Page 31: Pattern Recognition and Machine Learning : Graphical Models

Cliques and Maximal Cliques

Clique

Maximal Clique

Page 32: Pattern Recognition and Machine Learning : Graphical Models

Joint Distribution

p(x) = (1/Z) ∏_C ψ_C(x_C)

where ψ_C(x_C) is the potential over clique C and

Z = Σ_x ∏_C ψ_C(x_C)

is the normalization coefficient; note: for M K-state variables there are K^M terms in Z.

Energies and the Boltzmann distribution: ψ_C(x_C) = exp{−E(x_C)}
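A brute-force sketch of this normalization for a tiny pairwise model (the potential is an arbitrary illustrative choice), making the K^M cost explicit:

```python
from itertools import product
import math

K, M = 3, 4  # K-state variables x1..xM with pairwise cliques in a chain

def psi(xa, xb):
    """Illustrative Boltzmann potential exp(-E), favouring equal neighbours."""
    return math.exp(1.0 if xa == xb else 0.0)

# Z sums the product of clique potentials over all K**M configurations.
Z = 0.0
for x in product(range(K), repeat=M):
    Z += math.prod(psi(x[i], x[i + 1]) for i in range(M - 1))

print("configurations:", K**M, "Z =", Z)
```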

Page 33: Pattern Recognition and Machine Learning : Graphical Models

Illustration: Image De-Noising (1)

Original Image | Noisy Image

Page 34: Pattern Recognition and Machine Learning : Graphical Models

Illustration: Image De-Noising (2)

Page 35: Pattern Recognition and Machine Learning : Graphical Models

Illustration: Image De-Noising (3)

Noisy Image | Restored Image (ICM)

Page 36: Pattern Recognition and Machine Learning : Graphical Models

Illustration: Image De-Noising (4)

Restored Image (ICM) | Restored Image (Graph cuts)
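A compact sketch of ICM for this model, using a pairwise energy of the form E(x, y) = h·Σxi − β·Σxixj − η·Σxiyi over pixels in {−1, +1} (the coefficient values and image are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Binary image with pixels in {-1, +1}; y is a noisy copy of x_true.
x_true = np.ones((32, 32), dtype=int)
x_true[8:24, 8:24] = -1
y = np.where(rng.random(x_true.shape) < 0.1, -x_true, x_true)

h, beta, eta = 0.0, 1.0, 2.1  # illustrative energy coefficients

def local_energy(x, y, i, j, v):
    """Energy terms involving pixel (i, j) when it takes value v."""
    nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
             if 0 <= a < x.shape[0] and 0 <= b < x.shape[1])
    return h * v - beta * v * nb - eta * v * y[i, j]

# ICM: sweep the pixels, greedily taking the lower-energy value.
x = y.copy()
for _ in range(10):
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = min((+1, -1), key=lambda v: local_energy(x, y, i, j, v))

print("pixels still wrong:", int((x != x_true).sum()))
```

ICM converges to a local minimum of the energy; graph cuts, referenced on the slide, can find the global minimum for this class of binary models.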

Page 37: Pattern Recognition and Machine Learning : Graphical Models

Converting Directed to Undirected Graphs (1)

Page 38: Pattern Recognition and Machine Learning : Graphical Models

Converting Directed to Undirected Graphs (2)

Additional links

Page 39: Pattern Recognition and Machine Learning : Graphical Models

Directed vs. Undirected Graphs (1)

Page 40: Pattern Recognition and Machine Learning : Graphical Models

Directed vs. Undirected Graphs (2)

Page 41: Pattern Recognition and Machine Learning : Graphical Models

Inference in Graphical Models

Page 42: Pattern Recognition and Machine Learning : Graphical Models

Inference on a Chain

Page 43: Pattern Recognition and Machine Learning : Graphical Models

Inference on a Chain

Page 44: Pattern Recognition and Machine Learning : Graphical Models

Inference on a Chain

Page 45: Pattern Recognition and Machine Learning : Graphical Models

Inference on a Chain

Page 46: Pattern Recognition and Machine Learning : Graphical Models

Inference on a Chain

To compute local marginals:
• Compute and store all forward messages, μ_α(x_n).
• Compute and store all backward messages, μ_β(x_n).
• Compute Z at any node x_m.
• Compute p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n) for all variables required.
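A minimal sketch of the forward and backward passes on a discrete chain with numpy (the pairwise potentials are random illustrative tables):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 5                      # K states, N nodes
psi = rng.random((N - 1, K, K))  # psi[n] is the potential on (x_n, x_{n+1})

# Forward messages mu_alpha and backward messages mu_beta.
alpha = [np.ones(K)]
for n in range(N - 1):
    alpha.append(alpha[-1] @ psi[n])       # sum over x_n
beta = [np.ones(K)]
for n in reversed(range(N - 1)):
    beta.insert(0, psi[n] @ beta[0])       # sum over x_{n+1}

# Z from any node; marginals p(x_n) = alpha_n * beta_n / Z.
Z = float(alpha[0] @ beta[0])
marginals = [a * b / Z for a, b in zip(alpha, beta)]
print(np.round(marginals[2], 3), "sum:", marginals[2].sum())
```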

Page 47: Pattern Recognition and Machine Learning : Graphical Models

Trees

Undirected Tree | Directed Tree | Polytree

Page 48: Pattern Recognition and Machine Learning : Graphical Models

Factor Graphs

Page 49: Pattern Recognition and Machine Learning : Graphical Models

Factor Graphs from Directed Graphs

Page 50: Pattern Recognition and Machine Learning : Graphical Models

Factor Graphs from Undirected Graphs

Page 51: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (1)

Objective:
i. to obtain an efficient, exact inference algorithm for finding marginals;
ii. in situations where several marginals are required, to allow computations to be shared efficiently.

Key idea: the distributive law, ab + ac = a(b + c).

Page 52: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (2)

Page 53: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (3)

Page 54: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (4)

Page 55: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (5)

Page 56: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (6)

Page 57: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (7)

Initialization

Page 58: Pattern Recognition and Machine Learning : Graphical Models

The Sum-Product Algorithm (8)

To compute local marginals:
• Pick an arbitrary node as root.
• Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.
• Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.
• Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.

Page 59: Pattern Recognition and Machine Learning : Graphical Models

Sum-Product: Example (1)

Page 60: Pattern Recognition and Machine Learning : Graphical Models

Sum-Product: Example (2)

Page 61: Pattern Recognition and Machine Learning : Graphical Models

Sum-Product: Example (3)

Page 62: Pattern Recognition and Machine Learning : Graphical Models

Sum-Product: Example (4)
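The book's running example is a four-node factor graph with factors f_a(x1, x2), f_b(x2, x3), f_c(x2, x4). A minimal sketch (the factor tables are random illustrative values) that computes the unnormalized marginal of x2 from the three incoming messages and checks it against brute force:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
fa = rng.random((K, K))  # f_a(x1, x2)
fb = rng.random((K, K))  # f_b(x2, x3)
fc = rng.random((K, K))  # f_c(x2, x4)

# Messages into x2 from each neighbouring factor (leaf variables send 1s).
mu_fa_x2 = np.ones(K) @ fa   # sum over x1
mu_fb_x2 = fb @ np.ones(K)   # sum over x3
mu_fc_x2 = fc @ np.ones(K)   # sum over x4

# The unnormalized marginal is the product of incoming messages.
ptilde_x2 = mu_fa_x2 * mu_fb_x2 * mu_fc_x2

# Brute-force check: sum the full joint over x1, x3, x4.
joint = np.einsum('ab,bc,bd->abcd', fa, fb, fc)
print(np.allclose(ptilde_x2, joint.sum(axis=(0, 2, 3))))
```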

Page 63: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (1)

Objective: an efficient algorithm for finding
i. the value x^max that maximises p(x);
ii. the value of p(x^max).

In general, maximum marginals ≠ joint maximum: maximizing each variable's marginal separately need not yield a consistent joint configuration.

Page 64: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (2)

Maximizing over a chain (max-product)

Page 65: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (3)

Generalizes to tree-structured factor graph

maximizing as close to the leaf nodes as possible

Page 66: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (4)

Max-Product → Max-Sum. For numerical reasons, use ln p(x): since ln is monotonic, ln(max_x p(x)) = max_x ln p(x), and products of factors become sums.

Again, use the distributive law: max(a + b, a + c) = a + max(b, c).

Page 67: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (5)

Initialization (leaf nodes)

Recursion

Page 68: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (6)

Termination (root node): maximize the sum of incoming messages at the root to obtain ln p(x^max) and the root's maximizing value.

Back-track from the root (l = 0) out to all nodes: each maximization stores which value of the previous variable achieved it, and following these stored values recovers x^max everywhere.

Page 69: Pattern Recognition and Machine Learning : Graphical Models

The Max-Sum Algorithm (7)

Example: Markov chain
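A sketch of max-sum on such a chain, i.e. a Viterbi-style recursion in log space (the pairwise potentials are random illustrative tables):

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 3, 6
logpsi = np.log(rng.random((N - 1, K, K)))  # ln psi_n(x_n, x_{n+1})

# Forward pass: max-sum messages plus back-pointers phi.
msg = np.zeros(K)
phi = []
for n in range(N - 1):
    scores = msg[:, None] + logpsi[n]   # (prev state) x (next state)
    phi.append(scores.argmax(axis=0))   # best previous state per x_{n+1}
    msg = scores.max(axis=0)

# Termination at the last node, then back-track to recover x_max.
x = [int(msg.argmax())]
for back in reversed(phi):
    x.insert(0, int(back[x[0]]))
print("ln p~(x_max) =", float(msg.max()), "x_max =", x)
```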

Page 70: Pattern Recognition and Machine Learning : Graphical Models

The Junction Tree Algorithm

• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
• Intractable on graphs with large cliques.

Page 71: Pattern Recognition and Machine Learning : Graphical Models

Loopy Belief Propagation

• Sum-Product on general graphs.
• Initial unit messages passed across all links, after which messages are passed around until convergence (not guaranteed!).
• Approximate but tractable for large graphs.
• Sometimes works well, sometimes not at all.