Variational Methods for Graphical Models. Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul. Presented by: Afsaneh Shirazi

Feb 11, 2016

Page 1: Variational Methods for Graphical Models

Variational Methods for Graphical Models

Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul

Presented by: Afsaneh Shirazi

Page 2

Outline

• Motivation
• Inference in graphical models
• Exact inference is intractable
• Variational methodology
  – Sequential approach
  – Block approach
• Conclusions

Page 3

Motivation (Example: Medical Diagnosis)

[Figure: bipartite network with a layer of disease nodes connected to a layer of symptom nodes]

What is the most probable disease?

Page 4

Motivation

• We want to answer queries about our data
• A graphical model is a way to model the data
• Inference in some graphical models is intractable (NP-hard)
• Variational methods simplify inference in graphical models by using approximations

Page 5

Graphical Models

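• Directed (Bayesian network)

• Undirected

[Figure: a network over S1, ..., S5. In the directed version each node carries a local distribution: P(S1), P(S2), P(S3|S1,S2), P(S4|S3), P(S5|S3,S4). In the undirected version the graph is covered by cliques C1, C2, C3.]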
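As a minimal sketch of the directed factorization in the figure, the code below multiplies the five local distributions P(S1)P(S2)P(S3|S1,S2)P(S4|S3)P(S5|S3,S4) and sums the joint by brute force. Only the structure comes from the slide; the numeric CPT values and the helper names (`bern`, `joint`, `marginal_s5`) are hypothetical. Brute-force enumeration costs 2^n terms, which is what becomes infeasible in large networks.

```python
from itertools import product

# Hypothetical CPTs (not from the slides) for binary S1..S5.
p_s1 = 0.3                                                      # P(S1 = 1)
p_s2 = 0.6                                                      # P(S2 = 1)
p_s3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}     # P(S3 = 1 | S1, S2)
p_s4 = {0: 0.2, 1: 0.7}                                         # P(S4 = 1 | S3)
p_s5 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.95}   # P(S5 = 1 | S3, S4)

def bern(p, value):
    """Probability that a binary variable with P(= 1) = p takes `value`."""
    return p if value == 1 else 1.0 - p

def joint(s1, s2, s3, s4, s5):
    """Joint probability as the product of the local conditional distributions."""
    return (bern(p_s1, s1) * bern(p_s2, s2) * bern(p_s3[(s1, s2)], s3)
            * bern(p_s4[s3], s4) * bern(p_s5[(s3, s4)], s5))

def marginal_s5(value):
    """P(S5 = value) by summing the joint over the other 2**4 configurations."""
    return sum(joint(s1, s2, s3, s4, value)
               for s1, s2, s3, s4 in product((0, 1), repeat=4))

total = sum(joint(*s) for s in product((0, 1), repeat=5))
print(round(total, 10))   # the joint sums to 1
print(marginal_s5(1))
```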

Page 6

Inference in Graphical Models

Inference: Given a graphical model, the process of computing answers to queries

• How computationally hard is this decision problem?

• Theorem: Computing P(X = x) in a Bayesian network is NP-hard

Page 7

Why Is Exact Inference Intractable?

[Figure: bipartite disease-symptom network]

Diagnose the most probable disease.

Page 8

Why Is Exact Inference Intractable?

[Figure: bipartite disease-symptom network with the observed symptoms marked]

f: observed symptoms

$$P(f, d) = P(f \mid d)\, P(d)$$

Page 9

Why Is Exact Inference Intractable?

[Figure: bipartite disease-symptom network with the diseases set to 1, 0, 1]

Noisy-OR model for $P(f_i \mid d)$

Page 10

Why Is Exact Inference Intractable?

[Figure: bipartite disease-symptom network with the diseases set to 1, 0, 1]

Noisy-OR model for $P(f_i \mid d)$

$$P\big(f_i = 0 \mid d = (1, 0, 1)\big)$$

Page 11

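Why Is Exact Inference Intractable?

$$P(f_i = 0 \mid d) = (1 - q_{i0}) \prod_j (1 - q_{ij})^{d_j} = e^{-\sum_j \theta_{ij} d_j - \theta_{i0}}$$

$$P(f_i = 1 \mid d) = 1 - e^{-\sum_j \theta_{ij} d_j - \theta_{i0}}$$

where $q_{ij}$ is the probability that disease $j$ alone causes symptom $i$, $q_{i0}$ is a leak probability, and $\theta_{ij} = -\ln(1 - q_{ij})$.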
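The two forms of the noisy-OR probability can be checked numerically. This is a minimal sketch with hypothetical parameters (one finding, three parent diseases); `noisy_or_neg` and `noisy_or_neg_exp` are illustrative names, not from the slides.

```python
import math

def noisy_or_neg(q0, q, d):
    """P(f_i = 0 | d) in product form: (1 - q_i0) * prod_j (1 - q_ij)^{d_j}."""
    p = 1.0 - q0
    for qj, dj in zip(q, d):
        p *= (1.0 - qj) ** dj
    return p

def noisy_or_neg_exp(q0, q, d):
    """Same quantity in exponential form, with theta = -ln(1 - q)."""
    theta0 = -math.log(1.0 - q0)
    theta = [-math.log(1.0 - qj) for qj in q]
    return math.exp(-sum(t * dj for t, dj in zip(theta, d)) - theta0)

# Hypothetical parameters: leak q_i0 and q_ij for three parent diseases.
q0, q = 0.01, [0.8, 0.3, 0.5]
d = (1, 0, 1)
p0 = noisy_or_neg(q0, q, d)
print(p0, noisy_or_neg_exp(q0, q, d))   # the two forms agree
print(1.0 - p0)                         # P(f_i = 1 | d)
```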

Page 12

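Why Is Exact Inference Intractable?

[Figure: bipartite disease-symptom network with the observed symptoms marked]

f: observed symptoms

$$P(f, d) = P(f \mid d)\, P(d) = \Big[\prod_i P(f_i \mid d)\Big] \Big[\prod_j P(d_j)\Big]$$

Negative findings ($f_i = 0$, $f_k = 0$, ...) multiply into a single exponential:

$$e^{-\sum_j \theta_{ij} d_j - \theta_{i0}} \cdot e^{-\sum_j \theta_{kj} d_j - \theta_{k0}} \cdots = e^{-\sum_j (\theta_{ij} + \theta_{kj} + \cdots) d_j - (\theta_{i0} + \theta_{k0} + \cdots)}$$

The exponent stays linear in $d$, so this product factorizes over the diseases and the sum over $d$ remains tractable.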

Page 13

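Why Is Exact Inference Intractable?

[Figure: bipartite disease-symptom network with the observed symptoms marked]

f: observed symptoms

$$P(f, d) = P(f \mid d)\, P(d) = \Big[\prod_i P(f_i \mid d)\Big] \Big[\prod_j P(d_j)\Big]$$

Positive findings ($f_i = 1$, ..., $f_k = 1$) contribute

$$\big(1 - e^{-\sum_j \theta_{ij} d_j - \theta_{i0}}\big) \cdots \big(1 - e^{-\sum_j \theta_{kj} d_j - \theta_{k0}}\big)$$

Expanding this product gives $2^m$ terms for $m$ positive findings, and the terms couple all the diseases, so the exact sum over $d$ is intractable.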
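The contrast between negative and positive findings can be sketched on a tiny hypothetical network (three diseases, two findings; all numbers invented). `likelihood` sums over all 2^n disease vectors, which works for any findings; `likelihood_negative_only` uses the factorization that holds only when every observed finding is negative.

```python
import math
from itertools import product

def likelihood(priors, theta0, theta, findings):
    """Exact P(f) by summing over all 2**n disease vectors.

    priors[j]   : P(d_j = 1)
    theta0[i]   : leak parameter theta_i0 of finding i
    theta[i][j] : theta_ij (0 if disease j is not a parent of finding i)
    findings    : dict {i: 0 or 1} of observed findings
    """
    total = 0.0
    for d in product((0, 1), repeat=len(priors)):
        p = 1.0
        for j, dj in enumerate(d):                     # prior P(d)
            p *= priors[j] if dj else 1.0 - priors[j]
        for i, fi in findings.items():                 # noisy-OR P(f_i | d)
            neg = math.exp(-sum(t * dj for t, dj in zip(theta[i], d)) - theta0[i])
            p *= neg if fi == 0 else 1.0 - neg
        total += p
    return total

def likelihood_negative_only(priors, theta0, theta, neg_findings):
    """P(all listed findings = 0): the exponent is linear in d, so the sum
    over d factorizes into one term per disease."""
    result = math.exp(-sum(theta0[i] for i in neg_findings))
    for j, pj in enumerate(priors):
        s = sum(theta[i][j] for i in neg_findings)
        result *= (1.0 - pj) + pj * math.exp(-s)
    return result

priors = [0.1, 0.05, 0.2]                     # hypothetical disease priors
theta0 = [0.02, 0.01]                         # hypothetical leak terms
theta = [[1.5, 0.0, 0.7], [0.0, 2.0, 0.4]]    # hypothetical theta_ij
print(likelihood(priors, theta0, theta, {0: 0, 1: 0}),
      likelihood_negative_only(priors, theta0, theta, [0, 1]))
print(likelihood(priors, theta0, theta, {0: 1, 1: 1}))   # no such shortcut
```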

Page 14

Reducing the Computational Complexity

Variational methods:
• Approximate the probability distribution
• Make use of convexity
• Yield a simple graph on which exact methods can be run

Page 15

Express a Function Variationally

• $\ln(x)$ is a concave function

$$\ln(x) = \min_\lambda \{\lambda x - H(\lambda)\}$$

$$H(\lambda) = \min_x \{\lambda x - \ln(x)\}$$

Page 16

Express a Function Variationally

• $\ln(x)$ is a concave function

$$\ln(x) = \min_\lambda \{\lambda x - \ln \lambda - 1\}$$
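A quick numerical check of the variational form of $\ln(x)$: every $\lambda > 0$ gives an upper bound, and the minimum is attained at $\lambda = 1/x$. The function name `ln_upper` is illustrative.

```python
import math

def ln_upper(x, lam):
    """Variational upper bound lam*x - ln(lam) - 1 on ln(x); valid for any lam > 0."""
    return lam * x - math.log(lam) - 1.0

x = 2.5
for lam in (0.1, 1.0, 3.0):
    assert ln_upper(x, lam) >= math.log(x)     # every lam gives an upper bound
print(ln_upper(x, 1.0 / x), math.log(x))       # tight at lam = 1/x
```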

Page 17

Express a Function Variationally

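• If the function is neither convex nor concave, transform it into a convex or concave form

• Example: logistic function $f(x) = \frac{1}{1 + e^{-x}}$
  – Transformation: $g(x) = \ln f(x) = -\ln(1 + e^{-x})$, which is concave
  – Approximation: $g(x) = \min_\lambda \{\lambda x - H(\lambda)\}$
  – Transforming back: $f(x) = \min_\lambda e^{\lambda x - H(\lambda)}$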
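For the logistic function, working out $H(\lambda) = \min_x \{\lambda x - g(x)\}$ gives the binary entropy $H(\lambda) = -\lambda \ln \lambda - (1 - \lambda)\ln(1 - \lambda)$ for $0 < \lambda < 1$. The sketch below checks that $e^{\lambda x - H(\lambda)}$ upper-bounds the logistic function and is tight at $\lambda = 1 - f(x)$; the helper names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_entropy(lam):
    """H(lam) = -lam ln lam - (1 - lam) ln(1 - lam), for 0 < lam < 1."""
    return -lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)

def sigmoid_upper(x, lam):
    """Upper bound exp(lam*x - H(lam)) on the logistic function, 0 < lam < 1."""
    return math.exp(lam * x - binary_entropy(lam))

x = 0.7
for lam in (0.05, 0.3, 0.6, 0.95):
    assert sigmoid_upper(x, lam) >= sigmoid(x)
lam_star = 1.0 - sigmoid(x)        # optimal variational parameter for this x
print(sigmoid_upper(x, lam_star), sigmoid(x))
```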

Page 18

Approaches to Variational Methods

• Sequential approach (on-line): nodes are transformed one at a time, in an order determined during the inference process

• Block approach (off-line): applies when the model has obvious substructures

Page 19

Sequential Approach (Two Methods)

• Start from the untransformed graph and transform one node at a time until the graph is simple enough for exact methods

• Start from the completely transformed graph and reintroduce one node at a time while the graph stays simple enough for exact methods

Page 20

Sequential Approach (Example)

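$$P(f_i = 1 \mid d) = 1 - e^{-\sum_j \theta_{ij} d_j - \theta_{i0}}$$

[Figure: bipartite disease-symptom network]

Log concave: $1 - e^{-x}$ is a log-concave function.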

Page 21

Sequential Approach (Example)

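$$P(f_i = 1 \mid d) = 1 - e^{-\sum_j \theta_{ij} d_j - \theta_{i0}}$$

[Figure: bipartite disease-symptom network]

Log concave: $1 - e^{-x} = e^{f(x)}$ with $f(x) = \ln(1 - e^{-x})$ concave, so $f(x) \le \lambda x - f^*(\lambda)$ for every $\lambda > 0$, which gives

$$P(f_i = 1 \mid d) \le e^{\lambda_i \theta_{i0} - f^*(\lambda_i)} \prod_j \big[e^{\lambda_i \theta_{ij}}\big]^{d_j}$$

where $f^*(\lambda) = -\lambda \ln \lambda + (\lambda + 1) \ln(\lambda + 1)$.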
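A minimal sketch checking the upper bound for a single positive finding, with hypothetical parameters. For $x = \theta_{i0} + \sum_j \theta_{ij} d_j$ the bound is tight at $\lambda_i = 1/(e^x - 1)$, which follows from setting the derivative of $\lambda x - f^*(\lambda)$ to zero. Because the bound is a product of one factor per disease, each factor can later be absorbed into that disease's prior.

```python
import math

def f_conj(lam):
    """Conjugate f*(lam) = -lam ln lam + (lam + 1) ln(lam + 1) of f(x) = ln(1 - e^{-x})."""
    return -lam * math.log(lam) + (lam + 1.0) * math.log(lam + 1.0)

def pos_finding_upper(lam, theta0, theta, d):
    """e^{lam*theta0 - f*(lam)} * prod_j [e^{lam*theta_j}]^{d_j}."""
    bound = math.exp(lam * theta0 - f_conj(lam))
    for t, dj in zip(theta, d):
        bound *= math.exp(lam * t) ** dj        # one factor per disease
    return bound

def pos_finding_exact(theta0, theta, d):
    """P(f_i = 1 | d) under the noisy-OR model."""
    return 1.0 - math.exp(-sum(t * dj for t, dj in zip(theta, d)) - theta0)

theta0, theta, d = 0.05, [1.2, 0.4, 0.9], (1, 0, 1)   # hypothetical values
exact = pos_finding_exact(theta0, theta, d)
for lam in (0.05, 0.2, 1.0, 3.0):
    assert pos_finding_upper(lam, theta0, theta, d) >= exact
x = theta0 + sum(t * dj for t, dj in zip(theta, d))
lam_star = 1.0 / (math.exp(x) - 1.0)                   # optimal lambda_i
print(pos_finding_upper(lam_star, theta0, theta, d), exact)
```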

Page 22

Sequential Approach (Example)

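[Figure: bipartite disease-symptom network]

$$P(f_i = 1 \mid d) \le e^{\lambda_i \theta_{i0} - f^*(\lambda_i)} \prod_j \big[e^{\lambda_i \theta_{ij}}\big]^{d_j}$$

Each factor $[e^{\lambda_i \theta_{ij}}]^{d_j}$ depends on a single disease, so it can be absorbed into that disease's prior. For example, transforming finding $f_4$ updates the prior of $d_3$:

$$P(d_3 = 1) \to P(d_3 = 1)\, e^{\lambda_4 \theta_{43}}, \qquad P(d_3 = 0) \text{ unchanged}$$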

Page 23

Sequential Approach (Example)

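[Figure: bipartite disease-symptom network]

$$P(f_i = 1 \mid d) \le e^{\lambda_i \theta_{i0} - f^*(\lambda_i)} \prod_j \big[e^{\lambda_i \theta_{ij}}\big]^{d_j}$$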

Page 24

Sequential Approach (Example)

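[Figure: bipartite disease-symptom network]

$$P(f_i = 1 \mid d) \le e^{\lambda_i \theta_{i0} - f^*(\lambda_i)} \prod_j \big[e^{\lambda_i \theta_{ij}}\big]^{d_j}$$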

Page 25

Sequential Approach (Upper Bound and Lower Bound)

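• We need both a lower bound and an upper bound

$$P(d_j \mid f) = \frac{P(f, d_j)}{P(f)} = \frac{P(f, d_j)}{\sum_{d_j} P(f, d_j)}$$

$$UB\big(P(d_j \mid f)\big) = \frac{UB\big(P(f, d_j)\big)}{LB\big(\sum_{d_j} P(f, d_j)\big)}$$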

Page 26

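How to Compute a Lower Bound for a Concave Function?

• Lower bound for concave functions (Jensen's inequality):

$$f\Big(\sum_j a_j z_j\Big) = f\Big(\sum_j q_j \frac{a_j z_j}{q_j}\Big) \ge \sum_j q_j\, f\Big(\frac{a_j z_j}{q_j}\Big)$$

The variational parameter $q_j$ is a probability distribution ($q_j > 0$, $\sum_j q_j = 1$).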
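A numerical sketch of this bound, assuming $f(x) = \ln(1 - e^{-x})$ as the concave function (the one used in the medical diagnosis example) and hypothetical values for $a_j$ and $z_j$. Any probability vector $q$ gives a lower bound; choosing $q_j \propto a_j z_j$ makes every argument equal to $\sum_j a_j z_j$, so the bound becomes tight.

```python
import math

def f(x):
    return math.log(1.0 - math.exp(-x))   # concave on x > 0

a = [0.7, 1.5, 0.2]   # hypothetical coefficients a_j
z = [1.0, 2.0, 3.0]   # hypothetical arguments z_j
lhs = f(sum(aj * zj for aj, zj in zip(a, z)))

def jensen_lower(q):
    """sum_j q_j f(a_j z_j / q_j) for a probability vector q with q_j > 0."""
    return sum(qj * f(aj * zj / qj) for qj, aj, zj in zip(q, a, z))

for q in ([1 / 3, 1 / 3, 1 / 3], [0.2, 0.5, 0.3], [0.6, 0.3, 0.1]):
    assert jensen_lower(q) <= lhs
s = sum(aj * zj for aj, zj in zip(a, z))
q_star = [aj * zj / s for aj, zj in zip(a, z)]    # q_j proportional to a_j z_j
print(jensen_lower(q_star), lhs)
```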

Page 27

Block Approach (Overview)

• Off-line application of the sequential approach:
  – Identify some substructure amenable to exact inference
  – Define a family of probability distributions by introducing variational parameters
  – Choose the best approximation based on the evidence

Page 28

Block Approach (Details)

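• KL divergence:

$$D(Q \,\|\, P) = \sum_{\{S\}} Q(S) \ln \frac{Q(S)}{P(S)}$$

• Approximate $P(H \mid E)$ with a member of the family $Q(H \mid E, \lambda)$; minimizing the KL divergence selects the best member $Q(H \mid E, \lambda^*)$.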
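A minimal sketch of choosing $\lambda^*$ by minimizing $D(Q \| P)$. The target posterior is a hypothetical, correlated distribution over two binary hidden variables; the family is two independent Bernoullis with $\lambda = (\mu_1, \mu_2)$. A coarse grid search stands in for a proper optimizer, purely for illustration.

```python
import math
from itertools import product

def kl(q, p):
    """D(Q || P) = sum_S Q(S) ln(Q(S) / P(S)); terms with Q(S) = 0 contribute 0."""
    return sum(qs * math.log(qs / p[s]) for s, qs in q.items() if qs > 0)

# Hypothetical target P(H | E) over two correlated binary hidden variables.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def q_family(mu1, mu2):
    """Family Q(H | E, lambda) with lambda = (mu1, mu2): independent Bernoullis."""
    return {(h1, h2): (mu1 if h1 else 1 - mu1) * (mu2 if h2 else 1 - mu2)
            for h1, h2 in product((0, 1), repeat=2)}

# Crude grid search for lambda*.
grid = [i / 20 for i in range(1, 20)]
best = min(((m1, m2) for m1 in grid for m2 in grid),
           key=lambda m: kl(q_family(*m), p))
print(best, kl(q_family(*best), p))
```

The factorized family cannot represent the correlation in the target, so the minimum divergence stays strictly positive.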

Page 29

Block Approach (Example – Boltzmann machine)

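$$P(S \mid \theta) = \frac{1}{Z} e^{\sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta_{i0} S_i}$$

[Figure: nodes S_i and S_j connected by an edge with weight θ_ij]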

Page 30

Block Approach (Example – Boltzmann machine)

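$$P(S \mid \theta) = \frac{1}{Z} e^{\sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta_{i0} S_i}$$

[Figure: S_i connected to S_j through θ_ij; S_j = 1 is observed and belongs to the evidence set E]

Clamping the evidence nodes absorbs their contribution into the biases of the remaining nodes:

$$\theta^c_{i0} = \theta_{i0} + \sum_{j \in E} \theta_{ij} S_j$$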

Page 31

Block Approach (Example – Boltzmann machine)

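[Figure: hidden nodes S_i and S_j with mean-field parameters μ_i and μ_j]

$$P(H \mid E, \theta) = \frac{1}{Z_c} e^{\sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta^c_{i0} S_i}$$

$$Q(H \mid E, \mu) = \prod_{i \in H} \mu_i^{S_i} (1 - \mu_i)^{1 - S_i}$$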

Page 32

Block Approach (Example – Boltzmann machine)

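[Figure: hidden nodes S_i and S_j with mean-field parameters μ_i and μ_j]

Minimizing the KL divergence gives

$$\mu_i = \sigma\Big(\sum_j \theta_{ij} \mu_j + \theta_{i0}\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$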

Page 33

Block Approach (Example – Boltzmann machine)

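[Figure: hidden nodes S_i and S_j with mean-field parameters μ_i and μ_j]

Minimizing the KL divergence gives

$$\mu_i = \sigma\Big(\sum_j \theta_{ij} \mu_j + \theta_{i0}\Big)$$

Mean field equations: solve for the fixed point.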
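The fixed point can be found by repeatedly applying the mean field update. This sketch uses a hypothetical 3-node Boltzmann machine with {0, 1} units (evidence already absorbed into the biases) and updates the $\mu_i$ one at a time. For comparison it also enumerates the exact marginals, which is only possible because the model is tiny; mean field generally gives an approximation, not the exact values.

```python
import math
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mean_field(theta, theta0, iters=200):
    """Iterate mu_i = sigma(sum_j theta_ij mu_j + theta_i0) towards a fixed point.

    theta  : symmetric weight matrix (list of lists) with zero diagonal
    theta0 : biases, with any evidence already absorbed
    """
    n = len(theta0)
    mu = [0.5] * n
    for _ in range(iters):
        for i in range(n):                        # sequential updates
            mu[i] = sigmoid(sum(theta[i][j] * mu[j] for j in range(n)) + theta0[i])
    return mu

def exact_marginals(theta, theta0):
    """P(S_i = 1) by enumerating all 2**n states."""
    n = len(theta0)
    weights = {}
    for s in product((0, 1), repeat=n):
        e = sum(theta[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        e += sum(theta0[i] * s[i] for i in range(n))
        weights[s] = math.exp(e)
    z = sum(weights.values())
    return [sum(w for s, w in weights.items() if s[i]) / z for i in range(n)]

theta = [[0.0, 0.8, -0.4], [0.8, 0.0, 0.5], [-0.4, 0.5, 0.0]]   # hypothetical
theta0 = [0.1, -0.2, 0.3]                                        # hypothetical
mu = mean_field(theta, theta0)
print(mu)
print(exact_marginals(theta, theta0))
```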

Page 34

Conclusions

• The time or space complexity of exact calculation is often unacceptable

• Complex graphs can be probabilistically simple

• Inference in simplified models provides bounds on probabilities in the original model

Page 35

Page 36

Extra Slides

Page 37

Concerns

• Approximation accuracy
• Strong dependencies can be identified
• Not based on convexity transformation
• No assurance that the framework will transfer to other examples
• Not straightforward to develop a variational approximation for new architectures

Page 38

Justification for KL Divergence

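• Best lower bound on the probability of the evidence $P(E)$:

$$\ln P(E) = \ln \sum_{\{H\}} P(H, E) = \ln \sum_{\{H\}} Q(H \mid E) \frac{P(H, E)}{Q(H \mid E)} \ge \sum_{\{H\}} Q(H \mid E) \ln \frac{P(H, E)}{Q(H \mid E)}$$

The gap between $\ln P(E)$ and this bound is $D\big(Q(H \mid E) \,\|\, P(H \mid E)\big)$, so minimizing the KL divergence gives the best lower bound.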
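The identity behind this justification, $\ln P(E) = \text{bound} + D(Q \| P(H \mid E))$, can be checked on a hypothetical one-variable example (binary $H$, fixed evidence). The bound never exceeds $\ln P(E)$ and is tight exactly when $Q$ equals the posterior, here $P(H = 1 \mid E) = 0.28 / 0.40 = 0.7$.

```python
import math

# Hypothetical joint P(H = h, E = e) for fixed evidence e; P(E = e) = 0.40.
p_joint = {0: 0.12, 1: 0.28}
p_e = sum(p_joint.values())
log_pe = math.log(p_e)

def elbo(q):
    """sum_H Q(H|E) ln(P(H, E) / Q(H|E))."""
    return sum(qh * math.log(p_joint[h] / qh) for h, qh in q.items() if qh > 0)

def kl_to_posterior(q):
    """D(Q(H|E) || P(H|E))."""
    return sum(qh * math.log(qh / (p_joint[h] / p_e)) for h, qh in q.items() if qh > 0)

for q1 in (0.1, 0.5, 0.7, 0.9):
    q = {0: 1 - q1, 1: q1}
    assert elbo(q) <= log_pe + 1e-12                           # lower bound
    assert abs(log_pe - elbo(q) - kl_to_posterior(q)) < 1e-12  # gap = KL
print(elbo({0: 0.3, 1: 0.7}), log_pe)                          # tight at the posterior
```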

Page 39

EM

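• Maximum likelihood parameter estimation: maximize $P(E \mid \theta)$

• The following function is a lower bound on the log likelihood:

$$\mathcal{L}(Q, \theta) = \sum_{\{H\}} \big[Q(H \mid E) \ln P(H, E \mid \theta) - Q(H \mid E) \ln Q(H \mid E)\big]$$

$$\ln P(E \mid \theta) = \mathcal{L}(Q, \theta) + D\big(Q(H \mid E) \,\|\, P(H \mid E, \theta)\big)$$

where the second term is the KL divergence between $Q(H \mid E)$ and $P(H \mid E, \theta)$.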

Page 40

EM

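1. Maximize the bound with respect to Q

2. Fix Q, maximize with respect to θ

$$\text{(E step)} \quad Q^{(k+1)} = \arg\max_Q \mathcal{L}(Q, \theta^{(k)})$$

$$\text{(M step)} \quad \theta^{(k+1)} = \arg\max_\theta \mathcal{L}(Q^{(k+1)}, \theta)$$

Traditional EM: the E step is solved exactly by $Q^{(k+1)} = P(H \mid E, \theta^{(k)})$.

Approximation to the EM algorithm: Q is restricted to a tractable variational family, so the E step only approximates this posterior.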
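The two steps can be illustrated with a minimal sketch of traditional EM (exact E step) on a hypothetical mixture of two biased coins. The data, starting values and helper names are invented for illustration. The hidden variable is which coin produced each trial; each iteration should not decrease the log likelihood.

```python
import math

# Hypothetical data: heads in n = 10 tosses for each of 5 trials.
heads = [9, 8, 2, 7, 1]
n = 10

def binom_lik(h, p):
    return math.comb(n, h) * p ** h * (1 - p) ** (n - h)

def log_likelihood(pi, pa, pb):
    """ln P(E | theta) with theta = (pi, pa, pb)."""
    return sum(math.log(pi * binom_lik(h, pa) + (1 - pi) * binom_lik(h, pb))
               for h in heads)

def em_step(pi, pa, pb):
    # E step: Q(H) = P(H | E, theta^(k)), the posterior that coin A was used.
    r = []
    for h in heads:
        a = pi * binom_lik(h, pa)
        b = (1 - pi) * binom_lik(h, pb)
        r.append(a / (a + b))
    # M step: maximize L(Q, theta) over theta in closed form.
    pi_new = sum(r) / len(heads)
    pa_new = sum(ri * h for ri, h in zip(r, heads)) / (n * sum(r))
    pb_new = sum((1 - ri) * h for ri, h in zip(r, heads)) / (n * sum(1 - ri for ri in r))
    return pi_new, pa_new, pb_new

theta = (0.5, 0.6, 0.4)
history = [log_likelihood(*theta)]
for _ in range(50):
    theta = em_step(*theta)
    history.append(log_likelihood(*theta))
print(theta)
```

Replacing the exact posterior in the E step with the best member of a restricted family gives the approximate (variational) EM described above.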

Page 41

Principle of Inference

DAG → Junction Tree → (initialization) → Inconsistent Junction Tree → (propagation) → Consistent Junction Tree → (marginalization) → $P(V = v \mid E = e)$

Page 42

Example: Create Join Tree

HMM with 2 time steps: X1 → X2, X1 → Y1, X2 → Y2

Junction tree: clusters X1,Y1 / X1,X2 / X2,Y2, joined through sepsets X1 and X2:

(X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2)

Page 43

Example: Initialization

Variable    Associated cluster
X1          X1,Y1
Y1          X1,Y1
X2          X1,X2
Y2          X2,Y2

Cluster     Potential function
X1,Y1       P(X1) after assigning X1; P(X1) P(Y1 | X1) after assigning Y1
X1,X2       P(X2 | X1)
X2,Y2       P(Y2 | X2)

[Figure: junction tree (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2)]

Page 44

Example: Collect Evidence

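• Choose an arbitrary clique, e.g. X1,X2, where all potential functions will be collected.

• Recursively call neighboring cliques for messages:

• 1. Call X1,Y1:
  – 1. Projection: $\phi_{X_1} = \sum_{\{X_1,Y_1\} \setminus X_1} \phi_{X_1,Y_1} = \sum_{Y_1} P(X_1, Y_1) = P(X_1)$
  – 2. Absorption: $\phi_{X_1,X_2} \leftarrow \phi_{X_1,X_2} \frac{\phi_{X_1}}{\phi^{old}_{X_1}} = P(X_2 \mid X_1)\, P(X_1) = P(X_1, X_2)$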

Page 45

Example: Collect Evidence (cont.)

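• 2. Call X2,Y2:
  – 1. Projection: $\phi_{X_2} = \sum_{\{X_2,Y_2\} \setminus X_2} \phi_{X_2,Y_2} = \sum_{Y_2} P(Y_2 \mid X_2) = 1$
  – 2. Absorption: $\phi_{X_1,X_2} \leftarrow \phi_{X_1,X_2} \frac{\phi_{X_2}}{\phi^{old}_{X_2}} = P(X_1, X_2)$

[Figure: junction tree (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2)]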

Page 46

Example: Distribute Evidence

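• Pass messages recursively to neighboring nodes

• Pass a message from X1,X2 to X1,Y1:
  – 1. Projection: $\phi_{X_1} = \sum_{\{X_1,X_2\} \setminus X_1} \phi_{X_1,X_2} = \sum_{X_2} P(X_1, X_2) = P(X_1)$
  – 2. Absorption: $\phi_{X_1,Y_1} \leftarrow \phi_{X_1,Y_1} \frac{\phi_{X_1}}{\phi^{old}_{X_1}} = P(X_1, Y_1) \frac{P(X_1)}{P(X_1)} = P(X_1, Y_1)$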

Page 47

Example: Distribute Evidence (cont.)

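• Pass a message from X1,X2 to X2,Y2:
  – 1. Projection: $\phi_{X_2} = \sum_{\{X_1,X_2\} \setminus X_2} \phi_{X_1,X_2} = \sum_{X_1} P(X_1, X_2) = P(X_2)$
  – 2. Absorption: $\phi_{X_2,Y_2} \leftarrow \phi_{X_2,Y_2} \frac{\phi_{X_2}}{\phi^{old}_{X_2}} = P(Y_2 \mid X_2) \frac{P(X_2)}{1} = P(Y_2, X_2)$

[Figure: junction tree (X1,Y1) -[X1]- (X1,X2) -[X2]- (X2,Y2)]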

Page 48

Example: Inference with evidence

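• Assume we want to compute $P(X_2 \mid Y_1 = 0, Y_2 = 1)$ (state estimation)

• Assign likelihoods to the potential functions during initialization:

$$\phi_{X_1,Y_1} = \begin{cases} 0 & \text{if } Y_1 = 1 \\ P(X_1, Y_1 = 0) & \text{if } Y_1 = 0 \end{cases}$$

$$\phi_{X_2,Y_2} = \begin{cases} 0 & \text{if } Y_2 = 0 \\ P(Y_2 = 1 \mid X_2) & \text{if } Y_2 = 1 \end{cases}$$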

Page 49

Example: Inference with evidence (cont.)

• Repeating the same steps as in the previous case, we obtain:

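$$\phi_{X_1,Y_1} = \begin{cases} 0 & \text{if } Y_1 = 1 \\ P(X_1, Y_1 = 0, Y_2 = 1) & \text{if } Y_1 = 0 \end{cases}$$

$$\phi_{X_1} = P(X_1, Y_1 = 0, Y_2 = 1), \quad \phi_{X_1,X_2} = P(X_1, Y_1 = 0, X_2, Y_2 = 1), \quad \phi_{X_2} = P(Y_1 = 0, X_2, Y_2 = 1)$$

$$\phi_{X_2,Y_2} = \begin{cases} 0 & \text{if } Y_2 = 0 \\ P(Y_1 = 0, X_2, Y_2 = 1) & \text{if } Y_2 = 1 \end{cases}$$

Normalizing $\phi_{X_2}$ over $X_2$ gives $P(X_2 \mid Y_1 = 0, Y_2 = 1)$.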
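A minimal sketch of this computation, assuming hypothetical binary HMM parameters. Only the collect phase into the clique X1,X2 is needed for the marginal of X2; the result is compared with brute-force summation of the joint.

```python
from itertools import product

# Hypothetical HMM parameters (binary hidden X, binary observation Y).
p_x1 = {0: 0.6, 1: 0.4}                                          # P(X1)
p_trans = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}   # P(X2 = b | X1 = a), key (a, b)
p_emit = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}    # P(Y = y | X = x), key (x, y)
y1, y2 = 0, 1                                                    # evidence

# Initialization with evidence; entries for the unobserved values of Y are 0, so
# only the observed values are stored.
phi_x1y1 = {x1: p_x1[x1] * p_emit[(x1, y1)] for x1 in (0, 1)}    # P(X1, Y1 = 0)
phi_x1x2 = {(a, b): p_trans[(a, b)] for a, b in product((0, 1), repeat=2)}
phi_x2y2 = {x2: p_emit[(x2, y2)] for x2 in (0, 1)}               # P(Y2 = 1 | X2)

# Collect into (X1,X2): projections onto the sepsets (old sepset potentials are 1).
sep_x1 = phi_x1y1                  # sum over Y1: only Y1 = 0 is nonzero
sep_x2 = phi_x2y2                  # sum over Y2: only Y2 = 1 is nonzero
# Absorption: multiply in both messages.
phi_x1x2 = {(a, b): v * sep_x1[a] * sep_x2[b] for (a, b), v in phi_x1x2.items()}

# Marginalize onto X2 and normalize.
phi_x2 = {b: sum(phi_x1x2[(a, b)] for a in (0, 1)) for b in (0, 1)}
z = sum(phi_x2.values())
posterior = {b: v / z for b, v in phi_x2.items()}

# Brute-force check from the full joint.
def joint(x1, x2):
    return p_x1[x1] * p_emit[(x1, y1)] * p_trans[(x1, x2)] * p_emit[(x2, y2)]

brute = {b: sum(joint(a, b) for a in (0, 1)) for b in (0, 1)}
zb = sum(brute.values())
print(posterior, {b: v / zb for b, v in brute.items()})
```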

Page 50

Variable Elimination

General idea:

• Write the query in the form

$$P(X_n, e) = \sum_{X \setminus \{X_n\}} \prod_i P(x_i \mid pa_i)$$

• Iteratively
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product

Page 51

Complexity of Variable Elimination

• Suppose in one elimination step we compute

$$f'_x(x, y_1, \ldots, y_k) = \prod_{i=1}^{m} f_i(x, y_{i,1}, \ldots, y_{i,l_i}), \qquad f_x(y_1, \ldots, y_k) = \sum_x f'_x(x, y_1, \ldots, y_k)$$

This requires
• $m \cdot |Val(X)| \cdot \prod_i |Val(Y_i)|$ multiplications
• $|Val(X)| \cdot \prod_i |Val(Y_i)|$ additions

Complexity is exponential in the number of variables in the intermediate factor.

Page 52

Chordal Graphs

• Elimination ordering → undirected chordal graph

Graph:
• Maximal cliques are factors in elimination
• Factors in elimination are cliques in the graph
• Complexity is exponential in the size of the largest clique in the graph

[Figure: a network over V, S, T, L, A, B, X, D and the chordal graph induced by an elimination ordering]

Page 53

Induced Width

• The size of the largest clique in the induced graph is thus an indicator for the complexity of variable elimination

• This quantity is called the induced width of the graph according to the specified ordering (many texts define the induced width as this size minus one)

• Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph
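A minimal sketch of measuring this quantity for a given ordering: eliminate nodes one by one, connect the remaining neighbors of each eliminated node (fill-in edges), and record the largest clique formed. Following the slide, the size of that clique is returned. The star graph in the usage lines is a hypothetical example showing how strongly the ordering matters.

```python
def induced_width(edges, order):
    """Eliminate nodes in `order`, adding fill-in edges among each node's
    remaining neighbors; return the size of the largest clique formed
    (the eliminated node plus its neighbors at elimination time)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    largest = 0
    for node in order:
        nbrs = adj.pop(node, set())
        largest = max(largest, len(nbrs) + 1)
        for u in nbrs:
            adj[u].discard(node)
            adj[u] |= nbrs - {u}           # fill-in edges among the neighbors
    return largest

# Hypothetical star graph: one center c connected to four leaves.
star = [("c", leaf) for leaf in ("l1", "l2", "l3", "l4")]
print(induced_width(star, ["c", "l1", "l2", "l3", "l4"]))   # center first
print(induced_width(star, ["l1", "l2", "l3", "l4", "c"]))   # leaves first
```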

Page 54

Properties of Junction Trees

• In every consistent junction tree (that is, after propagation):
  – For each cluster (or sepset) X, $\phi_X = P(X)$
  – The probability distribution of any variable V can be computed from any cluster (or sepset) X that contains V:

$$P(V) = \sum_{X \setminus \{V\}} \phi_X$$

Page 55

Exact Inference Using Junction Trees

• Undirected tree
• Each node is a cluster
• Running intersection property:
  – Given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y
• Separator sets (sepsets):
  – Intersection of adjacent clusters

Example: ABD -[AD]- ADE -[DE]- DEF (cluster ABD, sepset DE)

Page 56

Constructing Junction Trees

Marrying Parents

[Figure: DAG over X1, ..., X6; parents that share a child are connected]

Page 57

Moral Graph

[Figure: moral graph over X1, ..., X6]

Page 58

Triangulation

[Figure: triangulated graph over X1, ..., X6]

Page 59

Identify Cliques

[Figure: triangulated graph over X1, ..., X6]

Cliques: X1X2X3, X2X3X5, X2X5X6, X2X4

Page 60

Junction Tree

• A junction tree is a subgraph of the clique graph that satisfies the running intersection property

[Figure: clique graph over the cliques X1X2X3, X2X3X5, X2X5X6, X2X4 with sepsets X2X3, X2X5, X2, and the junction tree selected from it]

Page 61

Constructing Junction Trees

DAG → (marry parents) → Moral Graph → (triangulate) → Triangulated Graph → (identify cliques) → Junction Tree
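The first construction steps can be sketched in a few lines. The DAG below is hypothetical (the figure's edges are not recoverable from the transcript); it was chosen so that its moral graph has the cliques listed on the Identify Cliques slide. This moral graph happens to be chordal already, so triangulation adds no edges here. The brute-force clique search is only suitable for a handful of nodes.

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG given as {child: [parents]}: drop edge directions
    and connect ("marry") every pair of parents that share a child."""
    edges = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((p, child)))   # undirected version of p -> child
        for a, b in combinations(pas, 2):
            edges.add(frozenset((a, b)))       # marry co-parents
    return edges

def maximal_cliques(nodes, edges):
    """Brute-force maximal cliques of a small undirected graph."""
    def is_clique(s):
        return all(frozenset((a, b)) in edges for a, b in combinations(s, 2))
    cliques = [frozenset(s) for r in range(1, len(nodes) + 1)
               for s in combinations(nodes, r) if is_clique(s)]
    return [c for c in cliques if not any(c < other for other in cliques)]

# Hypothetical DAG over X1..X6, given as child: parents.
dag = {"X2": ["X1"], "X3": ["X1"], "X4": ["X2"], "X5": ["X2", "X3"], "X6": ["X2", "X5"]}
nodes = ["X1", "X2", "X3", "X4", "X5", "X6"]
moral = moralize(dag)
for clique in maximal_cliques(nodes, moral):
    print(sorted(clique))
```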

Page 62

Sequential Approach (Example)

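• Lower bound for the medical diagnosis example, using

$$f\Big(\sum_j a_j z_j\Big) \ge \sum_j q_j\, f\Big(\frac{a_j z_j}{q_j}\Big)$$

with $f(x) = \ln(1 - e^{-x})$:

$$\begin{aligned}
P(f_i = 1 \mid d) &= 1 - e^{-\sum_j \theta_{ij} d_j - \theta_{i0}} = e^{f\left(\theta_{i0} + \sum_j \theta_{ij} d_j\right)} \\
&= e^{f\left(\sum_j q_{j|i} \left[\theta_{i0} + \frac{\theta_{ij} d_j}{q_{j|i}}\right]\right)} \\
&\ge e^{\sum_j q_{j|i}\, f\left(\theta_{i0} + \frac{\theta_{ij} d_j}{q_{j|i}}\right)} \\
&= e^{\sum_j q_{j|i} \left[d_j\, f\left(\theta_{i0} + \frac{\theta_{ij}}{q_{j|i}}\right) + (1 - d_j)\, f(\theta_{i0})\right]}
\end{aligned}$$

The last step uses $d_j \in \{0, 1\}$.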
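A numerical sketch of this lower bound for one positive finding, with hypothetical $\theta$ values and variational distribution $q_{j|i}$. It checks the bound for every disease configuration. Note that the exponent is linear in each $d_j$, so, like the upper bound, it factorizes over the diseases.

```python
import math
from itertools import product

def f(x):
    return math.log(1.0 - math.exp(-x))   # concave on x > 0

def exact_pos(theta0, theta, d):
    """P(f_i = 1 | d) under the noisy-OR model."""
    return 1.0 - math.exp(-theta0 - sum(t * dj for t, dj in zip(theta, d)))

def lower_pos(theta0, theta, d, q):
    """exp{sum_j q_j [d_j f(theta0 + theta_j / q_j) + (1 - d_j) f(theta0)]}."""
    s = sum(qj * (dj * f(theta0 + t / qj) + (1 - dj) * f(theta0))
            for t, dj, qj in zip(theta, d, q))
    return math.exp(s)

theta0, theta = 0.05, [1.2, 0.4, 0.9]   # hypothetical; theta0 > 0 keeps f(theta0) finite
q = [0.5, 0.2, 0.3]                     # hypothetical q_{j|i}, sums to 1
for d in product((0, 1), repeat=3):
    print(d, lower_pos(theta0, theta, d, q), exact_pos(theta0, theta, d))
```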