Expectation Propagation in Practice
Tom Minka, CMU Statistics
Joint work with Yuan Qi and John Lafferty
Page 1

Expectation Propagation in Practice

Tom Minka

CMU Statistics

Joint work with Yuan Qi and John Lafferty

Page 2

Outline

• EP algorithm

• Examples:
  – Tracking a dynamic system
  – Signal detection in fading channels
  – Document modeling
  – Boltzmann machines

Page 3

Extensions to EP

• Alternatives to moment-matching

• Factors raised to powers

• Skipping factors

Page 4

EP in a nutshell

• Approximate a function by a simpler one:

  $$p(\mathbf{x}) = \prod_a f_a(\mathbf{x}) \quad\approx\quad q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x})$$

• where each $\tilde{f}_a(\mathbf{x})$ lives in a parametric, exponential family (e.g. Gaussian)

• Factors $f_a(\mathbf{x})$ can be conditional distributions in a Bayesian network

Page 5

EP algorithm

• Iterate the fixed-point equations:

  $$\tilde{f}_a(\mathbf{x}) = \arg\min_{\tilde{f}_a} D\!\left( f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \;\big\|\; \tilde{f}_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \right) \quad\text{where}\quad q^{\backslash a}(\mathbf{x}) = \prod_{b \neq a} \tilde{f}_b(\mathbf{x})$$

• $q^{\backslash a}(\mathbf{x})$ specifies where the approximation needs to be good

• Coordinated local approximations
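A minimal numerical sketch (not from the slides) of this fixed-point iteration for a one-dimensional posterior with Gaussian site approximations; the two factor functions and the grid-based moment matching are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical 1-D problem: p(x) ∝ prior(x) * f1(x) * f2(x) with non-Gaussian factors.
# Each factor f_a is approximated by a Gaussian "site" in natural parameters
# (r = precision * mean, q = precision).
prior_q, prior_r = 1.0, 0.0                        # N(0, 1) prior
factors = [
    lambda x: 1.0 / (1.0 + x**2),                  # heavy-tailed factor (assumed)
    lambda x: np.where(x > -1.0, 1.0, 1e-6),       # soft step factor (assumed)
]
site_q = np.zeros(len(factors))                    # site precisions (start flat)
site_r = np.zeros(len(factors))
grid = np.linspace(-10.0, 10.0, 4001)              # grid for moment matching

for sweep in range(20):
    for a, f in enumerate(factors):
        # Cavity q^{\a}(x): the approximation with site a removed
        cav_q = prior_q + site_q.sum() - site_q[a]
        cav_r = prior_r + site_r.sum() - site_r[a]
        # Tilted distribution f_a(x) q^{\a}(x): match its mean and variance
        log_tilt = -0.5 * cav_q * grid**2 + cav_r * grid + np.log(f(grid))
        w = np.exp(log_tilt - log_tilt.max())
        w /= w.sum()
        mean = np.sum(w * grid)
        var = np.sum(w * (grid - mean) ** 2)
        # New site = matched Gaussian divided by the cavity (natural parameters subtract)
        site_q[a] = 1.0 / var - cav_q
        site_r[a] = mean / var - cav_r

post_q = prior_q + site_q.sum()
post_r = prior_r + site_r.sum()
print("EP posterior: mean %.3f, variance %.3f" % (post_r / post_q, 1.0 / post_q))
```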

Page 6

(Loopy) Belief propagation

• Specialize to factorized approximations:

  $$\tilde{f}_a(\mathbf{x}) = \prod_i \tilde{f}_{ai}(x_i) \qquad \text{(the } \tilde{f}_{ai} \text{ are the “messages”)}$$

• Minimize KL-divergence = match marginals of $f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ (partially factorized) and $\tilde{f}_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ (fully factorized)
  – “send messages”

Page 7

EP versus BP

• EP approximation can be in a restricted family, e.g. Gaussian

• EP approximation does not have to be factorized

• EP applies to many more problems
  – e.g. mixtures of discrete/continuous variables

Page 8

EP versus Monte Carlo

• Monte Carlo is general but expensive
  – A sledgehammer

• EP exploits underlying simplicity of the problem (if it exists)

• Monte Carlo is still needed for complex problems (e.g. large isolated peaks)

• Trick is to know what problem you have

Page 9

Example: Tracking

Guess the position of an object given noisy measurements

[Figure: an object moves through states x_1, x_2, x_3, x_4; each state produces a noisy measurement y_1, y_2, y_3, y_4]

Page 10

Bayesian network

[Figure: chain-structured Bayesian network with hidden states x_1, x_2, x_3, x_4 and observations y_1, y_2, y_3, y_4]

e.g. $x_t = x_{t-1} + \nu_t$ (random walk), $y_t = x_t + \text{noise}$

want distribution of x’s given y’s

Page 11

Terminology

• Filtering: posterior for last state only

• Smoothing: posterior for middle states

• On-line: old data is discarded (fixed memory)

• Off-line: old data is re-used (unbounded memory)

Page 12

Kalman filtering / Belief propagation

• Prediction:

  $$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

• Measurement:

  $$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$$

• Smoothing:

  $$p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int \frac{p(x_{t+1} \mid x_t)\, p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}$$
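A minimal sketch (not from the slides) of these recursions for the scalar random-walk model above; the noise variances and the broad prior are assumed values.

```python
import numpy as np

np.random.seed(0)
T, q_var, r_var = 50, 0.1, 1.0                        # process / measurement noise (assumed)
x = np.cumsum(np.sqrt(q_var) * np.random.randn(T))    # x_t = x_{t-1} + nu_t  (random walk)
y = x + np.sqrt(r_var) * np.random.randn(T)           # y_t = x_t + noise

fm, fv = np.zeros(T), np.zeros(T)                     # filtered moments of p(x_t | y_1..t)
pm, pv = np.zeros(T), np.zeros(T)                     # predicted moments of p(x_t | y_1..t-1)
m, v = 0.0, 100.0                                     # broad prior on x_1
for t in range(T):
    pm[t], pv[t] = m, v + (q_var if t > 0 else 0.0)   # prediction step
    k = pv[t] / (pv[t] + r_var)                       # Kalman gain
    fm[t] = pm[t] + k * (y[t] - pm[t])                # measurement step
    fv[t] = (1.0 - k) * pv[t]
    m, v = fm[t], fv[t]

sm, sv = fm.copy(), fv.copy()                         # smoothing (backward pass)
for t in range(T - 2, -1, -1):
    g = fv[t] / (fv[t] + q_var)                       # gain toward the smoothed next state
    sm[t] = fm[t] + g * (sm[t + 1] - fm[t])
    sv[t] = fv[t] + g**2 * (sv[t + 1] - (fv[t] + q_var))

print("RMSE filtered %.3f, smoothed %.3f" %
      (np.sqrt(np.mean((fm - x) ** 2)), np.sqrt(np.mean((sm - x) ** 2))))
```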

Page 13

Approximation

$$p(\mathbf{x}, \mathbf{y}) = p(x_1)\, p(y_1 \mid x_1) \prod_{t>1} p(x_t \mid x_{t-1})\, p(y_t \mid x_t)$$

$$q(\mathbf{x}) = p(x_1)\, \tilde{o}_1(x_1) \prod_{t>1} \tilde{p}_t(x_t)\, \tilde{o}_t(x_t)$$

Factorized and Gaussian in x

Page 14

Approximation

$$q(x_t) \propto \tilde{p}_t(x_t)\, \tilde{o}_t(x_t)\, \tilde{p}_{t+1}(x_t)$$

= (forward msg)(observation)(backward msg)

EP equations are exactly the prediction, measurement, and smoothing equations for the Kalman filter – but only preserve first and second moments

Consider case of linear dynamics…

Page 15

EP in dynamic systems

• Loop t = 1, …, T (filtering)
  – Prediction step
  – Approximate measurement step

• Loop t = T, …, 1 (smoothing)
  – Smoothing step
  – Divide out the approximate measurement
  – Re-approximate the measurement

• Loop t = 1, …, T (re-filtering)
  – Prediction and measurement using previous approx

Page 16

Generalization

• Instead of matching moments, can use any method for approximate filtering

• E.g. Extended Kalman filter, statistical linearization, unscented filter, etc.

• All can be interpreted as finding linear/Gaussian approx to original terms

Page 17

Interpreting EP

• After more information is available, re-approximate individual terms for better results

• Optimal filtering is no longer on-line

Page 18

Example: Poisson tracking

• $y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$

Page 19

Poisson tracking model

$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1},\, 0.01)$$

$$p(x_1) = \mathcal{N}(x_1;\, 0,\, 100)$$

$$p(y_t \mid x_t) = \exp(y_t x_t - e^{x_t}) / y_t!$$

Page 20

Approximate measurement step

• $p(y_t \mid x_t)\, p(x_t)$ is not Gaussian

• Moments of x not analytic

• Two approaches:
  – Gauss-Hermite quadrature for moments
  – Statistical linearization instead of moment-matching

• Both work well
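A minimal sketch (not from the slides) of the Gauss-Hermite approach for a single measurement step of this Poisson model; the incoming Gaussian moments and the observed count are assumed values.

```python
import numpy as np

# One approximate measurement step: prior p(x_t) = N(m, v) from the prediction step,
# likelihood p(y_t | x_t) = exp(y_t x_t - e^{x_t}) / y_t!
m, v, y_t = 0.5, 2.0, 3                              # assumed prior moments and observed count

nodes, weights = np.polynomial.hermite_e.hermegauss(40)   # probabilists' Gauss-Hermite rule
x = m + np.sqrt(v) * nodes                           # quadrature points under N(m, v)
loglik = y_t * x - np.exp(x)                         # log p(y_t | x) up to the -log(y_t!) constant
w = weights * np.exp(loglik - loglik.max())
w /= w.sum()                                         # constants cancel in the normalized weights

post_mean = np.sum(w * x)                            # moments of p(y_t | x_t) N(x_t; m, v)
post_var = np.sum(w * (x - post_mean) ** 2)
print("new Gaussian belief: mean %.3f, variance %.3f" % (post_mean, post_var))
```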

Page 21
Page 22

Posterior for the last state

Page 23
Page 24
Page 25

EP for signal detection

• Wireless communication problem

• Transmitted signal = $a \sin(\omega t + \phi)$

• $(a, \phi)$ vary to encode each symbol

• In complex numbers: $a e^{i\phi}$

[Figure: the symbol $a e^{i\phi}$ plotted as a point in the complex (Re, Im) plane]

Page 26

Binary symbols, Gaussian noise

• Symbols are 1 and –1 (in complex plane)

• Received signal = $a \sin(\omega t + \phi) + \text{noise}$

• Recovered: $\hat{a} e^{i\hat{\phi}} = y_t = a e^{i\phi} + \text{noise}$

• Optimal detection is easy

[Figure: the two symbols $s_0$ and $s_1$ as points in the complex plane]

Page 27

Fading channel

• Channel systematically changes amplitude and phase: $y_t = x_t s + \text{noise}$

• $x_t$ changes over time

[Figure: the received constellation points $x_t s_0$ and $x_t s_1$ rotated and scaled in the complex plane]

Page 28

Differential detection

• Use the last measurement $y_{t-1}$ as the estimate of the channel state

• Binary symbols only

• No smoothing of state = noisy

[Figure: received points $y_t$, $y_{t-1}$, and $-y_{t-1}$ in the complex plane]
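For concreteness, a small simulation sketch (not from the slides) of differential detection on a slowly drifting fading channel; the channel dynamics, noise level, and differential encoding of the bits are assumptions made for the example.

```python
import numpy as np

np.random.seed(1)
T = 2000
bits = np.random.choice([1.0, -1.0], size=T)            # information bits
s = np.concatenate(([1.0], np.cumprod(bits)))           # differentially encoded symbols s_t = s_{t-1} b_t

# Assumed fading channel: amplitude and phase drift slowly over time
phase = np.cumsum(0.05 * np.random.randn(T + 1))
x = (1.0 + 0.1 * np.sin(0.01 * np.arange(T + 1))) * np.exp(1j * phase)
noise = 0.2 * (np.random.randn(T + 1) + 1j * np.random.randn(T + 1)) / np.sqrt(2)
y = x * s + noise                                        # received signal y_t = x_t s_t + noise

# Differential detection: the previous measurement serves as the (noisy) channel estimate,
# so each bit is read off the phase difference between consecutive measurements.
bits_hat = np.sign(np.real(y[1:] * np.conj(y[:-1])))
print("bit error rate: %.4f" % np.mean(bits_hat != bits))
```

Because only the single previous measurement is used as the channel estimate, the estimate is noisy; the approach on the following slides instead smooths the channel state over several measurements.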

Page 29

Bayesian network

[Figure: chain of channel states x_1, ..., x_4 with observations y_1, ..., y_4; each y_t also depends on its symbol s_t]

Dynamics are learned from training data (all 1’s)

Symbols can also be correlated (e.g. error-correcting code)

Page 30

On-line implementation

• Iterate over the last measurements

• Previous measurements act as prior

• Results comparable to particle filtering, but much faster

Page 31
Page 32

Document modeling

• Want to classify documents by semantic content

• Word order generally found to be irrelevant
  – Word choice is what matters

• Model each document as a bag of words
  – Reduces to modeling correlations between word probabilities

Page 33

Generative aspect model

Each document mixes aspects in different proportions

[Figure: documents shown as mixtures of Aspect 1 and Aspect 2 in different proportions]

(Hofmann 1999; Blei, Ng, & Jordan 2001)

Page 34

Generative aspect model

[Figure: a document is generated by mixing Aspect 1 and Aspect 2 with weights $\lambda_a$]

$$p(\text{word } w) = \sum_a \lambda_a\, p(w \mid \text{aspect } a), \qquad \sum_a \lambda_a = 1$$

$$p(\boldsymbol{\lambda}) \sim \text{Dirichlet}(\alpha_1, \ldots, \alpha_J)$$

Multinomial sampling
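A small sketch (not from the slides) of this generative process; the vocabulary, the two aspects, and the Dirichlet parameters are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["gene", "protein", "neuron", "brain", "market", "stock"]
aspects = np.array([[0.35, 0.35, 0.10, 0.10, 0.05, 0.05],   # p(word | aspect 1), assumed
                    [0.05, 0.05, 0.10, 0.10, 0.35, 0.35]])  # p(word | aspect 2), assumed
alpha = np.array([1.0, 1.0])                                 # Dirichlet prior on mixing weights

def sample_document(length):
    lam = rng.dirichlet(alpha)                    # document-specific aspect proportions lambda
    word_probs = lam @ aspects                    # p(word) = sum_a lambda_a p(word | a)
    counts = rng.multinomial(length, word_probs)  # bag of words via multinomial sampling
    return lam, counts

lam, counts = sample_document(10)
print("aspect proportions:", np.round(lam, 2))
print("document:", {w: int(c) for w, c in zip(vocab, counts) if c > 0})
```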

Page 35

Two tasks

Inference:

• Given aspects and document i, what is (the posterior for) $\lambda_i$?

Learning:

• Given some documents, what are (maximum likelihood) aspects?

Page 36

Approximation

• Likelihood is composed of terms of the form

  $$t_w(\boldsymbol{\lambda}) = p(w)^{n_w} = \left( \sum_a \lambda_a\, p(w \mid a) \right)^{n_w}$$

• Want a Dirichlet approximation:

  $$\tilde{t}_w(\boldsymbol{\lambda}) = \prod_a \lambda_a^{\beta_{wa}}$$

Page 37

EP with powers

• These terms seem too complicated for EP

• Can match moments if $n_w = 1$, but not for large $n_w$

• Solution: match moments of one occurrence at a time
  – Redefine what the “terms” are

Page 38

EP with powers

• Moment match:

  $$t_w(\boldsymbol{\lambda})\, q^{\backslash w}(\boldsymbol{\lambda}) \;\approx\; \tilde{t}_w(\boldsymbol{\lambda})\, q^{\backslash w}(\boldsymbol{\lambda})$$

• Context function: all but one occurrence

  $$q^{\backslash w}(\boldsymbol{\lambda}) = \tilde{t}_w(\boldsymbol{\lambda})^{n_w - 1} \prod_{w' \neq w} \tilde{t}_{w'}(\boldsymbol{\lambda})^{n_{w'}}$$

• Fixed-point equations for $\beta_w$

Page 39

EP with skipping

• Context function might not be a proper density

• Solution: “skip” this term
  – (keep old approximation)

• In later iterations, context becomes proper

Page 40

Another problem

• Minimizing KL-divergence of a Dirichlet is expensive
  – Requires iteration

• Match (mean, variance) instead
  – Closed-form
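A sketch of the closed-form alternative, assuming the mean vector and per-component variances of the tilted distribution have already been computed; averaging the per-component precision estimates is one reasonable choice, not necessarily the one used in the original work.

```python
import numpy as np

def dirichlet_from_mean_var(mean, var):
    """Fit Dirichlet(alpha) by matching the mean vector and component variances.

    Uses the Dirichlet identity Var(lambda_a) = m_a (1 - m_a) / (s + 1) with s = sum(alpha):
    each component gives an estimate of the precision s, and the estimates are averaged.
    """
    s = np.mean(mean * (1.0 - mean) / var - 1.0)
    return s * mean                              # alpha_a = s * m_a

# Example: moments of a tilted distribution over a two-aspect simplex (assumed values)
mean = np.array([0.6, 0.4])
var = np.array([0.03, 0.03])
alpha = dirichlet_from_mean_var(mean, var)
print("fitted Dirichlet parameters:", np.round(alpha, 2))

# Sanity check: the fitted Dirichlet reproduces the requested mean and variance
s = alpha.sum()
print("mean:", alpha / s, " var:", (alpha / s) * (1 - alpha / s) / (s + 1))
```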

Page 41

One term

$$t_w(\lambda) = 0.4\,\lambda + 0.3\,(1 - \lambda)$$

VB = Variational Bayes (Blei et al)

Page 42

Ten word document

Page 43

General behavior

• For long documents, VB recovers the correct mean, but not the correct variance of $\lambda$

• Disastrous for learning
  – No Occam factor

• Gets worse with more documents
  – No asymptotic salvation

• EP gets correct variance, learns properly

Page 44

Learning in probability simplex

100 docs, length 10

Page 45

Learning in probability simplex

10 docs, length 10

Page 46

Learning in probability simplex

10 docs, length 10

Page 47

Learning in probability simplex

10 docs, length 10

Page 48

Boltzmann machines

[Figure: fully connected network over x_1, x_2, x_3, x_4]

Joint distribution is a product of pair potentials:

$$p(\mathbf{x}) = \prod_a f_a(\mathbf{x})$$

Want to approximate it by a simpler distribution

$$q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x})$$
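A tiny sketch (not from the slides) of the setup: a four-node binary network whose joint is a product of pair potentials, with exact single-node marginals computed by enumeration; the edge set and couplings are assumed. An approximation such as BP or TreeEP aims to reproduce these marginals without enumerating all 2^n states.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]        # assumed edge set
W = {e: rng.normal(scale=0.5) for e in edges}            # pairwise couplings

def unnorm(x):
    """Unnormalized joint p(x) ∝ prod_a f_a(x), with f_a(x) = exp(W_ij x_i x_j)."""
    return np.exp(sum(W[(i, j)] * x[i] * x[j] for i, j in edges))

states = list(itertools.product([-1, 1], repeat=n))      # all 2^n spin configurations
Z = sum(unnorm(x) for x in states)                       # partition function
marg = np.zeros(n)
for x in states:
    marg += (unnorm(x) / Z) * (np.array(x) == 1)         # accumulate P(x_i = +1)
print("exact marginals P(x_i = +1):", np.round(marg, 3))
```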

Page 49

Approximations

[Figure: BP approximates the network over x_1, x_2, x_3, x_4 by fully disconnected nodes; EP approximates it by a tree]

Page 50

Approximating an edge by a tree

$$f_a(x_1, x_2) \;\approx\; \tilde{f}_a(\mathbf{x}) = \prod_{(k,l)\,\in\,\text{tree edges}} \tilde{f}_a(x_k, x_l)$$

Each potential in p is projected onto the tree-structure of q

Correlations are not lost, but projected onto the tree

Page 51

Fixed-point equations

• Match single and pairwise marginals of $f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ and $\tilde{f}_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$

• Reduces to exact inference on single loops
  – Use cutset conditioning

[Figure: the graph containing the exact edge potential and the corresponding tree approximation over x_1, ..., x_4]

Page 52

5-node complete graphs, 10 trials

Method            FLOPS      Error
Exact               500      0
TreeEP            3,000      0.032
BP/double-loop  200,000      0.186
GBP             360,000      0.211

Page 53

8x8 grids, 10 trials

Method               FLOPS      Error
Exact               30,000      0
TreeEP             300,000      0.149
BP/double-loop  15,500,000      0.358
GBP             17,500,000      0.003

Page 54

TreeEP versus BP

• TreeEP always more accurate than BP, often faster

• GBP slower than BP, not always more accurate

• TreeEP converges more often than BP and GBP

Page 55

Conclusions

• EP algorithms exceed the state of the art in several domains

• Many more opportunities out there

• EP is sensitive to the choice of approximation
  – does not give guidance in choosing it (e.g. tree structure) – error bound?

• Exponential family constraint can be limiting – mixtures?

Page 56

End

Page 57

Limitation of BP

• If the dynamics or measurements are not linear and Gaussian*, the complexity of the posterior increases with the number of measurements

• I.e. BP equations are not “closed”
  – Beliefs need not stay within a given family

* or any other exponential family

Page 58

Approximate filtering

• Compute a Gaussian belief which approximates the true posterior:

• E.g. Extended Kalman filter, statistical linearization, unscented filter, assumed-density filter

$$q(x_t) \approx p(x_t \mid y_{1:t})$$

Page 59

EP perspective

• Approximate filtering is equivalent to replacing true measurement/dynamics equations with linear/Gaussian equations

$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$$

Making the belief $q(x_t)$ Gaussian implies an effective measurement term that is Gaussian:

$$\tilde{p}(y_t \mid x_t) \propto \frac{q(x_t \mid y_{1:t})}{q(x_t \mid y_{1:t-1})}$$

Page 60

EP perspective

• EKF, UKF, ADF are all algorithms for:

$$p(y_t \mid x_t),\; p(x_t \mid x_{t-1}) \quad (\text{nonlinear, non-Gaussian}) \;\;\longrightarrow\;\; \tilde{p}(y_t \mid x_t),\; \tilde{p}(x_t \mid x_{t-1}) \quad (\text{linear, Gaussian})$$