
Probabilistic Inference Lecture 5

M. Pawan Kumar pawan.kumar@ecp.fr

Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

What to Expect in the Final Exam

•  Open Book – Textbooks, Research Papers, Course Slides; No Electronic Devices

•  Easy Questions – 10 points

•  Hard Questions – 10 points

Easy Question – BP Compute the reparameterization constants for (a,b) and (c,b) such that the unary potentials of b are equal to its min-marginals.

[Figure: MRF over Va, Vb, Vc with unary potentials and pairwise potentials on edges (Va,Vb) and (Vc,Vb); the numeric values are not reliably recoverable from this transcript]

Hard Question – BP Provide an O(h) algorithm to compute the reparameterization constants of BP for an edge whose pairwise potentials are specified by a truncated linear model.

Easy Question – Minimum Cut Provide the graph corresponding to the MAP estimation problem in the following MRF.

[Figure: the same MRF over Va, Vb, Vc as in the BP question above]

Hard Question – Minimum Cut Show that the expansion algorithm provides a bound of 2M for the truncated linear metric, where M is the value of the truncation.

Easy Question – Relaxations Using an example, show that the LP-S relaxation is not tight for a frustrated cycle (cycle with an odd number of supermodular pairwise potentials).

Hard Question – Relaxations Prove or disprove that the LP-S and SOCP-MS relaxations are invariant to reparameterization.

Recap

Integer Programming Formulation

min ∑a ∑i θa;i ya;i + ∑(a,b) ∑ik θab;ik yab;ik

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

Integer Programming Formulation

min θTy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

θ = [ … θa;i … ; … θab;ik … ]

y = [ … ya;i … ; … yab;ik … ]
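As a concrete illustration (the potentials below are made up for this sketch, not taken from the lecture), the integer program can be evaluated by brute force on a tiny MRF:

```python
# A minimal sketch (illustrative potentials): evaluate the integer
# program by brute force for a 2-variable, 2-label MRF, i.e. enumerate
# every labelling and score theta^T y directly.
import itertools
import numpy as np

unary = {'a': np.array([0.0, 1.0]), 'b': np.array([2.0, 0.0])}
pairwise = {('a', 'b'): np.array([[0.0, 3.0], [3.0, 0.0]])}

best_energy, best_labelling = float('inf'), None
for ia, ib in itertools.product(range(2), range(2)):
    f = {'a': ia, 'b': ib}
    energy = (sum(unary[v][f[v]] for v in unary)
              + sum(P[f[a], f[b]] for (a, b), P in pairwise.items()))
    if energy < best_energy:
        best_energy, best_labelling = energy, f

print(best_energy, best_labelling)  # 1.0 {'a': 1, 'b': 1}
```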

Linear Programming Relaxation

min θTy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

Two reasons why we can’t solve this

Linear Programming Relaxation

min θTy

ya;i ∈ [0,1]

∑i ya;i = 1

yab;ik = ya;i yb;k

One reason why we can’t solve this

Linear Programming Relaxation

min θTy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ∑k ya;i yb;k

One reason why we can’t solve this

Linear Programming Relaxation

min θTy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i ∑k yb;k    (and ∑k yb;k = 1)

One reason why we can’t solve this

Linear Programming Relaxation

min θTy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

One reason why we can’t solve this

Linear Programming Relaxation

min θTy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

No reason why we can’t solve this* (*apart from memory requirements and time complexity)
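Since the final relaxation is an ordinary linear program, small instances can be handed to any LP solver. Below is a minimal sketch with illustrative potentials, using scipy.optimize.linprog for a 2-variable, 2-label MRF; the symmetric constraints ∑i yab;ik = yb;k are included as well, since they follow by the same argument as on the slides:

```python
# A minimal sketch (illustrative potentials, not the lecture's code):
# the final relaxation above for a 2-variable, 2-label MRF.
# Variable order:
#   y = [ya;0, ya;1, yb;0, yb;1, yab;00, yab;01, yab;10, yab;11]
import numpy as np
from scipy.optimize import linprog

theta = np.array([0.0, 1.0,             # unary potentials of Va
                  2.0, 0.0,             # unary potentials of Vb
                  0.0, 3.0, 3.0, 0.0])  # pairwise potentials of (Va,Vb)

A_eq, b_eq = [], []
A_eq.append([1, 1, 0, 0, 0, 0, 0, 0]); b_eq.append(1)   # sum_i ya;i = 1
A_eq.append([0, 0, 1, 1, 0, 0, 0, 0]); b_eq.append(1)   # sum_i yb;i = 1
# sum_k yab;ik = ya;i, as derived on the slides
A_eq.append([-1, 0, 0, 0, 1, 1, 0, 0]); b_eq.append(0)
A_eq.append([0, -1, 0, 0, 0, 0, 1, 1]); b_eq.append(0)
# symmetric constraints sum_i yab;ik = yb;k (same argument, other index)
A_eq.append([0, 0, -1, 0, 1, 0, 1, 0]); b_eq.append(0)
A_eq.append([0, 0, 0, -1, 0, 1, 0, 1]); b_eq.append(0)

res = linprog(theta, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * 8)
print(res.fun, res.x)  # a single edge is a tree, so the LP is tight here
```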

Dual of the LP Relaxation Wainwright et al., 2001

[Figure: a 3×3 grid MRF over Va,…,Vi with potentials θ, decomposed into six trees (three rows and three columns) with potentials θ1,…,θ6]

∑ θi = θ

Dual of the LP Relaxation Wainwright et al., 2001

[Figure: each tree i contributes its optimal value q*(θi)]

∑ θi = θ

max ∑ q*(θi) = Dual of LP

Dual of the LP Relaxation Wainwright et al., 2001

∑ θi ≡ θ    (that is, ∑ θi need only be a reparameterization of θ)

max ∑ q*(θi)

Dual of the LP Relaxation Wainwright et al., 2001

∑ θi ≡ θ

max ∑ q*(θi)

I can easily compute q*(θi)

I can easily maintain reparam constraint

So can I easily solve the dual?

Outline

•  TRW Message Passing

•  Dual Decomposition

Things to Remember

•  Forward-pass computes min-marginals of root (see the sketch after this list)

•  BP is exact for trees

•  Every iteration provides a reparameterization
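As a concrete reminder of the first bullet, here is a minimal sketch of the forward pass on a chain (the potentials are illustrative, not from the lecture):

```python
# A minimal sketch (illustrative potentials) of the forward pass on a
# chain V1 - V2 - ... - Vn: after the pass, the last node's beliefs are
# exactly its min-marginals.
import numpy as np

def forward_pass_min_marginals(unary, pairwise):
    """unary: list of (h,) arrays; pairwise: list of (h, h) arrays."""
    belief = unary[0].copy()
    for i in range(1, len(unary)):
        # message to node i: one reparameterization constant per label
        msg = np.min(belief[:, None] + pairwise[i - 1], axis=0)
        belief = unary[i] + msg
    return belief  # min-marginals of the root (last node)

unary = [np.array([2.0, 5.0]), np.array([4.0, 2.0])]
pairwise = [np.array([[0.0, 1.0], [1.0, 0.0]])]
print(forward_pass_min_marginals(unary, pairwise))  # [6. 5.]
```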

TRW Message Passing Kolmogorov, 2006

[Figure: the 3×3 grid MRF decomposed into six trees with potentials θ1,…,θ6]

∑ θi ≡ θ        ∑ q*(θi)

Pick a variable Va

TRW Message Passing Kolmogorov, 2006

∑ θi ≡ θ        ∑ q*(θi)

[Figure: the two trees containing Va: the chain (Vc, Vb, Va) with potentials θ1 and the chain (Va, Vd, Vg) with potentials θ4; each node carries unary potentials for labels 0 and 1]

TRW Message Passing Kolmogorov, 2006

θ1 + θ4 + θrest ≡ θ        q*(θ1) + q*(θ4) + K    (K = contribution of the remaining trees)

Reparameterize to obtain the min-marginals of Va

[Figure: the two chains with unary potentials θ1·;· and θ4·;·]

TRW Message Passing Kolmogorov, 2006

θ’1 + θ’4 + θrest        q*(θ’1) + q*(θ’4) + K

One pass of Belief Propagation

[Figure: the two chains with reparameterized potentials θ’1 and θ’4]

TRW Message Passing Kolmogorov, 2006

θ’1 + θ’4 + θrest ≡ θ        q*(θ’1) + q*(θ’4) + K

Both remain the same (reparameterization changes neither)

[Figure: the two chains, as above]

TRW Message Passing Kolmogorov, 2006

θ’1 + θ’4 + θrest ≡ θ

min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K

[Figure: after reparameterization, Va’s unary potentials equal its min-marginals in each tree]

TRW Message Passing Kolmogorov, 2006

θ’1 + θ’4 + θrest ≡ θ        min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K

Compute the average of the min-marginals of Va

[Figure: the two chains, as above]

TRW Message Passing Kolmogorov, 2006

θ’1 + θ’4 + θrest ≡ θ        min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K

θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2        θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2

TRW Message Passing Kolmogorov, 2006

θ’’1 + θ’’4 + θrest        min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K

[Figure: both chains now carry the averaged unary potentials θ’’a;0 and θ’’a;1 at Va]

θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2        θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2

TRW Message Passing Kolmogorov, 2006

θ’’1 + θ’’4 + θrest ≡ θ        min{θ’1a;0, θ’1a;1} + min{θ’4a;0, θ’4a;1} + K

θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2        θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2

TRW Message Passing Kolmogorov, 2006

θ’’1 + θ’’4 + θrest ≡ θ        2 min{θ’’a;0, θ’’a;1} + K

θ’’a;0 = (θ’1a;0 + θ’4a;0) / 2        θ’’a;1 = (θ’1a;1 + θ’4a;1) / 2

TRW Message Passing Kolmogorov, 2006

θ’’1 + θ’’4 + θrest ≡ θ

Setting p1 = θ’1a;0, q1 = θ’1a;1, p2 = θ’4a;0, q2 = θ’4a;1:

min{p1, q1} + min{p2, q2} ≤ min{p1 + p2, q1 + q2} = 2 min{θ’’a;0, θ’’a;1}

so the new dual value 2 min{θ’’a;0, θ’’a;1} + K is at least the old one.
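A quick numeric check of the inequality, with illustrative values:

```latex
\text{Take } p_1 = 1,\; q_1 = 3,\; p_2 = 4,\; q_2 = 2.\\
\min\{p_1, q_1\} + \min\{p_2, q_2\} = 1 + 2 = 3
\;\le\; \min\{p_1 + p_2,\, q_1 + q_2\} = \min\{5, 5\} = 5.
```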

TRW Message Passing Kolmogorov, 2006

θ’’1 + θ’’4 + θrest ≡ θ        2 min{θ’’a;0, θ’’a;1} + K

Objective function increases or remains constant

TRW Message Passing Kolmogorov, 2006

Initialize the θi, taking care of the reparameterization constraint.

REPEAT:
  Choose a random variable Va
  Compute the min-marginals of Va for all trees
  Node-average the min-marginals

Can also do edge-averaging. (A toy sketch of one node-averaging step follows.)
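The sketch below uses illustrative potentials on two chains sharing Va, and normalizes the per-tree constant K away by treating each tree's min-marginal vector at Va as the whole tree's contribution; it is an illustration of the step above, not the lecture's code:

```python
# A toy sketch of one TRW node-averaging step (illustrative potentials).
import numpy as np

def min_marginals_of_last_node(unary, pairwise):
    """Forward pass along a chain; returns min-marginals of the last node."""
    belief = unary[0].copy()
    for i in range(1, len(unary)):
        belief = unary[i] + np.min(belief[:, None] + pairwise[i - 1], axis=0)
    return belief

# two trees sharing Va: the chain (Vc, Vb, Va) and the chain (Vg, Vd, Va)
tree1_u = [np.array([0.0, 1.0]), np.array([2.0, 0.0]), np.array([5.0, 2.0])]
tree2_u = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 6.0])]
potts = np.array([[0.0, 1.0], [1.0, 0.0]])

m1 = min_marginals_of_last_node(tree1_u, [potts, potts])
m2 = min_marginals_of_last_node(tree2_u, [potts, potts])
dual_before = m1.min() + m2.min()  # q*(theta1) + q*(theta4)

avg = (m1 + m2) / 2                # node-average the min-marginals of Va
dual_after = 2 * avg.min()         # both trees now hold the average

print(dual_before, dual_after)     # 7.0 11.0 -- the dual never decreases
```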

Example 1

[Figure: unary and pairwise potentials over labels l0, l1 for the three trees (Va,Vb), (Vb,Vc), (Vc,Va); per-tree values 5, 6, 7]

Pick variable Va. Reparameterize.

Example 1

[Figure: the reparameterized potentials; per-tree values 5, 6, 7]

Average the min-marginals of Va.

Example 1

[Figure: the potentials after averaging the min-marginals of Va; per-tree values 7, 6, 7]

Pick variable Vb. Reparameterize.

Example 1

[Figure: the reparameterized potentials; per-tree values 7, 6, 7]

Average the min-marginals of Vb.

Example 1

[Figure: the potentials after averaging the min-marginals of Vb; per-tree values 6.5, 6.5, 7]

Value of the dual does not increase.

Example 1

[Figure: the same potentials]

Maybe it will increase for Vc

NO

Example 1

[Figure: the converged potentials for the three trees]

f1(a) = 0  f1(b) = 0    f2(b) = 0  f2(c) = 0    f3(c) = 0  f3(a) = 0

Strong Tree Agreement

Exact MAP Estimate

Example 2

[Figure: unary and pairwise potentials over labels l0, l1 for the three trees (Va,Vb), (Vb,Vc), (Vc,Va); per-tree values 4, 0, 4]

Pick variable Va. Reparameterize.

Example 2

[Figure: the reparameterized potentials; per-tree values 4, 0, 4]

Average the min-marginals of Va.

Example 2

[Figure: the potentials after averaging; per-tree values 4, 0, 4]

Value of the dual does not increase.

Example 2

[Figure: the same potentials]

Maybe it will increase for Vb or Vc

NO

Example 2

[Figure: the converged potentials for the three trees]

f1(a) = 1  f1(b) = 1    f2(b) = 1  f2(c) = 0    f3(c) = 1  f3(a) = 1

Tree 2 has a second optimal labelling: f2(b) = 0, f2(c) = 1

Weak Tree Agreement

Not Exact MAP Estimate

Example 2

[Figure: the same converged potentials]

f1(a) = 1  f1(b) = 1    f2(b) = 1  f2(c) = 0  (or f2(b) = 0, f2(c) = 1)    f3(c) = 1  f3(a) = 1

Weak Tree Agreement: the convergence point of TRW

Obtaining the Labelling

Only solves the dual. Primal solutions?

[Figure: the 3×3 grid MRF over Va,…,Vi]

θ’ = ∑ θi ≡ θ

Fix the label of Va.

Obtaining the Labelling

[Figure: the same grid, with Va now fixed]

Fix the label of Vb. Continue in some fixed order. Meltzer et al., 2006 (A rounding sketch follows.)
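A minimal sketch of this rounding scheme; the data layout, potentials, and visiting order below are assumptions for illustration, not the lecture's code:

```python
# A sketch of greedy primal rounding: visit nodes in a fixed order and
# fix each label, conditioning on the neighbours already fixed.
import numpy as np

def round_labelling(theta_u, theta_p, order):
    """theta_u[a]: (h,) unary costs; theta_p[(a, b)]: (h, h) pairwise costs."""
    labels = {}
    for a in order:
        cost = theta_u[a].copy()
        for (u, v), P in theta_p.items():
            if u == a and v in labels:    # neighbour already fixed
                cost += P[:, labels[v]]
            elif v == a and u in labels:
                cost += P[labels[u], :]
        labels[a] = int(np.argmin(cost))  # greedily fix the label of a
    return labels

theta_u = {'a': np.array([0.0, 1.0]), 'b': np.array([2.0, 0.0]),
           'c': np.array([0.5, 0.5])}
theta_p = {('a', 'b'): np.array([[0.0, 3.0], [3.0, 0.0]]),
           ('b', 'c'): np.array([[0.0, 1.0], [1.0, 0.0]])}
print(round_labelling(theta_u, theta_p, order=['a', 'b', 'c']))
```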

Computational Issues of TRW

Basic component is Belief Propagation.

•  Speed-ups for some pairwise potentials (Felzenszwalb & Huttenlocher, 2004); see the sketch after this list

•  Memory requirements cut down by half (Kolmogorov, 2006)

•  Further speed-ups using monotonic chains (Kolmogorov, 2006)
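Below is a sketch of the O(h) message computation for truncated linear pairwise potentials, in the spirit of Felzenszwalb & Huttenlocher, 2004 (this is also the idea behind the earlier hard question on BP); the slope c and truncation value M are illustrative parameters:

```python
# The naive message min_i (u[i] + min(c*|i - k|, M)) costs O(h^2);
# two linear sweeps plus a final truncation make it O(h).
import numpy as np

def truncated_linear_message(u, c=1.0, M=2.0):
    m = u.copy()
    for k in range(1, len(m)):           # forward sweep
        m[k] = min(m[k], m[k - 1] + c)
    for k in range(len(m) - 2, -1, -1):  # backward sweep
        m[k] = min(m[k], m[k + 1] + c)
    return np.minimum(m, u.min() + M)    # apply the truncation

u = np.array([3.0, 0.0, 4.0, 5.0, 1.0])
print(truncated_linear_message(u))       # matches the brute-force result
```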

Theoretical Properties of TRW

•  Always converges, unlike BP (Kolmogorov, 2006)

•  Strong tree agreement implies exact MAP (Wainwright et al., 2001)

•  Optimal MAP for two-label submodular problems (Kolmogorov and Wainwright, 2005): θab;00 + θab;11 ≤ θab;01 + θab;10

Results: Binary Segmentation (Szeliski et al., 2008)

Labels: {foreground, background}

Unary potentials: -log(likelihood) using learnt fg/bg models

Pairwise potentials: 0 if same labels; 1 - λ exp(|da - db|) if different labels

[Image: binary segmentation result using TRW]

[Image: binary segmentation result using Belief Propagation]

Results: Stereo Correspondence (Szeliski et al., 2008)

Labels: {disparities}

Unary potentials: similarity of pixel colours

Pairwise potentials: 0 if same labels; 1 - λ exp(|da - db|) if different labels

[Image: stereo correspondence result using TRW]

[Image: stereo correspondence result using Belief Propagation]

Results: Non-submodular Problems (Kolmogorov, 2006)

[Plot: energy curves for BP and TRW-S on a 30x30 grid, K50]

BP outperforms TRW-S

Code + standard data: http://vision.middlebury.edu/MRF

Outline

•  TRW Message Passing

•  Dual Decomposition

Dual Decomposition

minx ∑i gi(x) s.t. x ∈ C

Dual Decomposition

minx,xi ∑i gi(xi)

s.t. xi ∈ C, xi = x

Dual Decomposition

minx,xi ∑i gi(xi)

s.t. xi ∈ C

Dual Decomposition

maxλi minx,xi ∑i gi(xi) + ∑i λiT (xi - x)    s.t. xi ∈ C

KKT condition (from minimizing over the unconstrained x): ∑i λi = 0

Dual Decomposition

maxλi minx,xi ∑i gi(xi) + ∑i λiT xi    s.t. xi ∈ C

(the x term drops because ∑i λi = 0)

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiT xi)    s.t. xi ∈ C

Projected Supergradient Ascent

Supergradient s of h(z) at z0: h(z) - h(z0) ≤ sT (z - z0), for all z in the feasible region
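Why the slave's minimizer serves as a supergradient of the dual (a short derivation from the definition above):

```latex
\text{Let } h(\lambda) = \min_{x \in C}\, g(x) + \lambda^{T} x
\text{ and let } x^{*} \text{ attain the minimum at } \lambda_{0}.\\
\text{For any } \lambda:\quad
h(\lambda) \le g(x^{*}) + \lambda^{T} x^{*}
 = h(\lambda_{0}) + (\lambda - \lambda_{0})^{T} x^{*},\\
\text{i.e. } h(\lambda) - h(\lambda_{0}) \le (x^{*})^{T}(\lambda - \lambda_{0}),
\text{ so } x^{*} \text{ is a supergradient of } h \text{ at } \lambda_{0}.
```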

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiT xi)    s.t. xi ∈ C

Initialize λi0 = 0

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiT xi)    s.t. xi ∈ C

Compute supergradients:  si = argminxi (gi(xi) + (λit)T xi)

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiT xi)    s.t. xi ∈ C

Project supergradients:  pi = si - (1/m) ∑j sj, where m = number of subproblems (slaves)

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiT xi)    s.t. xi ∈ C

Update dual variables:  λi(t+1) = λit + ηt pi, where ηt is the learning rate, e.g. ηt = 1/(t+1)

Dual Decomposition

Initialize λi0 = 0

REPEAT:
  Compute projected supergradients:  si = argminxi (gi(xi) + (λit)T xi),  pi = si - (1/m) ∑j sj
  Update dual variables:  λi(t+1) = λit + ηt pi

(A toy implementation sketch follows.)
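A toy implementation of the scheme above for two slaves over a single shared variable; the costs are illustrative, and each slave's subproblem is a plain argmin over its (dual-adjusted) unary costs:

```python
# Projected supergradient ascent on a 2-slave, 2-label toy problem.
import numpy as np

g = [np.array([2.0, 0.0]), np.array([0.0, 3.0])]  # slave costs g_i
m, h = len(g), 2
lam = [np.zeros(h) for _ in range(m)]             # dual variables

for t in range(50):
    # supergradients: one-hot encoding of each slave's current MAP
    s = []
    for i in range(m):
        xi = np.argmin(g[i] + lam[i])
        e = np.zeros(h); e[xi] = 1.0
        s.append(e)
    mean = sum(s) / m
    p = [si - mean for si in s]                   # project: sum_i p_i = 0
    eta = 1.0 / (t + 1)                           # learning rate
    for i in range(m):
        lam[i] += eta * p[i]

# slaves end up agreeing on label 0, the minimizer of g1 + g2
print([int(np.argmin(g[i] + lam[i])) for i in range(m)])  # [0, 0]
```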

Dual Decomposition Komodakis et al., 2007

[Figure: the 3×3 grid decomposed into six trees θ1,…,θ6; slaves 1 and 4 both contain Va]

s1a = [1, 0]T    s4a = [1, 0]T

Slaves agree on the label for Va.

Dual Decomposition Komodakis et al., 2007

s1a = [1, 0]T    s4a = [1, 0]T

p1a = [0, 0]T    p4a = [0, 0]T

(agreement gives zero projected supergradients, so the potentials are unchanged)

Dual Decomposition Komodakis et al., 2007

s1a = [1, 0]T    s4a = [0, 1]T

Slaves disagree on the label for Va.

Dual Decomposition Komodakis et al., 2007

s1a = [1, 0]T    s4a = [0, 1]T

p1a = [0.5, -0.5]T    p4a = [-0.5, 0.5]T

Unary cost increases for the label each slave currently prefers…

Dual Decomposition Komodakis et al., 2007

s1a = [1, 0]T    s4a = [0, 1]T

p1a = [0.5, -0.5]T    p4a = [-0.5, 0.5]T

…and decreases for the label the other slave prefers.

Dual Decomposition Komodakis et al., 2007

s1a = [1, 0]T    s4a = [0, 1]T

p1a = [0.5, -0.5]T    p4a = [-0.5, 0.5]T

Push the slaves towards agreement.

Comparison

                  TRW                      DD
Speed:            Fast                     Slow
Dual optimum:     Local maximum            Global maximum
Oracle:           Requires min-marginals   Requires MAP estimate
Extensions:                                Other forms of slaves:
                                           tighter relaxations,
                                           sparse high-order potentials
