Source: cnls.lanl.gov/~jasonj/poa/slides/globerson.pdf · 2014-09-24

Finding k-best MAP Solutions Using LP Relaxations

Amir Globerson

School of Computer Science and Engineering

The Hebrew University

Joint Work with: Menachem Fromer (Hebrew Univ.)

Prediction Problems

Consider the following problem:

Observe variables: $x_v$ (visible)

Predict variables: $x_h$ (hidden)

Countless applications (visible → hidden):

Images: noisy image → source image

Error correcting codes: received bits → code word

Medical diagnostics: symptoms → disease

Sentence → derivation

Statistical Models for Prediction

One approach:

Assume (or learn) a model for $p(x_h, x_v)$

Predict the most likely hidden values: $\arg\max_{x_h} p(x_h \mid x_v)$

This conditional distribution often corresponds to a graphical model

Need to know how to find an assignment with maximum probability

The MAP Problem

Given a graphical model over $x_1, \ldots, x_n$:

$p(x) = \frac{1}{Z} e^{f(x)}, \qquad f(x) = \sum_{ij} \theta_{ij}(x_i, x_j)$

Find the most likely assignment:

$\arg\max_x \sum_{ij} \theta_{ij}(x_i, x_j)$
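To make the objective concrete, here is a minimal brute-force sketch of the MAP problem on a tiny pairwise model. The 3-node chain, the potential tables in `theta`, and the function names are all hypothetical, chosen only for illustration; enumeration is exponential in n and is not one of the methods discussed in the talk.

```python
from itertools import product

def score(x, theta):
    """f(x) = sum over edges (i, j) of theta[(i, j)][x[i]][x[j]]."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def brute_force_map(n, num_states, theta):
    """Return (best assignment, best score) by enumerating all num_states**n assignments."""
    candidates = ((x, score(x, theta)) for x in product(range(num_states), repeat=n))
    return max(candidates, key=lambda pair: pair[1])

# Hypothetical 3-node chain 0 - 1 - 2 with binary variables.
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}
x_map, f_map = brute_force_map(3, 2, theta)
print(x_map, f_map)
```

Since Z does not depend on x, maximizing f(x) and maximizing p(x) select the same assignment.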

MAP Approximations

x is discrete so generally NP hard

Many approximation approaches:

Greedy search

Loopy belief propagation (e.g., max product)

Linear programming relaxations

LP approaches:

Provide optimality certificates

Optimal in some cases (e.g., submodular functions)

Can be solved via message passing

The k-best MAP Problem

Find the k best assignments for f(x)

Denote these by $x^{(1)}, \ldots, x^{(k)}$

Useful in:

Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)

As a first processing stage before applying more complex methods

Supervised learning

From 2 to k best

We can show that, given a polynomial-time algorithm for k=2, the problem can be solved for any k with O(k) calls to that algorithm

Focus on k=2

Our key question: what is the LP formulation of the problem, and its relaxations?
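The reduction from k best to 2nd best can be sketched with a partitioning scheme. The code below is one possible construction, not necessarily the one the authors have in mind: every remaining region of assignment space carries at most one already-reported assignment to exclude, so each reported solution costs at most two calls to a "best in a region, excluding one assignment" oracle. Here `best_excluding` is a hypothetical brute-force stand-in for such a 2nd-best solver, and the model `theta` is made up for illustration.

```python
import heapq
from itertools import product

def score(x, theta):
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def best_excluding(n, num_states, theta, constraints, excluded):
    """Brute-force stand-in for a 2nd-best solver: the best assignment that satisfies
    every (var, value, must_equal) constraint and differs from `excluded`."""
    best = None
    for x in product(range(num_states), repeat=n):
        if x == excluded:
            continue
        if any((x[i] == v) != must_equal for i, v, must_equal in constraints):
            continue
        if best is None or score(x, theta) > score(best, theta):
            best = x
    return best

def k_best(n, num_states, theta, k):
    results, heap, tie = [], [], 0

    def push(constraints, excluded):
        nonlocal tie
        y = best_excluding(n, num_states, theta, constraints, excluded)
        if y is not None:
            # tie is a unique counter, so heap entries never compare beyond it
            heapq.heappush(heap, (-score(y, theta), tie, y, constraints, excluded))
            tie += 1

    push([], None)                      # plain MAP over the whole space
    while heap and len(results) < k:
        _, _, y, constraints, excluded = heapq.heappop(heap)
        results.append(y)
        if excluded is None:
            push(constraints, y)        # same region, now excluding y
        else:
            # split on a variable where y and the excluded assignment differ
            i = next(j for j in range(n) if y[j] != excluded[j])
            push(constraints + [(i, y[i], True)], y)          # region containing y
            push(constraints + [(i, y[i], False)], excluded)  # region containing excluded
    return results

# Hypothetical 3-node binary chain.
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}
top4 = k_best(3, 2, theta, 4)
print([(x, score(x, theta)) for x in top4])
```

With a polynomial-time oracle in place of the brute-force search, the loop makes at most 2k + 1 oracle calls for k outputs.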

Outline

LP formulation of the MAP problem

LP for 2nd best

General (intractable) exact formulation

Tractable formulation for tree graphs

Approximations for non-tree graphs

Experiments

MAP and LP

MAP: $\max_x f(x)$

MAP as LP: $\max_{\mu \in M(G)} \mu \cdot \theta$

Approximate MAP via LP: $\max_{\mu \in S} \mu \cdot \theta$

Schlesinger, Deza & Laurent, Boros, Wainwright, Kolmogorov

LP Formulation of MAP

$x^* = \arg\max_x \sum_{ij} \theta_{ij}(x_i, x_j)$

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_x q(x) \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_{ij} \sum_{x_i, x_j} q_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

[Figure: the maximizing distribution $q^*(x)$, which places probability 1 on $x^*$]

Objective depends only on pairwise marginals

But only those that correspond to some distribution

This set is called the marginal polytope $M(G)$ (Wainwright & Jordan)

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \mu \cdot \theta$

See: Cut polytope (Deza, Laurent), Quadric polytope (Boros)
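The chain of equalities can be checked numerically on a small example. The sketch below uses a made-up 3-node binary chain; the names `pairwise_marginals` and `dot` are illustrative helpers, not part of the talk. It confirms that a point-mass distribution reproduces f(x) as a dot product with the potentials, and that a mixture of assignments scores a weighted average, so it cannot beat the best single assignment.

```python
from itertools import product

# Hypothetical 3-node chain 0 - 1 - 2 with binary variables.
theta = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (1, 2): [[0.0, 3.0], [1.0, 0.0]],
}
NUM_STATES = 2

def score(x):
    return sum(t[x[i]][x[j]] for (i, j), t in theta.items())

def pairwise_marginals(q):
    """Pairwise marginals q_ij(a, b) of a distribution q = {assignment: probability}."""
    mu = {e: [[0.0] * NUM_STATES for _ in range(NUM_STATES)] for e in theta}
    for x, prob in q.items():
        for (i, j) in theta:
            mu[(i, j)][x[i]][x[j]] += prob
    return mu

def dot(mu):
    """The linear objective: sum over edges and states of mu_ij(a, b) * theta_ij(a, b)."""
    return sum(mu[e][a][b] * theta[e][a][b]
               for e in theta for a in range(NUM_STATES) for b in range(NUM_STATES))

assignments = list(product(range(NUM_STATES), repeat=3))
# A point mass on x gives indicator marginals, and the objective equals f(x).
vertex_ok = all(dot(pairwise_marginals({x: 1.0})) == score(x) for x in assignments)
# A mixture of two assignments gives the average of their scores.
mixture = {(0, 0, 1): 0.5, (1, 0, 1): 0.5}
mixture_value = dot(pairwise_marginals(mixture))
best_value = max(score(x) for x in assignments)
print(vertex_ok, mixture_value, best_value)
```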

The Marginal Polytope

[Figure: the marginal polytope $M(G)$ with a point $\mu$; the LP is $\max_{\mu \in M(G)} \mu \cdot \theta$]

$\mu \in M(G)$: there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$

Difficult set to characterize. Easy to outer bound

The vertices have integral values and correspond to assignments on x

Relaxing the MAP LP

$\max_{\mu \in M(G)} \mu \cdot \theta \le \max_{\mu \in S} \mu \cdot \theta$

The left-hand side is exact but hard; $S$ is an outer bound on $M(G)$

If optimum is an integral vertex, MAP is solved

Possible outer bound: Pairwise consistency

$S: \quad \sum_{x_i} \mu_{ij}(x_i, x_j) = \sum_{x_k} \mu_{jk}(x_j, x_k)$

Exact for trees

Efficient message passing schemes for solving the resulting (dual) LP
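A standard small example shows why pairwise consistency is only an outer bound once the graph has a cycle. The sketch below uses a hypothetical 3-node binary triangle whose potentials reward disagreement on every edge. The fractional point `mu` is pairwise consistent, yet its objective exceeds the value of every actual assignment. The normalization check inside `pairwise_consistent` is an assumption on my part; the slide shows only the consistency constraint.

```python
from itertools import product

edges = [(0, 1), (1, 2), (0, 2)]                       # a triangle, not a tree
theta = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}   # reward x_i != x_j
mu = {e: [[0.0, 0.5], [0.5, 0.0]] for e in edges}      # fractional pseudo-marginals

def node_marginal(table, edge, node):
    """Marginalize an edge table onto one of its two endpoints."""
    if node == edge[0]:
        return [sum(row) for row in table]
    return [sum(row[b] for row in table) for b in range(len(table[0]))]

def pairwise_consistent(mu, tol=1e-9):
    """All edges touching a node must agree on that node's marginal."""
    seen = {}
    for edge, table in mu.items():
        for node in edge:
            m = node_marginal(table, edge, node)
            if abs(sum(m) - 1.0) > tol:        # normalization (assumed constraint)
                return False
            ref = seen.setdefault(node, m)
            if any(abs(a - b) > tol for a, b in zip(ref, m)):
                return False
    return True

lp_value = sum(mu[e][a][b] * theta[e][a][b]
               for e in edges for a in range(2) for b in range(2))
map_value = max(sum(theta[(i, j)][x[i]][x[j]] for (i, j) in edges)
                for x in product(range(2), repeat=3))
print(pairwise_consistent(mu), lp_value, map_value)
```

No assignment of three binary variables can make all three pairs disagree, so the relaxed optimum here is not attained by any integral vertex.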


The 2nd best problem and LP

MAP: $\max_x f(x) = \max_{\mu \in M(G)} \mu \cdot \theta$

2nd best: $\max_{x \ne x^{(1)}} f(x) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

Approximations:

A new marginal polytope

Given an assignment $z$, define the Assignment Excluding Marginal Polytope $M(G, z)$:

$\mu \in M(G, z)$ if there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$ and $p(z) = 0$

LP for the 2nd best problem

The 2nd best problem corresponds to the following LP:

$\max_{x \ne x^{(1)}} f(x; \theta) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

Is there a simple characterization of $M(G, x^{(1)})$?

Is it $M(G)$ plus one inequality?

If so, what inequality?


Adding inequalities to $M(G)$

Any valid inequality must separate $z$ from the other vertices

How about (Santos 91): $\sum_i \mu_i(z_i) \le n - 1$

The left-hand side is $n$ for $z$ and $n - 1$ or less for the other vertices

But: results in fractional vertices, even for trees

Only an outer bound on $M(G, z)$
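The vertex claim is easy to check by enumeration. At the vertex of an assignment x, each singleton marginal is an indicator, so the sum of the singleton marginals at z simply counts the coordinates where x agrees with z. The sizes and the choice of `z` below are arbitrary illustrative values.

```python
from itertools import product

def agreement(x, z):
    """sum_i mu_i(z_i) at the vertex of x: the number of coordinates where x equals z."""
    return sum(1 for xi, zi in zip(x, z) if xi == zi)

n, num_states = 4, 3          # hypothetical problem size
z = (0, 2, 1, 1)              # hypothetical excluded assignment
values = {x: agreement(x, z) for x in product(range(num_states), repeat=n)}

print(values[z], max(v for x, v in values.items() if x != z))
```

Only z reaches n, so the inequality cuts off exactly that vertex among the integral ones; the check says nothing about fractional points, which is where this inequality falls short.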

The tree case

Focus on the case where $G$ is a tree

$M(G)$ is given by pairwise consistency

Define:

$I(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij} \mu_{ij}(z_i, z_j)$

where $d_i$ is the degree of node $i$

Compare the Bethe entropy: $H(\mu) = \sum_i (1 - d_i)\, H_i(X_i) + \sum_{ij} H(X_i, X_j)$

Theorem:

$M(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

[Figure: $M(G, z)$ and the constraint on $I(\mu, z)$]

Proof...
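The behavior of I(µ, z) at the vertices can be checked by enumeration on a small tree. The sketch below uses a hypothetical 4-node star with binary variables, taking d_i to be the degree of node i. It also evaluates a 50/50 mixture of z and the all-ones assignment: that point satisfies the Santos-style inequality on the singleton marginals, but I(µ, z) is positive, so by the theorem it lies outside M(G, z).

```python
from itertools import product

edges = [(0, 1), (1, 2), (1, 3)]        # hypothetical 4-node tree (a star centered at 1)
n = 4
degree = [sum(i in e for e in edges) for i in range(n)]

def marginals(q):
    """Singleton and pairwise marginals of q = {assignment: probability}."""
    mu_i, mu_ij = {}, {}
    for x, prob in q.items():
        for i in range(n):
            mu_i[(i, x[i])] = mu_i.get((i, x[i]), 0.0) + prob
        for (i, j) in edges:
            key = (i, j, x[i], x[j])
            mu_ij[key] = mu_ij.get(key, 0.0) + prob
    return mu_i, mu_ij

def I(q, z):
    """I(mu, z) = sum_i (1 - d_i) mu_i(z_i) + sum_ij mu_ij(z_i, z_j), for mu = marginals of q."""
    mu_i, mu_ij = marginals(q)
    return (sum((1 - degree[i]) * mu_i.get((i, z[i]), 0.0) for i in range(n))
            + sum(mu_ij.get((i, j, z[i], z[j]), 0.0) for (i, j) in edges))

z = (0, 0, 0, 0)
vertex_values = {x: I({x: 1.0}, z) for x in product(range(2), repeat=n)}

mix = {z: 0.5, (1, 1, 1, 1): 0.5}
santos_lhs = sum(marginals(mix)[0].get((i, z[i]), 0.0) for i in range(n))
print(vertex_values[z], I(mix, z), santos_lhs)
```

In this mixture every edge marginal forces its two endpoints to be equal, so any distribution with these marginals must put mass 0.5 on z.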

Proof

Define: $A(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

Want to show: $A(G, z) = M(G, z)$

Want to show that if $\mu \in A(G, z)$, there exists a $p(x)$ that has these marginals and $p(z) = 0$

Can construct $p(x)$

Define:

$F(\mu) = \min p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0$

$F(\mu) = 0 \quad \forall \mu \in A(G, z)$

In fact we can show that for trees: $F(\mu) = \max\{0, I(\mu, z)\}$ for $\mu \in M(G)$
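A small worked case may help make the lower bound on p(z) tangible. Take a chain 1 - 2 - 3 with binary variables and z = (0, 0, 0); this example and its notation p(abc) for p(x_1 = a, x_2 = b, x_3 = c) are illustrative assumptions, with d_i read as the degree of node i.

```latex
% Chain 1 - 2 - 3, binary variables, z = (0,0,0), so d_1 = d_3 = 1 and d_2 = 2.
\begin{align*}
I(\mu, z) &= -\mu_2(0) + \mu_{12}(0,0) + \mu_{23}(0,0) \\
\mu_{12}(0,0) &= p(000) + p(001) \\
\mu_{23}(0,0) &= p(000) + p(100) \\
\mu_2(0) &= p(000) + p(001) + p(100) + p(101) \\
\Rightarrow\quad p(000) &= I(\mu, z) + p(101) \;\ge\; I(\mu, z)
\end{align*}
```

Together with p(000) ≥ 0, every distribution with these marginals has p(z) ≥ max{0, I(µ, z)}, which is consistent with the value of F(µ) stated for trees.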

Proof - key ideas

Relax $F(\mu)$ by requiring nonnegativity only for $x \ne z$:

$\min p(z)$ s.t. $p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j)$, $p_i(x_i) = \mu_i(x_i)$, $p(x) \ge 0 \ \forall x \ne z$

Dual:

$\max_\lambda \lambda \cdot \mu$ s.t. $\sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i) \le 0 \ \forall x \ne z$, $\sum_{ij} \lambda_{ij}(z_i, z_j) + \sum_i \lambda_i(z_i) = 1$

We show that the value of the above is $I(\mu, z)$

From there it's easy to conclude that $F(\mu) = \max\{0, I(\mu, z)\}$

Proof - Max marginals

$\max_\lambda \lambda \cdot \mu$ s.t. $\lambda(x) \le 0 \ \forall x \ne z$, $\lambda(z) = 1$

where $\lambda(x) = \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i)$

Use max-marginals:

$\bar\lambda_i(x_i) = \max_{\hat x : \hat x_i = x_i} \lambda(\hat x)$

$\bar\lambda_{ij}(x_i, x_j) = \max_{\hat x : \hat x_i = x_i, \hat x_j = x_j} \lambda(\hat x)$

$\bar\lambda_i(z_i) = 1$, and $\bar\lambda_i(x_i) \le 0$ for $x_i \ne z_i$

Rewrite: $\lambda(x) = \sum_i (1 - d_i)\, \bar\lambda_i(x_i) + \sum_{ij} \bar\lambda_{ij}(x_i, x_j)$

Result follows after some algebra
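The rewrite in terms of max-marginals is an identity for any function that decomposes over a tree, and it can be verified by brute force on a small instance. The sketch below uses a hypothetical 4-node star with arbitrary integer potentials (integers keep the comparison exact) and takes d_i to be the degree of node i. It does not impose the dual constraints, since the identity does not depend on them.

```python
from itertools import product

edges = [(0, 1), (1, 2), (1, 3)]        # hypothetical 4-node tree
n, num_states = 4, 2
degree = [sum(i in e for e in edges) for i in range(n)]

# Arbitrary integer-valued pairwise and singleton terms (made up for illustration).
pair = {(0, 1): [[1, -2], [0, 3]], (1, 2): [[2, 0], [-1, 1]], (1, 3): [[0, 4], [2, -3]]}
single = [[0, 1], [-1, 2], [3, 0], [0, 0]]

def total(x):
    """The tree-structured function: sum of pairwise terms plus sum of singleton terms."""
    return (sum(t[x[i]][x[j]] for (i, j), t in pair.items())
            + sum(single[i][x[i]] for i in range(n)))

assignments = list(product(range(num_states), repeat=n))

def max_marginal(fixed):
    """Max of total() over assignments that agree with fixed = {variable: value}."""
    return max(total(x) for x in assignments if all(x[i] == v for i, v in fixed.items()))

def rewritten(x):
    """sum_i (1 - d_i) * max-marginal_i(x_i) + sum_ij max-marginal_ij(x_i, x_j)."""
    return (sum((1 - degree[i]) * max_marginal({i: x[i]}) for i in range(n))
            + sum(max_marginal({i: x[i], j: x[j]}) for (i, j) in edges))

identity_holds = all(total(x) == rewritten(x) for x in assignments)
print(identity_holds)
```

On a graph with a cycle the same check generally fails, which is one reason the argument is specific to trees.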

Tree Graph - Summary