Source: cnls.lanl.gov/~jasonj/poa/slides/globerson.pdf · 2014-09-24

Post on 14-Mar-2020


Finding k-best MAP Solutions Using LP Relaxations

Amir Globerson

School of Computer Science and Engineering

The Hebrew University

Joint work with: Menachem Fromer (Hebrew Univ.)

Prediction Problems

Consider the following problem:

Observe variables: $x_v$ (visible)

Predict variables: $x_h$ (hidden)

Countless applications:

Images: noisy image (visible), source image (hidden)

Error correcting codes: received bits (visible), code word (hidden)

Medical diagnostics: symptoms (visible), disease (hidden)

Text: sentence (visible), derivation (hidden)

Statistical Models for Prediction

One approach:

Assume (or learn) a model for $p(x_h, x_v)$

Predict the most likely hidden values: $\arg\max_{x_h} p(x_h \mid x_v)$

This conditional distribution often corresponds to a graphical model

Need to know how to find an assignment with maximum probability

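The MAP Problem

Given a graphical model over $x_1, \ldots, x_n$:

$f(x) = \sum_{ij} \theta_{ij}(x_i, x_j)$

$p(x) = \frac{1}{Z} e^{f(x)}$

Find the most likely assignment:

$\arg\max_x f(x)$

[Figure: nodes $x_i$ and $x_j$ joined by an edge with potential $\theta_{ij}(x_i, x_j)$.]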
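As a minimal sketch of the definitions above, the following enumerates a tiny model, computes $Z$ and $p(x)$, and confirms that the maximizer of $f$ is also the maximizer of $p$. The chain, the tables in `theta`, and the function names are illustrative assumptions, not taken from the talk.

```python
import math
from itertools import product

# Hypothetical binary chain 0 - 1 - 2; theta[(i, j)][a][b] plays the role of theta_ij(a, b).
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],
}

def f(x):
    """Score of assignment x: the sum of the edge potentials."""
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

assignments = list(product(range(2), repeat=3))
Z = sum(math.exp(f(x)) for x in assignments)       # partition function
p = {x: math.exp(f(x)) / Z for x in assignments}   # p(x) = exp(f(x)) / Z

x_map = max(assignments, key=f)                    # exhaustive MAP, tiny models only
```

Enumeration costs time exponential in $n$, which is why the approximations that follow are needed.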

MAP Approximations

$x$ is discrete, so the problem is generally NP-hard

Many approximation approaches:

Greedy search

Loopy belief propagation (e.g., max product)

Linear programming relaxations

LP approaches:

Provide optimality certificates

Optimal in some cases (e.g., submodular functions)

Can be solved via message passing

The k-best MAP Problem

Find the k best assignments for $f(x)$

Denote these by $x^{(1)}, \ldots, x^{(k)}$

Useful in:

Finding multiple candidate solutions when the energy function is not accurate (e.g., protein design)

As a first processing stage before applying more complex methods

Supervised learning
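To make the problem statement concrete, here is a brute-force reference for k-best on a toy model. It is only a sketch for checking other methods on tiny instances; the potentials and the name `brute_force_k_best` are assumptions made for illustration.

```python
import heapq
from itertools import product

# Hypothetical binary chain 0 - 1 - 2 with made-up potentials.
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],
}

def f(x):
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def brute_force_k_best(n_vars, n_states, k):
    """Return x^(1), ..., x^(k) by scoring every assignment (exponential cost)."""
    return heapq.nlargest(k, product(range(n_states), repeat=n_vars), key=f)

top3 = brute_force_k_best(3, 2, 3)
```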

From 2 to k best

We can show that given a polynomial algorithm for k=2, the problem can be solved for any k in O(k)

Focus on k=2

Our key question: what is the LP formulation of the problem, and its relaxations?

Outline

LP formulation of the MAP problem

LP for 2nd best

General (intractable) exact formulation

Tractable formulation for tree graphs

Approximations for non-tree graphs

Experiments

MAP and LP

MAP: $\max_x f(x)$

MAP as LP: $\max_{\mu \in S} \mu \cdot \theta$

[Figure: the polytope $S$.]

Hard

Approximate MAP via LP

See: Schlesinger; Deza & Laurent; Boros; Wainwright; Kolmogorov

LP Formulation of MAP

$x^* = \arg\max_x \sum_{ij \in E} \theta_{ij}(x_i, x_j)$

$\max_{q(x)} \sum_x q(x) \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{q(x)} \sum_{ij} \sum_{x_i, x_j} q_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

[Figure: the maximizing distribution $q^*(x)$ puts mass 1 on $x^*$ and 0 elsewhere.]

Objective depends only on pairwise marginals

But only those that correspond to some distribution $q(x)$

This set is called the marginal polytope (Wainwright & Jordan)

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \mu \cdot \theta$

See: Cut polytope (Deza, Laurent), Quadric polytope (Boros)
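The claim that the objective depends only on pairwise marginals can be checked numerically. The sketch below takes an arbitrary distribution `q` (an assumption chosen for illustration), computes its pairwise marginals, and compares the expected score with $\mu \cdot \theta$.

```python
# Hypothetical binary chain 0 - 1 - 2 with made-up potentials.
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],
}
# An arbitrary distribution over assignments (half the mass on each of two points).
q = {(1, 1, 1): 0.5, (0, 0, 0): 0.5}

def pairwise_marginals(q, edges, n_states):
    """mu[(i, j)][a][b] = sum of q(x) over assignments with x_i = a and x_j = b."""
    mu = {e: [[0.0] * n_states for _ in range(n_states)] for e in edges}
    for x, qx in q.items():
        for (i, j) in edges:
            mu[(i, j)][x[i]][x[j]] += qx
    return mu

# Left-hand side: expectation of f(x) under q.
expected = sum(qx * sum(t[x[i]][x[j]] for (i, j), t in theta.items())
               for x, qx in q.items())

# Right-hand side: inner product of pairwise marginals with theta.
mu = pairwise_marginals(q, list(theta), 2)
via_marginals = sum(mu[e][a][b] * theta[e][a][b]
                    for e in theta for a in range(2) for b in range(2))
```

Both quantities agree for any `q`, which is what allows the maximization over distributions to be rewritten as a maximization over $\mu$.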

The Marginal Polytope

$\max_{\mu \in M(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

$\mu \in M(G)$: there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$

Difficult set to characterize. Easy to outer bound

The vertices have integral values and correspond to assignments on $x$

[Figure: the marginal polytope $M(G)$ with a point $\mu$ inside it.]

Relaxing the MAP LP

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) = \max_{\mu \in M(G)} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

Exact but hard!

$\max_x \sum_{ij} \theta_{ij}(x_i, x_j) \le \max_{\mu \in S} \sum_{ij \in E} \sum_{x_i, x_j} \mu_{ij}(x_i, x_j)\, \theta_{ij}(x_i, x_j)$

where $S \supseteq M(G)$ is an outer bound

If optimum is an integral vertex, MAP is solved

Possible outer bound: pairwise consistency. For edges $ij$ and $jk$ that share node $j$:

$\sum_{x_i} \mu_{ij}(x_i, x_j) = \sum_{x_k} \mu_{jk}(x_j, x_k)$

Exact for trees

Efficient message passing schemes for solving the resulting (dual) LP

[Figure: $M(G)$ contained in the outer bound $S$.]
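The sketch below checks pairwise consistency and illustrates why it is only an outer bound on graphs with cycles. The example is the standard fractional point on a binary 3-cycle; the function name and tolerance are assumptions made for illustration.

```python
from itertools import product

def is_pairwise_consistent(mu, n_states, tol=1e-9):
    """Each edge table must be a distribution, and edges sharing a node must agree on its marginal."""
    node = {}  # first node marginal seen for each variable
    for (i, j), table in mu.items():
        flat = [v for row in table for v in row]
        if min(flat) < -tol or abs(sum(flat) - 1.0) > tol:
            return False
        m_i = [sum(table[a]) for a in range(n_states)]
        m_j = [sum(table[a][b] for a in range(n_states)) for b in range(n_states)]
        for var, m in ((i, m_i), (j, m_j)):
            seen = node.setdefault(var, m)
            if any(abs(u - v) > tol for u, v in zip(seen, m)):
                return False
    return True

# Every edge of the triangle says "the two endpoints always disagree".
anti = [[0.0, 0.5], [0.5, 0.0]]
mu = {(0, 1): anti, (1, 2): anti, (0, 2): anti}

consistent = is_pairwise_consistent(mu, 2)
# Total probability that some edge has equal endpoints, according to mu.
agree_mass = sum(mu[e][a][a] for e in mu for a in range(2))
# In any actual binary assignment of a triangle, at least one edge has equal endpoints.
min_agreeing_edges = min(sum(x[i] == x[j] for (i, j) in mu)
                         for x in product(range(2), repeat=3))
```

Here `mu` passes the consistency check, yet no distribution $p(x)$ can have these marginals: any $p$ would give an expected number of agreeing edges of at least 1, while `mu` gives 0. So this point lies in $S$ but outside $M(G)$.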


The 2nd best problem and LP

MAP: $\max_x f(x) = \max_{\mu \in M(G)} \mu \cdot \theta$

2nd best: $\max_{x \neq x^{(1)}} f(x) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

[Figure: the two polytopes, with vertex $x^{(1)}$ marked, and their approximations.]

A new marginal polytope

Given an assignment $z$, define the Assignment Excluding Marginal Polytope $M(G, z)$:

$\mu \in M(G, z)$: there exists a $p(x)$ s.t. $p(x_i, x_j) = \mu_{ij}(x_i, x_j)$ and $p(z) = 0$

[Figure: $M(G, z)$ is $M(G)$ with the vertex $z$ cut off.]

LP for the 2nd best problem

The 2nd best problem corresponds to the following LP:

$\max_{x \neq x^{(1)}} f(x; \theta) = \max_{\mu \in M(G, x^{(1)})} \mu \cdot \theta$

Is there a simple characterization of $M(G, x^{(1)})$?

Is it $M(G)$ plus one inequality?

If so, what inequality?


Adding inequalities to $M(G)$

Any valid inequality must separate $z$ from the other vertices

How about (Santos 91):

$\sum_i \mu_i(z_i) \le n - 1$

The left-hand side is $n$ for $z$ and $n - 1$ or less for the other vertices

But: results in fractional vertices, even for trees

Only an outer bound on $M(G, z)$

[Figure: $M(G)$ with the vertex $z$ cut off by the inequality.]

The tree case

Focus on the case where $G$ is a tree

$M(G)$ is given by pairwise consistency

Define:

$I(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in G} \mu_{ij}(z_i, z_j)$

where $d_i$ is the degree of node $i$ in $G$.

Compare the Bethe entropy: $H(\mu) = \sum_i (1 - d_i) H_i(X_i) + \sum_{ij \in G} H(X_i, X_j)$

Theorem:

$M(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

[Figure: the half-space $I(\mu, z) \le 0$ cuts the vertex $z$ off $M(G)$, leaving $M(G, z)$.]
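A quick sanity check of the inequality at the integral vertices: the sketch below evaluates $I(\mu, z)$ at the vertex of every assignment on a small star-shaped tree. The tree, the choice of $z$, and the function name are illustrative assumptions.

```python
from itertools import product

def tree_inequality_at_vertex(x, z, edges):
    """I(mu, z) where mu is the integral vertex of assignment x.

    At a vertex, mu_i(z_i) is 1 if x_i == z_i (else 0), and mu_ij(z_i, z_j)
    is 1 if both endpoints of the edge agree with z (else 0).
    """
    n = len(z)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    node_term = sum((1 - degree[i]) * (x[i] == z[i]) for i in range(n))
    edge_term = sum((x[i] == z[i]) and (x[j] == z[j]) for i, j in edges)
    return node_term + edge_term

edges = [(0, 1), (0, 2), (0, 3)]   # hypothetical star with centre 0
z = (0, 0, 0, 0)                   # the assignment to exclude
values = {x: tree_inequality_at_vertex(x, z, edges)
          for x in product(range(2), repeat=4)}
```

On this example the value is 1 at $z$ and at most 0 at every other vertex, so the constraint $I(\mu, z) \le 0$ removes exactly the vertex $z$.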

Proof

Define:

$A(G, z) = \{ \mu \mid \mu \in M(G),\ I(\mu, z) \le 0 \}$

Want to show: $A(G, z) = M(G, z)$

Want to show that if $\mu \in A(G, z)$, there exists a $p(x)$ that has these marginals and $p(z) = 0$.

Can construct $p(x)$. Consider:

$F(\mu) = \min\ p(z) \quad \text{s.t.} \quad p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j),\ \ p_i(x_i) = \mu_i(x_i),\ \ p(x) \ge 0$

$F(\mu) = 0 \quad \forall \mu \in A(G, z)$

In fact we can show that for trees:

$\forall \mu \in M(G): \quad F(\mu) = \max\{0, I(\mu, z)\}$

Proof - key ideas

Require $p(x) \ge 0$ only for $x \ne z$:

$\min\ p(z) \quad \text{s.t.} \quad p_{ij}(x_i, x_j) = \mu_{ij}(x_i, x_j),\ \ p_i(x_i) = \mu_i(x_i),\ \ p(x) \ge 0 \ \ \forall x \ne z$

Dual:

$\max\ \lambda \cdot \mu \quad \text{s.t.} \quad \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i) \le 0 \ \ \forall x \ne z, \quad \sum_{ij} \lambda_{ij}(z_i, z_j) + \sum_i \lambda_i(z_i) = 1$

We show that the value of the above is $I(\mu, z)$

From there it’s easy to conclude that $F(\mu) = \max\{0, I(\mu, z)\}$

Proof - Max marginals

$\max\ \lambda \cdot \mu \quad \text{s.t.} \quad \lambda(x) \le 0 \ \ \forall x \ne z, \quad \lambda(z) = 1$

where $\lambda(x) = \sum_{ij} \lambda_{ij}(x_i, x_j) + \sum_i \lambda_i(x_i)$

Use max-marginals:

$\bar\lambda_i(x_i) = \max_{\hat x :\, \hat x_i = x_i} \lambda(\hat x)$

$\bar\lambda_{ij}(x_i, x_j) = \max_{\hat x :\, \hat x_i = x_i,\ \hat x_j = x_j} \lambda(\hat x)$

The constraints give $\bar\lambda_i(z_i) = 1$ and $\bar\lambda_i(x_i) \le 0$ for $x_i \ne z_i$

Rewrite: $\lambda(x) = \sum_i (1 - d_i)\, \bar\lambda_i(x_i) + \sum_{ij \in T} \bar\lambda_{ij}(x_i, x_j)$

Result follows after some algebra
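The rewrite in terms of max-marginals can be verified by brute force on a small tree. In this sketch the tree, the integer tables, and the helper names are assumptions chosen for illustration, and the node terms $\lambda_i$ are taken to be folded into the edge tables.

```python
from itertools import product

# Hypothetical integer-valued tables on the tree with edges 0 - 1, 1 - 2, 1 - 3.
edges = {
    (0, 1): [[3, -1], [0, 2]],
    (1, 2): [[1, 4], [-2, 0]],
    (1, 3): [[0, 5], [2, -3]],
}
n_vars, n_states = 4, 2
assignments = list(product(range(n_states), repeat=n_vars))

def lam(x):
    """lambda(x): the sum of the edge tables at assignment x."""
    return sum(t[x[i]][x[j]] for (i, j), t in edges.items())

def max_marginal(fixed):
    """Max of lam over assignments agreeing with `fixed` (a dict variable -> state)."""
    return max(lam(x) for x in assignments
               if all(x[i] == s for i, s in fixed.items()))

degree = [sum(i in e for e in edges) for i in range(n_vars)]

def rewritten(x):
    """Right-hand side of the rewrite: node and edge max-marginals combined."""
    node = sum((1 - degree[i]) * max_marginal({i: x[i]}) for i in range(n_vars))
    edge = sum(max_marginal({i: x[i], j: x[j]}) for (i, j) in edges)
    return node + edge

identity_holds = all(lam(x) == rewritten(x) for x in assignments)
```

Integer tables are used so that the comparison is exact rather than subject to floating-point rounding.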

Tree Graph - Summary

$M(G, x^{(1)}) = \{ \mu \mid \mu \in M(G),\ I(\mu, x^{(1)}) \le 0 \}$

The LP for 2nd best differs from the marginal polytope by one linear inequality constraint

The 2nd best satisfies $I(\mu, x^{(1)}) = 0$, so it cannot be just any assignment

[Figure: candidate positions for $x^{(2)}$ relative to $x^{(1)}$; one candidate is crossed out.]

Non tree graphs

Any graph can be converted into a junction tree

We can apply our tree result there

For a junction tree with cliques $\mathcal{C}$ and separators $\mathcal{S}$, the inequality is:

$\sum_{S \in \mathcal{S}} (1 - d_S)\, \mu_S(z_S) + \sum_{C \in \mathcal{C}} \mu_C(z_C) \le 0$

Specifying the marginal polytope requires a number of variables exponential in the tree width. Not practical.


Non trees - Approximations

[Figure: the true $M(G, x^{(1)})$, with $x^{(1)}$ marked, inside an outer bound on $M(G)$.]

Spanning tree inequalities

Given a spanning subtree $T$ of $G$, define

$I_T(\mu, z) = \sum_i (1 - d_i)\, \mu_i(z_i) + \sum_{ij \in T} \mu_{ij}(z_i, z_j)$

with $d_i$ the degree of node $i$ in $T$, and the constraint:

$I_T(\mu, z) \le 0$

Separates $z$ from the other vertices but might result in fractional vertices

[Figure: the cut $I_T(\mu, z) \le 0$ removes $z$ and creates a fractional vertex.]

Adding all spanning trees

Can we add all spanning tree inequalities efficiently?

Yes, via a cutting plane approach:

Start with one inequality

Solve LP

If solution is fractional, find a violated tree inequality (if one exists) and add it

Cutting Plane Algorithm

[Figure: the vertex $z$ is cut off by the inequality for tree $T_1$, giving LP solution $\mu_1$. Is there a tree inequality that $\mu_1$ violates? The inequality for tree $T_2$ cuts off $\mu_1$.]

How do we find a violated tree inequality?

Note: Even all spanning tree inequalities might not suffice

Finding a violated spanning tree

For a given $\mu$, find $\max_T I_T(\mu, z)$

If it’s positive, add the maximizing tree

How can we maximize over all trees? Note that:

$I_T(\mu, z) = \sum_{ij \in T} \left[ \mu_{ij}(z_i, z_j) - \mu_i(z_i) - \mu_j(z_j) \right] + \sum_i \mu_i(z_i)$

The bracketed term is an edge weight $w_{ij}$; the last sum is fixed (it does not depend on $T$).

Decomposes into edge scores. Maximizing tree can be found using a maximum-weight-spanning-tree algorithm (e.g., Wainwright 02)
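The separation step can be sketched with Kruskal's algorithm run on the weights $w_{ij}$. Kruskal is one possible maximum-weight-spanning-tree algorithm, chosen here as an assumption; the input format, the function names, and the fractional point in the example are also illustrative.

```python
def max_spanning_tree(n, weights):
    """Kruskal's algorithm on edge weights {(i, j): w}, taking the heaviest edges first."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:          # edge joins two components, so keep it
            parent[ri] = rj
            tree.append((i, j))
    return tree

def most_violated_tree(node_z, edge_z):
    """node_z[i] = mu_i(z_i); edge_z[(i, j)] = mu_ij(z_i, z_j). Returns (tree, I_T)."""
    w = {(i, j): m - node_z[i] - node_z[j] for (i, j), m in edge_z.items()}
    tree = max_spanning_tree(len(node_z), w)
    value = sum(w[e] for e in tree) + sum(node_z)  # edge scores plus the fixed term
    return tree, value

# Hypothetical fractional point on a triangle.
node_z = [0.5, 0.5, 0.5]
edge_z = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.0}
tree, value = most_violated_tree(node_z, edge_z)
```

A positive `value` means the inequality for `tree` is violated by the current $\mu$ and can be added to the LP; a non-positive `value` means no spanning tree inequality is violated.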

Experiments

Alternative algorithms for approximate 2nd best:

Using approximate marginals from max-product (BMMF; Yanover and Weiss 04)

Lawler/Nilsson (72,80) - Partition the assignments $x \ne x^{(1)}$:

$x_1 \ne x^{(1)}_1,\ x_2 = *,\ x_3 = *,\ \ldots,\ x_n = *$

$x_1 = x^{(1)}_1,\ x_2 \ne x^{(1)}_2,\ x_3 = *,\ \ldots,\ x_n = *$

$\vdots$

$x_1 = x^{(1)}_1,\ x_2 = x^{(1)}_2,\ x_3 = x^{(1)}_3,\ \ldots,\ x_n \ne x^{(1)}_n$

($*$ marks an unconstrained variable.)

Maximize over each part approximately. Cost O(n)

Our algorithm: STRIPES
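The partition scheme can be sketched as follows. Note the assumption: each part is maximized here by exhaustive enumeration, standing in for the approximate maximization mentioned above, so this only illustrates the partition itself. The model and names are hypothetical.

```python
from itertools import product

# Hypothetical binary chain 0 - 1 - 2 with made-up potentials.
theta = {
    (0, 1): [[1.0, 0.0], [0.0, 2.0]],
    (1, 2): [[0.5, 0.0], [0.0, 1.0]],
}

def f(x):
    return sum(table[x[i]][x[j]] for (i, j), table in theta.items())

def second_best_by_partition(x1, n_states, score):
    """Split {x != x1} into n parts and return the best of the per-part maximizers."""
    n = len(x1)
    candidates = []
    for k in range(n):
        # Part k: x_i = x1_i for i < k, x_k != x1_k, later variables unconstrained.
        part = [x1[:k] + (v,) + tail
                for v in range(n_states) if v != x1[k]
                for tail in product(range(n_states), repeat=n - k - 1)]
        candidates.append(max(part, key=score))
    return max(candidates, key=score)

x1 = (1, 1, 1)   # the MAP assignment of this toy model
x2 = second_best_by_partition(x1, 2, f)
```

The parts are disjoint and together cover every assignment other than `x1`, so the best of the $n$ per-part maximizers is the second best whenever each part is solved exactly.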

Attractive Grids

Ising models with ferromagnetic interaction

The local polytope is guaranteed to yield the exact first best (but is not equal to the marginal polytope)

Goal: Find 50 best. STRIPES and Nilsson find all of them exactly. Up to 19 spanning trees added

[Figure: rank and run time for STRIPES (S), Nilsson (N), and BMMF (B).]

Protein Side Chain Prediction

Given protein’s 3D shape (backbone), choose most probable side chain configuration

[Figure: protein backbone with side-chains $x_i, x_j, x_k, x_h$ as nodes of a graph $G = (V, E)$ (MRFs from Yanover, Meltzer, Weiss ‘06).]

Can be cast as a MAP problem

Important to obtain multiple possible solutions

$p(x) \propto e^{\sum_{ij \in E} \theta_{ij}(x_i, x_j)}$

Protein Side Chain Prediction

STRIPES found the exact solutions for all problems studied

In some cases, we used a tighter approximation of the marginal polytope (Sontag et al., UAI 08)

[Figure: rank and run time for STRIPES (S), Nilsson (N), and BMMF (B).]

Open Questions

When are spanning trees enough?

What is the polytope structure for k-best?

Finding k-best “different” solutions

Scalable algorithms

If a given problem is solved with a marginal polytope relaxation, what can we say about the second best?

Summary

The 2nd best can be posed as a linear program

For trees, it differs from the 1st best by one constraint only

For non-trees, an approximation can be devised by adding inequalities for all spanning trees

Empirically effective
