Transcript
Page 1: Algorithms for MAP estimation in Markov Random Fields

Algorithms for MAP estimation in Markov Random Fields

Vladimir Kolmogorov

University College London

Page 2: Algorithms for MAP estimation in Markov Random Fields

Energy function

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

(the θ_p terms are the unary terms: data; the θ_pq terms are the pairwise terms: coherence)

- x_p are discrete variables (for example, x_p ∈ {0,1})

- θ_p(•) are unary potentials

- θ_pq(•,•) are pairwise potentials
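For concreteness, a minimal Python sketch of this objective; the potentials below are illustrative, not taken from the slides:

```python
# A minimal sketch (illustrative numbers): evaluating E(x | theta) for a small
# binary MRF and brute-forcing its minimum.
from itertools import product

def energy(x, const, unary, pairwise):
    """E(x | theta) = theta_const + sum_p theta_p(x_p) + sum_pq theta_pq(x_p, x_q)."""
    return (const + sum(c[x[p]] for p, c in unary.items())
                  + sum(c[x[p]][x[q]] for (p, q), c in pairwise.items()))

unary = {0: [0.0, 1.0], 1: [4.0, 0.0]}          # theta_p(x_p) per node p
pairwise = {(0, 1): [[0.0, 3.0], [2.0, 5.0]]}   # theta_pq(x_p, x_q) per edge

best = min(product([0, 1], repeat=2),
           key=lambda x: energy(x, 0.0, unary, pairwise))
print(best, energy(best, 0.0, unary, pairwise))  # (0, 1) with energy 3.0
```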

Page 3: Algorithms for MAP estimation in Markov Random Fields

Minimisation algorithms

• Min Cut / Max Flow [Ford & Fulkerson '56]
  – [Greig, Porteous, Seheult '89]: non-iterative (binary variables)
  – [Boykov, Veksler, Zabih '99]: iterative - alpha-expansion, alpha-beta swap, … (multi-valued variables)
  + If applicable, gives very accurate results
  – Can be applied only to a restricted class of functions

• BP - max-product belief propagation [Pearl '86]
  + Can be applied to any energy function
  – In vision, results are usually worse than those of graph cuts
  – Does not always converge

• TRW - max-product tree-reweighted message passing [Wainwright, Jaakkola, Willsky '02], [Kolmogorov '05]
  + Can be applied to any energy function
  + For stereo, finds lower energy than graph cuts
  + Convergence guarantees for the algorithm in [Kolmogorov '05]

Page 4: Algorithms for MAP estimation in Markov Random Fields

Main idea: LP relaxation

• Goal: minimise the energy E(x) under the constraints x_p ∈ {0,1}

• In general, an NP-hard problem!

• Relax the discreteness constraints: allow x_p ∈ [0,1]

• The result is a linear program, which can be solved in polynomial time! (A sketch of this LP follows below.)

[Diagram: an energy axis comparing the LP-relaxation optimum with the minimum of the energy function with discrete variables; the relaxation may be tight or not tight.]
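As an illustration, the standard local-polytope LP relaxation of the two-node instance from the earlier sketch can be solved directly with scipy (the choice of solver is an assumption; the slides do not prescribe one):

```python
# A minimal sketch of the local-polytope LP relaxation for a two-node binary
# energy, solved with scipy. The costs reuse the earlier toy instance.
import numpy as np
from scipy.optimize import linprog

# Variables: [mu_p(0), mu_p(1), mu_q(0), mu_q(1),
#             mu_pq(00), mu_pq(01), mu_pq(10), mu_pq(11)].
c = np.array([0, 1, 4, 0, 0, 3, 2, 5], dtype=float)

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # mu_p sums to 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # mu_q sums to 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # sum_j mu_pq(0,j) = mu_p(0)
    [0, 1, 0, 0,  0,  0, -1, -1],   # sum_j mu_pq(1,j) = mu_p(1)
    [0, 0, 1, 0, -1,  0, -1,  0],   # sum_i mu_pq(i,0) = mu_q(0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # sum_i mu_pq(i,1) = mu_q(1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0], dtype=float)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(res.fun, res.x)  # for this instance the LP optimum is integral (tight)
```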

Page 5: Algorithms for MAP estimation in Markov Random Fields

Solving the LP relaxation

• Too large for general-purpose LP solvers (e.g. interior-point methods)
• Solve the dual problem instead of the primal:
  – Formulate a lower bound on the energy
  – Maximise this bound
  – Upon convergence, this also solves the primal problem (the LP relaxation)

• Two different ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
  – Via a convex combination of trees: leads to tree-reweighted message passing

[Diagram: an energy axis with three marks: the lower bound on the energy function, the LP relaxation above it, and the minimum of the energy function with discrete variables above both.]

Page 6: Algorithms for MAP estimation in Markov Random Fields

Notation and Preliminaries

Page 7: Algorithms for MAP estimation in Markov Random Fields

Energy function - visualisation

[Figure: node p and node q joined by edge (p,q); each node has a row for label 0 and a row for label 1. Example costs are attached to the nodes and edge, with θ_p(0) and θ_pq(0,1) marked and θ_const = 0.]

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

Page 8: Algorithms for MAP estimation in Markov Random Fields

Energy function - visualisation

[Figure: the same two-node diagram as on the previous slide, with labels 0 and 1 at nodes p and q and costs on the nodes and edge.]

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

θ is the vector of all parameters.

Page 9: Algorithms for MAP estimation in Markov Random Fields

Reparameterisation

[Figure: the two-node example; 1 is subtracted from two pairwise entries of edge (p,q) and added to a unary entry (0 + 1), changing individual parameters but not the energy.]

Page 10: Algorithms for MAP estimation in Markov Random Fields

Reparameterisation

[Figure: the two-node example after shifting costs: two edge entries reduced by 1, a node entry increased by 1.]

• Definition: θ' is a reparameterisation of θ if they define the same energy:

E(x | θ') = E(x | θ) for any x

• Maxflow, BP and TRW perform reparameterisations
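A small sketch (illustrative instance, not the slides' figure) verifying that such a shift is indeed a reparameterisation:

```python
# Verifying that shifting cost from an edge to a node leaves
# E(x | theta) unchanged for every labelling x.
from itertools import product
import copy

def energy(x, const, unary, pairwise):
    return (const + sum(c[x[p]] for p, c in unary.items())
                  + sum(c[x[p]][x[q]] for (p, q), c in pairwise.items()))

unary = {0: [0.0, 1.0], 1: [4.0, 0.0]}
pairwise = {(0, 1): [[0.0, 3.0], [2.0, 5.0]]}

# Subtract delta from theta_pq(0, j) for all j, add delta to theta_p(0):
# whenever x_p = 0 the two changes cancel, otherwise nothing changes.
u2, w2 = copy.deepcopy(unary), copy.deepcopy(pairwise)
delta = 1.0
for j in (0, 1):
    w2[(0, 1)][0][j] -= delta
u2[0][0] += delta

for x in product([0, 1], repeat=2):
    assert energy(x, 0.0, unary, pairwise) == energy(x, 0.0, u2, w2)
print("theta' is a reparameterisation of theta")
```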

Page 11: Algorithms for MAP estimation in Markov Random Fields

Part I: Lower bound via posiforms

(→ maxflow algorithm)

Page 12: Algorithms for MAP estimation in Markov Random Fields

Lower bound via posiforms [Hammer, Hansen, Simeone '84]

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

If all unary and pairwise terms are non-negative, then θ_const is a lower bound on the energy:

E(x | θ) ≥ θ_const for all x

Goal: maximise θ_const over reparameterisations.

Page 13: Algorithms for MAP estimation in Markov Random Fields

Outline of part I

• Maximisation algorithm?
  – Consider functions of binary variables only

• Maximising the lower bound for submodular functions
  – Definition of submodular functions
  – Overview of min cut / max flow
  – Reduction to maxflow
  – Global minimum of the energy

• Maximising the lower bound for non-submodular functions
  – Reduction to maxflow (more complicated graph)
  – Part of an optimal solution

Page 14: Algorithms for MAP estimation in Markov Random Fields

Submodular functions of binary variables

• Definition: E is submodular if every pairwise term satisfies

θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0)

• Can be converted to "canonical form":

[Figure: the example energy rewritten in canonical form; at each node and each edge the smallest cost entry is zero ("zero cost"), with the remaining cost shifted into the other entries and the constant.]
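The condition is easy to check programmatically; a one-line sketch using the dictionary layout from the earlier examples:

```python
# Checking theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0)
# for every pairwise term of a binary energy.

def is_submodular(pairwise):
    return all(c[0][0] + c[1][1] <= c[0][1] + c[1][0]
               for c in pairwise.values())

print(is_submodular({(0, 1): [[0.0, 3.0], [2.0, 5.0]]}))  # True: 0+5 <= 3+2
```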

Page 15: Algorithms for MAP estimation in Markov Random Fields

Overview of min cut/max flow

Page 16: Algorithms for MAP estimation in Markov Random Fields

Min Cut problem

[Figure: a directed weighted graph with a source, a sink and three intermediate nodes; edge weights between 1 and 5.]

Page 17: Algorithms for MAP estimation in Markov Random Fields

Min Cut problem

[Figure: the same graph with a cut separating source from sink.]

Cut: S = {source, node 1}, T = {sink, node 2, node 3}

Page 18: Algorithms for MAP estimation in Markov Random Fields

Min Cut problem

[Figure: the same graph and cut; the two crossing edges each have weight 1.]

Cut: S = {source, node 1}, T = {sink, node 2, node 3}

Cost(S, T) = 1 + 1 = 2

• Task: compute the cut with minimum cost
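As a concrete reference, a minimal Edmonds-Karp variant of the Ford-Fulkerson method (BFS augmenting paths); the capacities below are illustrative, not the slides' exact instance:

```python
# Max flow / min cut by the Ford-Fulkerson method with BFS augmenting paths.
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """cap[u][v] = capacity of edge (u, v); returns (flow value, S side of a min cut)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in cap:
        for v, c in cap[u].items():
            residual[u][v] += c
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:            # no augmenting path left: done
            return flow, set(parent)   # reachable set = source side of a min cut
        # Bottleneck capacity along the path, then augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

cap = {'s': {'a': 2, 'b': 5}, 'a': {'t': 1, 'b': 3}, 'b': {'t': 4}}
value, S = max_flow(cap, 's', 't')
print(value, S)   # by max-flow min-cut duality, value = cost of a minimum cut
```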

Page 19: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: the example graph with its initial capacities.]

value(flow) = 0

Page 20: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: an augmenting path from source to sink is selected.]

value(flow) = 0

Page 21: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: residual capacities after pushing one unit of flow along the path.]

value(flow) = 1

Page 22: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: a second augmenting path is selected.]

value(flow) = 1

Page 23: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: residual capacities after pushing one more unit of flow.]

value(flow) = 2

Page 24: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: the residual graph; no augmenting path from source to sink remains.]

value(flow) = 2

Page 25: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: the final residual graph; the maximum flow value 2 equals the cost of the minimum cut.]

value(flow) = 2

Page 26: Algorithms for MAP estimation in Markov Random Fields

Maximising lower bound for submodular functions:

Reduction to maxflow

Page 27: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: the canonical-form energy (left) alongside the corresponding maxflow graph (right).]

value(flow) = 0

Page 28: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: an augmenting path is selected in the maxflow graph.]

value(flow) = 0

Page 29: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: after pushing one unit of flow, the residual capacities correspond to a reparameterised energy.]

value(flow) = 1

Page 30: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: a second augmenting path is selected.]

value(flow) = 1

Page 31: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: after the second augmentation, the energy is reparameterised again.]

value(flow) = 2

Page 32: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: the final residual graph; no augmenting path remains.]

value(flow) = 2

Page 33: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: in the final reparameterisation, the costs along the minimising labelling are all zero.]

value(flow) = 2

Minimum of the energy: 2, attained at x = (0, 1, 1).
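The reduction itself can be sketched in a few lines. Below is a hedged illustration of the classical construction (in the spirit of [Greig et al. '89]; the instance is assumed, and networkx is used for the maxflow step purely for brevity):

```python
# Reducing a submodular binary energy to min cut and reading off a minimiser.
import networkx as nx

unary = {0: [0.0, 1.0], 1: [4.0, 0.0], 2: [2.0, 0.0]}
pairwise = {(0, 1): [[0.0, 3.0], [2.0, 5.0]], (1, 2): [[0.0, 2.0], [1.0, 0.0]]}

const = 0.0
u = {p: list(c) for p, c in unary.items()}   # accumulated unary coefficients
G = nx.DiGraph()

for (p, q), t in pairwise.items():
    A, B, C, D = t[0][0], t[0][1], t[1][0], t[1][1]
    assert A + D <= B + C, "pairwise term must be submodular"
    # theta_pq = A + (C-A) x_p + (D-C) x_q + (B+C-A-D)(1-x_p) x_q
    const += A
    u[p][1] += C - A
    u[q][1] += D - C
    G.add_edge(p, q, capacity=B + C - A - D)

for p, (c0, c1) in u.items():
    m = min(c0, c1)            # shift so both capacities are non-negative
    const += m
    G.add_edge('s', p, capacity=c1 - m)   # cut if x_p = 1 (p on sink side)
    G.add_edge(p, 't', capacity=c0 - m)   # cut if x_p = 0 (p on source side)

cut_value, (S, T) = nx.minimum_cut(G, 's', 't')
x = {p: (1 if p in T else 0) for p in unary}
print(const + cut_value, x)   # minimum energy and a minimising labelling
```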

Page 34: Algorithms for MAP estimation in Markov Random Fields

Maximising lower bound for non-submodular functions

Page 35: Algorithms for MAP estimation in Markov Random Fields

Arbitrary functions of binary variables

• Can be solved via maxflow [Boros, Hammer, Sun '91]
  – Specially constructed graph

• Gives the solution of the LP relaxation: for each node, x_p ∈ {0, 1/2, 1}

[Diagram: as on the posiform slide, maximise θ_const in a decomposition E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q) with non-negative terms; the resulting bound matches the LP relaxation.]

Page 36: Algorithms for MAP estimation in Markov Random Fields

Arbitrary functions of binary variables

[Figure: a small graph with the LP-relaxation solution: some nodes take the fractional value 1/2, others the integral values 0 or 1.]

• Part of an optimal solution [Hammer, Hansen, Simeone '84]: nodes that receive integral values 0 or 1 keep those values in some globally optimal solution (persistency).

Page 37: Algorithms for MAP estimation in Markov Random Fields

Part II: Lower bound via convex combination of trees

(→ tree-reweighted message passing)

Page 38: Algorithms for MAP estimation in Markov Random Fields

Convex combination of trees [Wainwright, Jaakkola, Willsky '02]

• Goal: compute the minimum of the energy for θ:  Φ(θ) ≡ min_x E(x | θ)

• In general, intractable!

• Obtaining a lower bound:
  – Split θ into several components: θ = ∑_i ρ^i θ^i
  – Compute the minimum for each component: Φ(θ^i) = min_x E(x | θ^i)
  – Combine to get a bound on Φ(θ): since E(x | θ) is linear in θ, Φ is concave, so Φ(θ) ≥ ∑_i ρ^i Φ(θ^i)

• Use trees!

Page 39: Algorithms for MAP estimation in Markov Random Fields

Convex combination of trees (cont'd)

[Figure: a graph split into two trees T and T' with θ = (1/2) θ^T + (1/2) θ^T'.]

Φ(θ) ≥ (1/2) Φ(θ^T) + (1/2) Φ(θ^T')  – a lower bound on the energy: maximise it
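A small numeric illustration (assumed instance, not from the slides): decomposing a 3-node cycle into two trees that each keep the unary terms and carry their own edges with doubled weight, so that θ = 0.5 θ^T + 0.5 θ^T':

```python
# Checking the tree bound Phi(theta) >= 0.5 Phi(theta_T) + 0.5 Phi(theta_T').
from itertools import product

def minimise(unary, pairwise):
    """Phi(theta) = min_x E(x | theta), by brute force over binary labellings."""
    n = len(unary)
    return min(sum(unary[p][x[p]] for p in range(n)) +
               sum(c[x[p]][x[q]] for (p, q), c in pairwise.items())
               for x in product([0, 1], repeat=n))

def doubled(edges):
    return {e: [[2 * v for v in row] for row in c] for e, c in edges.items()}

unary = [[0, 1], [4, 0], [2, 0]]
cycle = {(0, 1): [[0, 3], [2, 5]], (1, 2): [[0, 2], [1, 0]], (0, 2): [[1, 0], [0, 2]]}

tree_T  = doubled({e: cycle[e] for e in [(0, 1), (1, 2)]})   # chain 0-1-2
tree_T2 = doubled({e: cycle[e] for e in [(0, 2)]})           # remaining edge

bound = 0.5 * minimise(unary, tree_T) + 0.5 * minimise(unary, tree_T2)
print(bound, "<=", minimise(unary, cycle))  # here the bound happens to be tight
```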

Page 40: Algorithms for MAP estimation in Markov Random Fields

TRW algorithms

• Goal: find a reparameterisation maximising the lower bound

• Apply a sequence of different reparameterisation operations:
  – Node averaging
  – Ordinary BP on trees

• Order of operations?
  – Affects performance dramatically

• Algorithms:
  – [Wainwright et al. '02]: parallel schedule
    • May not converge
  – [Kolmogorov '05]: specific sequential schedule
    • Lower bound does not decrease; convergence guarantees

Page 41: Algorithms for MAP estimation in Markov Random Fields

Node averaging

[Figure: a node's unary parameters in two trees: (0, 1) in one tree and (4, 0) in the other.]

Page 42: Algorithms for MAP estimation in Markov Random Fields

Node averaging

[Figure: after averaging, both trees hold the average (2, 0.5).]

Page 43: Algorithms for MAP estimation in Markov Random Fields

Belief propagation (BP) on trees

• Send messages
  – Equivalent to reparameterising node and edge parameters

• Two passes (forward and backward)

Page 44: Algorithms for MAP estimation in Markov Random Fields

Belief propagation (BP) on trees

• Key property (Wainwright et al.): upon termination, θ_p gives the min-marginals for node p:

θ_p(j) = min_{x : x_p = j} E(x | θ) − const,  for j = 0, 1
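A minimal min-sum BP sketch on a chain (illustrative instance) that verifies this min-marginal property by brute force:

```python
# Min-sum (max-product in the log domain) BP on a 3-node binary chain.
from itertools import product
import numpy as np

unary = [np.array([0., 1.]), np.array([4., 0.]), np.array([2., 0.])]
pair = [np.array([[0., 3.], [2., 5.]]), np.array([[0., 2.], [1., 0.]])]
n = len(unary)

# m_f[p] = message into node p from the left, m_b[p] = from the right.
m_f = [np.zeros(2) for _ in range(n)]
m_b = [np.zeros(2) for _ in range(n)]
for p in range(1, n):            # forward pass
    m_f[p] = np.min((unary[p-1] + m_f[p-1])[:, None] + pair[p-1], axis=0)
for p in range(n-2, -1, -1):     # backward pass
    m_b[p] = np.min(pair[p] + (unary[p+1] + m_b[p+1])[None, :], axis=1)

for p in range(n):
    belief = unary[p] + m_f[p] + m_b[p]   # min-marginals of node p
    for j in (0, 1):
        exact = min(sum(unary[q][x[q]] for q in range(n)) +
                    sum(pair[q][x[q]][x[q+1]] for q in range(n-1))
                    for x in product([0, 1], repeat=n) if x[p] == j)
        assert np.isclose(belief[j], exact)
print("beliefs match min-marginals")
```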

Page 45: Algorithms for MAP estimation in Markov Random Fields

TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)

Repeat: run BP on all trees, then "average" all nodes.

• If it converges, gives a (local) maximum of the lower bound
• Not guaranteed to converge
• The lower bound may go down

Page 46: Algorithms for MAP estimation in Markov Random Fields

Sequential TRW algorithm (TRW-S) [Kolmogorov '05]

Repeat:
1. Pick node p
2. Run BP on all trees containing p
3. "Average" node p

Page 47: Algorithms for MAP estimation in Markov Random Fields

Main property of TRW-S

• Theorem: the lower bound never decreases.

• Proof sketch:

[Figure: before averaging, a node holds unary parameters (0, 1) in tree T and (4, 0) in tree T', so Φ(θ^T) = 0 + const and Φ(θ^T') = 0 + const'.]

Page 48: Algorithms for MAP estimation in Markov Random Fields

Main property of TRW-S

• Theorem: the lower bound never decreases.

• Proof sketch:

[Figure: after averaging, the node holds (2, 0.5) in both trees, so Φ(θ^T) = 0.5 + const and Φ(θ^T') = 0.5 + const'; the combined bound has not decreased.]
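Numerically, reading the figure's unary parameters as (0, 1) and (4, 0) (an assumption about the figure), node averaging raises the combined bound from 0 to 0.5:

```python
# Node averaging on a single node's unary parameters in two trees.
theta_T, theta_T2 = [0.0, 1.0], [4.0, 0.0]

before = 0.5 * min(theta_T) + 0.5 * min(theta_T2)
avg = [(a + b) / 2 for a, b in zip(theta_T, theta_T2)]
after = 0.5 * min(avg) + 0.5 * min(avg)

print(before, "->", after)   # 0.0 -> 0.5: the lower bound did not decrease
assert after >= before
```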

Page 49: Algorithms for MAP estimation in Markov Random Fields

TRW-S algorithm

• Particular order of averaging and BP operations

• Lower bound guaranteed not to decrease

• There exists a limit point that satisfies the weak tree agreement condition

• Efficiency?

Page 50: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

Repeat:
1. Pick node p
2. Run BP on all trees containing p   (inefficient?)
3. "Average" node p

Page 51: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

• Key observation: the node averaging operation preserves messages oriented towards this node

• Reuse previously passed messages!

• Need a special choice of trees:
  – Pick an ordering of the nodes
  – Trees: monotonic chains

[Figure: a 3×3 grid with nodes numbered 1-9 in the chosen order.]

Page 52: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid with the node ordering.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 53: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid as the forward pass advances.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 54: Algorithms for MAP estimation in Markov Random Fields

Memory requirements

• Additional advantage of TRW-S:
  – Needs only half as much memory as standard message passing!
  – A similar observation for bipartite graphs and a parallel schedule was made in [Felzenszwalb & Huttenlocher '04]

[Figure: memory layout of standard message passing vs. TRW-S.]

Page 55: Algorithms for MAP estimation in Markov Random Fields

Experimental results: binary segmentation ("GrabCut")

[Plot: energy (3-6 ×10^5) vs. time (0-400), averaged over 50 instances.]

Page 56: Algorithms for MAP estimation in Markov Random Fields

Experimental results: stereo

[Figure: left image and ground truth; plot of energy (3.6-4.0 ×10^5) vs. time (20-100) for BP and TRW-S.]

Page 57: Algorithms for MAP estimation in Markov Random Fields

Experimental results: stereo

[Plots: energy vs. time (20-140) on two further instances; energy ranges 1.36-1.44 ×10^6 and 1.93-1.94 ×10^7.]

Page 58: Algorithms for MAP estimation in Markov Random Fields

Summary

• MAP estimation algorithms are based on LP relaxation
  – Maximise a lower bound

• Two ways to formulate the lower bound:

  – Via posiforms: leads to the maxflow algorithm
    • Polynomial-time solution
    • But: applicable only to restricted energies (e.g. binary variables)
    • Submodular functions: global minimum
    • Non-submodular functions: part of an optimal solution

  – Via a convex combination of trees: leads to the TRW algorithm
    • Convergence in the limit (for TRW-S)
    • Applicable to arbitrary energy functions

• Graph cuts vs. TRW:
  – Accuracy: similar
  – Generality: TRW is more general
  – Speed: for stereo, TRW is currently 2-5 times slower. But:
    • 3 vs. 50 years of research!
    • More suitable for parallel implementation (GPU? Hardware?)

Page 59: Algorithms for MAP estimation in Markov Random Fields

Discrete vs. continuous functionals

Discrete formulation (graph cuts):

E(x) = ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

• Maxflow algorithm
  – Global minimum, polynomial time
• Metrication artefacts?

Continuous formulation (geodesic active contours):

E(C) = ∫_0^{|C|} g(C(s)) ds

• Level sets
  – Numerical stability?
• Geometrically motivated
  – Invariant under rotation

Page 60: Algorithms for MAP estimation in Markov Random Fields

Geo-cuts

• Continuous functional:

E(C) = ∫_0^{|C|} g(C(s)) ds + ∫_{interior(C)} f dV

• Construct a graph such that, for smooth contours C,

E(C) = cost of the corresponding cut

• Class of continuous functionals? [Boykov & Kolmogorov '03], [Kolmogorov & Boykov '05]:
  – Geometric length/area (e.g. Riemannian)
  – Flux of a given vector field
  – Regional term

Page 61: Algorithms for MAP estimation in Markov Random Fields
Page 62: Algorithms for MAP estimation in Markov Random Fields

TRW formulation

maximise  ∑_T ρ^T Φ(θ^T)  over θ = {θ^T}

subject to  ∑_T ρ^T θ^T = θ̄  (the parameter vector of the original energy)

where θ is the collection of all tree parameter vectors θ^T, and ρ^T is a fixed probability distribution on trees T.

Page 63: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

Page 64: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid at the next step of the pass.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 65: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

Page 66: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 67: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration