Transcript
Page 1: Algorithms for MAP estimation in Markov Random Fields

Algorithms for MAP estimation in Markov Random Fields

Vladimir Kolmogorov

University College London

Page 2: Algorithms for MAP estimation in Markov Random Fields

Energy function

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

(the θ_p terms are the unary terms: data; the θ_pq terms are the pairwise terms: coherence)

- x_p are discrete variables (for example, x_p ∈ {0,1})

- θ_p(•) are unary potentials

- θ_pq(•,•) are pairwise potentials
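For concreteness, a minimal Python sketch of this objective; the potentials below are illustrative, not taken from the slides:

```python
# A minimal sketch (illustrative numbers): evaluating E(x | theta) for a small
# binary MRF and brute-forcing its minimum.
from itertools import product

def energy(x, const, unary, pairwise):
    """E(x | theta) = theta_const + sum_p theta_p(x_p) + sum_pq theta_pq(x_p, x_q)."""
    return (const + sum(c[x[p]] for p, c in unary.items())
                  + sum(c[x[p]][x[q]] for (p, q), c in pairwise.items()))

unary = {0: [0.0, 1.0], 1: [4.0, 0.0]}          # theta_p(x_p) per node p
pairwise = {(0, 1): [[0.0, 3.0], [2.0, 5.0]]}   # theta_pq(x_p, x_q) per edge

best = min(product([0, 1], repeat=2),
           key=lambda x: energy(x, 0.0, unary, pairwise))
print(best, energy(best, 0.0, unary, pairwise))  # (0, 1) with energy 3.0
```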

Page 3: Algorithms for MAP estimation in Markov Random Fields

Minimisation algorithms

• Min Cut / Max Flow [Ford & Fulkerson '56]
  – [Greig, Porteous, Seheult '89]: non-iterative (binary variables)
  – [Boykov, Veksler, Zabih '99]: iterative - alpha-expansion, alpha-beta swap, … (multi-valued variables)
  + If applicable, gives very accurate results
  – Can be applied only to a restricted class of functions

• BP - max-product belief propagation [Pearl '86]
  + Can be applied to any energy function
  – In vision, results are usually worse than those of graph cuts
  – Does not always converge

• TRW - max-product tree-reweighted message passing [Wainwright, Jaakkola, Willsky '02], [Kolmogorov '05]
  + Can be applied to any energy function
  + For stereo, finds lower energy than graph cuts
  + Convergence guarantees for the algorithm in [Kolmogorov '05]

Page 4: Algorithms for MAP estimation in Markov Random Fields

Main idea: LP relaxation

• Goal: minimise the energy E(x) under the constraints x_p ∈ {0,1}

• In general, an NP-hard problem!

• Relax the discreteness constraints: allow x_p ∈ [0,1]

• The result is a linear program, which can be solved in polynomial time! (A sketch of this LP follows below.)

[Diagram: an energy axis comparing the LP-relaxation optimum with the minimum of the energy function with discrete variables; the relaxation may be tight or not tight.]
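As an illustration, the standard local-polytope LP relaxation of the two-node instance from the earlier sketch can be solved directly with scipy (the choice of solver is an assumption; the slides do not prescribe one):

```python
# A minimal sketch of the local-polytope LP relaxation for a two-node binary
# energy, solved with scipy. The costs reuse the earlier toy instance.
import numpy as np
from scipy.optimize import linprog

# Variables: [mu_p(0), mu_p(1), mu_q(0), mu_q(1),
#             mu_pq(00), mu_pq(01), mu_pq(10), mu_pq(11)].
c = np.array([0, 1, 4, 0, 0, 3, 2, 5], dtype=float)

A_eq = np.array([
    [1, 1, 0, 0,  0,  0,  0,  0],   # mu_p sums to 1
    [0, 0, 1, 1,  0,  0,  0,  0],   # mu_q sums to 1
    [1, 0, 0, 0, -1, -1,  0,  0],   # sum_j mu_pq(0,j) = mu_p(0)
    [0, 1, 0, 0,  0,  0, -1, -1],   # sum_j mu_pq(1,j) = mu_p(1)
    [0, 0, 1, 0, -1,  0, -1,  0],   # sum_i mu_pq(i,0) = mu_q(0)
    [0, 0, 0, 1,  0, -1,  0, -1],   # sum_i mu_pq(i,1) = mu_q(1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0], dtype=float)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(res.fun, res.x)  # for this instance the LP optimum is integral (tight)
```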

Page 5: Algorithms for MAP estimation in Markov Random Fields

Solving the LP relaxation

• Too large for general-purpose LP solvers (e.g. interior-point methods)
• Solve the dual problem instead of the primal:
  – Formulate a lower bound on the energy
  – Maximise this bound
  – Upon convergence, this also solves the primal problem (the LP relaxation)

• Two different ways to formulate the lower bound:
  – Via posiforms: leads to the maxflow algorithm
  – Via a convex combination of trees: leads to tree-reweighted message passing

[Diagram: an energy axis with three marks: the lower bound on the energy function, the LP relaxation above it, and the minimum of the energy function with discrete variables above both.]

Page 6: Algorithms for MAP estimation in Markov Random Fields

Notation and Preliminaries

Page 7: Algorithms for MAP estimation in Markov Random Fields

Energy function - visualisation

[Figure: node p and node q joined by edge (p,q); each node has a row for label 0 and a row for label 1. Example costs are attached to the nodes and edge, with θ_p(0) and θ_pq(0,1) marked and θ_const = 0.]

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

Page 8: Algorithms for MAP estimation in Markov Random Fields

Energy function - visualisation

[Figure: the same two-node diagram as on the previous slide, with labels 0 and 1 at nodes p and q and costs on the nodes and edge.]

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

θ is the vector of all parameters.

Page 9: Algorithms for MAP estimation in Markov Random Fields

Reparameterisation

[Figure: the two-node example; 1 is subtracted from two pairwise entries of edge (p,q) and added to a unary entry (0 + 1), changing individual parameters but not the energy.]

Page 10: Algorithms for MAP estimation in Markov Random Fields

Reparameterisation

[Figure: the two-node example after shifting costs: two edge entries reduced by 1, a node entry increased by 1.]

• Definition: θ' is a reparameterisation of θ if they define the same energy:

E(x | θ') = E(x | θ) for any x

• Maxflow, BP and TRW perform reparameterisations
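A small sketch (illustrative instance, not the slides' figure) verifying that such a shift is indeed a reparameterisation:

```python
# Verifying that shifting cost from an edge to a node leaves
# E(x | theta) unchanged for every labelling x.
from itertools import product
import copy

def energy(x, const, unary, pairwise):
    return (const + sum(c[x[p]] for p, c in unary.items())
                  + sum(c[x[p]][x[q]] for (p, q), c in pairwise.items()))

unary = {0: [0.0, 1.0], 1: [4.0, 0.0]}
pairwise = {(0, 1): [[0.0, 3.0], [2.0, 5.0]]}

# Subtract delta from theta_pq(0, j) for all j, add delta to theta_p(0):
# whenever x_p = 0 the two changes cancel, otherwise nothing changes.
u2, w2 = copy.deepcopy(unary), copy.deepcopy(pairwise)
delta = 1.0
for j in (0, 1):
    w2[(0, 1)][0][j] -= delta
u2[0][0] += delta

for x in product([0, 1], repeat=2):
    assert energy(x, 0.0, unary, pairwise) == energy(x, 0.0, u2, w2)
print("theta' is a reparameterisation of theta")
```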

Page 11: Algorithms for MAP estimation in Markov Random Fields

Part I: Lower bound via posiforms

(→ maxflow algorithm)

Page 12: Algorithms for MAP estimation in Markov Random Fields

Lower bound via posiforms [Hammer, Hansen, Simeone '84]

E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

If all unary and pairwise terms are non-negative, then θ_const is a lower bound on the energy:

E(x | θ) ≥ θ_const for all x

Goal: maximise θ_const over reparameterisations.

Page 13: Algorithms for MAP estimation in Markov Random Fields

Outline of part I

• Maximisation algorithm?
  – Consider functions of binary variables only

• Maximising the lower bound for submodular functions
  – Definition of submodular functions
  – Overview of min cut / max flow
  – Reduction to maxflow
  – Global minimum of the energy

• Maximising the lower bound for non-submodular functions
  – Reduction to maxflow (more complicated graph)
  – Part of an optimal solution

Page 14: Algorithms for MAP estimation in Markov Random Fields

Submodular functions of binary variables

• Definition: E is submodular if every pairwise term satisfies

θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0)

• Can be converted to "canonical form":

[Figure: the example energy rewritten in canonical form; at each node and each edge the smallest cost entry is zero ("zero cost"), with the remaining cost shifted into the other entries and the constant.]
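The condition is easy to check programmatically; a one-line sketch using the dictionary layout from the earlier examples:

```python
# Checking theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0)
# for every pairwise term of a binary energy.

def is_submodular(pairwise):
    return all(c[0][0] + c[1][1] <= c[0][1] + c[1][0]
               for c in pairwise.values())

print(is_submodular({(0, 1): [[0.0, 3.0], [2.0, 5.0]]}))  # True: 0+5 <= 3+2
```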

Page 15: Algorithms for MAP estimation in Markov Random Fields

Overview of min cut/max flow

Page 16: Algorithms for MAP estimation in Markov Random Fields

Min Cut problem

[Figure: a directed weighted graph with a source, a sink and three intermediate nodes; edge weights between 1 and 5.]

Page 17: Algorithms for MAP estimation in Markov Random Fields

Min Cut problem

[Figure: the same graph with a cut separating source from sink.]

Cut: S = {source, node 1}, T = {sink, node 2, node 3}

Page 18: Algorithms for MAP estimation in Markov Random Fields

Min Cut problem

[Figure: the same graph and cut; the two crossing edges each have weight 1.]

Cut: S = {source, node 1}, T = {sink, node 2, node 3}

Cost(S, T) = 1 + 1 = 2

• Task: compute the cut with minimum cost
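As a concrete reference, a minimal Edmonds-Karp variant of the Ford-Fulkerson method (BFS augmenting paths); the capacities below are illustrative, not the slides' exact instance:

```python
# Max flow / min cut by the Ford-Fulkerson method with BFS augmenting paths.
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """cap[u][v] = capacity of edge (u, v); returns (flow value, S side of a min cut)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u in cap:
        for v, c in cap[u].items():
            residual[u][v] += c
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:            # no augmenting path left: done
            return flow, set(parent)   # reachable set = source side of a min cut
        # Bottleneck capacity along the path, then augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

cap = {'s': {'a': 2, 'b': 5}, 'a': {'t': 1, 'b': 3}, 'b': {'t': 4}}
value, S = max_flow(cap, 's', 't')
print(value, S)   # by max-flow min-cut duality, value = cost of a minimum cut
```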

Page 19: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: the example graph with its initial capacities.]

value(flow) = 0

Page 20: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: an augmenting path from source to sink is selected.]

value(flow) = 0

Page 21: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: residual capacities after pushing one unit of flow along the path.]

value(flow) = 1

Page 22: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: a second augmenting path is selected.]

value(flow) = 1

Page 23: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: residual capacities after pushing one more unit of flow.]

value(flow) = 2

Page 24: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: the residual graph; no augmenting path from source to sink remains.]

value(flow) = 2

Page 25: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm

[Figure: the final residual graph; the maximum flow value 2 equals the cost of the minimum cut.]

value(flow) = 2

Page 26: Algorithms for MAP estimation in Markov Random Fields

Maximising lower bound for submodular functions:

Reduction to maxflow

Page 27: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: the canonical-form energy (left) alongside the corresponding maxflow graph (right).]

value(flow) = 0

Page 28: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: an augmenting path is selected in the maxflow graph.]

value(flow) = 0

Page 29: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: after pushing one unit of flow, the residual capacities correspond to a reparameterised energy.]

value(flow) = 1

Page 30: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: a second augmenting path is selected.]

value(flow) = 1

Page 31: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: after the second augmentation, the energy is reparameterised again.]

value(flow) = 2

Page 32: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: the final residual graph; no augmenting path remains.]

value(flow) = 2

Page 33: Algorithms for MAP estimation in Markov Random Fields

Maxflow algorithm and reparameterisation

[Figure: in the final reparameterisation, the costs along the minimising labelling are all zero.]

value(flow) = 2

Minimum of the energy: 2, attained at x = (0, 1, 1).
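The reduction itself can be sketched in a few lines. Below is a hedged illustration of the classical construction (in the spirit of [Greig et al. '89]; the instance is assumed, and networkx is used for the maxflow step purely for brevity):

```python
# Reducing a submodular binary energy to min cut and reading off a minimiser.
import networkx as nx

unary = {0: [0.0, 1.0], 1: [4.0, 0.0], 2: [2.0, 0.0]}
pairwise = {(0, 1): [[0.0, 3.0], [2.0, 5.0]], (1, 2): [[0.0, 2.0], [1.0, 0.0]]}

const = 0.0
u = {p: list(c) for p, c in unary.items()}   # accumulated unary coefficients
G = nx.DiGraph()

for (p, q), t in pairwise.items():
    A, B, C, D = t[0][0], t[0][1], t[1][0], t[1][1]
    assert A + D <= B + C, "pairwise term must be submodular"
    # theta_pq = A + (C-A) x_p + (D-C) x_q + (B+C-A-D)(1-x_p) x_q
    const += A
    u[p][1] += C - A
    u[q][1] += D - C
    G.add_edge(p, q, capacity=B + C - A - D)

for p, (c0, c1) in u.items():
    m = min(c0, c1)            # shift so both capacities are non-negative
    const += m
    G.add_edge('s', p, capacity=c1 - m)   # cut if x_p = 1 (p on sink side)
    G.add_edge(p, 't', capacity=c0 - m)   # cut if x_p = 0 (p on source side)

cut_value, (S, T) = nx.minimum_cut(G, 's', 't')
x = {p: (1 if p in T else 0) for p in unary}
print(const + cut_value, x)   # minimum energy and a minimising labelling
```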

Page 34: Algorithms for MAP estimation in Markov Random Fields

Maximising lower bound for non-submodular functions

Page 35: Algorithms for MAP estimation in Markov Random Fields

Arbitrary functions of binary variables

• Can be solved via maxflow [Boros, Hammer, Sun '91]
  – Specially constructed graph

• Gives the solution of the LP relaxation: for each node, x_p ∈ {0, 1/2, 1}

[Diagram: as on the posiform slide, maximise θ_const in a decomposition E(x | θ) = θ_const + ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q) with non-negative terms; the resulting bound matches the LP relaxation.]

Page 36: Algorithms for MAP estimation in Markov Random Fields

Arbitrary functions of binary variables

[Figure: a small graph with the LP-relaxation solution: some nodes take the fractional value 1/2, others the integral values 0 or 1.]

• Part of an optimal solution [Hammer, Hansen, Simeone '84]: nodes that receive integral values 0 or 1 keep those values in some globally optimal solution (persistency).

Page 37: Algorithms for MAP estimation in Markov Random Fields

Part II: Lower bound via convex combination of trees

(→ tree-reweighted message passing)

Page 38: Algorithms for MAP estimation in Markov Random Fields

Convex combination of trees [Wainwright, Jaakkola, Willsky '02]

• Goal: compute the minimum of the energy for θ:  Φ(θ) ≡ min_x E(x | θ)

• In general, intractable!

• Obtaining a lower bound:
  – Split θ into several components: θ = ∑_i ρ^i θ^i
  – Compute the minimum for each component: Φ(θ^i) = min_x E(x | θ^i)
  – Combine to get a bound on Φ(θ): since E(x | θ) is linear in θ, Φ is concave, so Φ(θ) ≥ ∑_i ρ^i Φ(θ^i)

• Use trees!

Page 39: Algorithms for MAP estimation in Markov Random Fields

Convex combination of trees (cont'd)

[Figure: a graph split into two trees T and T' with θ = (1/2) θ^T + (1/2) θ^T'.]

Φ(θ) ≥ (1/2) Φ(θ^T) + (1/2) Φ(θ^T')  – a lower bound on the energy: maximise it
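A small numeric illustration (assumed instance, not from the slides): decomposing a 3-node cycle into two trees that each keep the unary terms and carry their own edges with doubled weight, so that θ = 0.5 θ^T + 0.5 θ^T':

```python
# Checking the tree bound Phi(theta) >= 0.5 Phi(theta_T) + 0.5 Phi(theta_T').
from itertools import product

def minimise(unary, pairwise):
    """Phi(theta) = min_x E(x | theta), by brute force over binary labellings."""
    n = len(unary)
    return min(sum(unary[p][x[p]] for p in range(n)) +
               sum(c[x[p]][x[q]] for (p, q), c in pairwise.items())
               for x in product([0, 1], repeat=n))

def doubled(edges):
    return {e: [[2 * v for v in row] for row in c] for e, c in edges.items()}

unary = [[0, 1], [4, 0], [2, 0]]
cycle = {(0, 1): [[0, 3], [2, 5]], (1, 2): [[0, 2], [1, 0]], (0, 2): [[1, 0], [0, 2]]}

tree_T  = doubled({e: cycle[e] for e in [(0, 1), (1, 2)]})   # chain 0-1-2
tree_T2 = doubled({e: cycle[e] for e in [(0, 2)]})           # remaining edge

bound = 0.5 * minimise(unary, tree_T) + 0.5 * minimise(unary, tree_T2)
print(bound, "<=", minimise(unary, cycle))  # here the bound happens to be tight
```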

Page 40: Algorithms for MAP estimation in Markov Random Fields

TRW algorithms

• Goal: find a reparameterisation maximising the lower bound

• Apply a sequence of different reparameterisation operations:
  – Node averaging
  – Ordinary BP on trees

• Order of operations?
  – Affects performance dramatically

• Algorithms:
  – [Wainwright et al. '02]: parallel schedule
    • May not converge
  – [Kolmogorov '05]: specific sequential schedule
    • Lower bound does not decrease; convergence guarantees

Page 41: Algorithms for MAP estimation in Markov Random Fields

Node averaging

[Figure: a node's unary parameters in two trees: (0, 1) in one tree and (4, 0) in the other.]

Page 42: Algorithms for MAP estimation in Markov Random Fields

Node averaging

[Figure: after averaging, both trees hold the average (2, 0.5).]

Page 43: Algorithms for MAP estimation in Markov Random Fields

Belief propagation (BP) on trees

• Send messages
  – Equivalent to reparameterising node and edge parameters

• Two passes (forward and backward)

Page 44: Algorithms for MAP estimation in Markov Random Fields

Belief propagation (BP) on trees

• Key property (Wainwright et al.): upon termination, θ_p gives the min-marginals for node p:

θ_p(j) = min_{x : x_p = j} E(x | θ) − const,  for j = 0, 1
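A minimal min-sum BP sketch on a chain (illustrative instance) that verifies this min-marginal property by brute force:

```python
# Min-sum (max-product in the log domain) BP on a 3-node binary chain.
from itertools import product
import numpy as np

unary = [np.array([0., 1.]), np.array([4., 0.]), np.array([2., 0.])]
pair = [np.array([[0., 3.], [2., 5.]]), np.array([[0., 2.], [1., 0.]])]
n = len(unary)

# m_f[p] = message into node p from the left, m_b[p] = from the right.
m_f = [np.zeros(2) for _ in range(n)]
m_b = [np.zeros(2) for _ in range(n)]
for p in range(1, n):            # forward pass
    m_f[p] = np.min((unary[p-1] + m_f[p-1])[:, None] + pair[p-1], axis=0)
for p in range(n-2, -1, -1):     # backward pass
    m_b[p] = np.min(pair[p] + (unary[p+1] + m_b[p+1])[None, :], axis=1)

for p in range(n):
    belief = unary[p] + m_f[p] + m_b[p]   # min-marginals of node p
    for j in (0, 1):
        exact = min(sum(unary[q][x[q]] for q in range(n)) +
                    sum(pair[q][x[q]][x[q+1]] for q in range(n-1))
                    for x in product([0, 1], repeat=n) if x[p] == j)
        assert np.isclose(belief[j], exact)
print("beliefs match min-marginals")
```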

Page 45: Algorithms for MAP estimation in Markov Random Fields

TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)

Repeat: run BP on all trees, then "average" all nodes.

• If it converges, gives a (local) maximum of the lower bound
• Not guaranteed to converge
• The lower bound may go down

Page 46: Algorithms for MAP estimation in Markov Random Fields

Sequential TRW algorithm (TRW-S) [Kolmogorov '05]

Repeat:
1. Pick node p
2. Run BP on all trees containing p
3. "Average" node p

Page 47: Algorithms for MAP estimation in Markov Random Fields

Main property of TRW-S

• Theorem: the lower bound never decreases.

• Proof sketch:

[Figure: before averaging, a node holds unary parameters (0, 1) in tree T and (4, 0) in tree T', so Φ(θ^T) = 0 + const and Φ(θ^T') = 0 + const'.]

Page 48: Algorithms for MAP estimation in Markov Random Fields

Main property of TRW-S

• Theorem: the lower bound never decreases.

• Proof sketch:

[Figure: after averaging, the node holds (2, 0.5) in both trees, so Φ(θ^T) = 0.5 + const and Φ(θ^T') = 0.5 + const'; the combined bound has not decreased.]
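Numerically, reading the figure's unary parameters as (0, 1) and (4, 0) (an assumption about the figure), node averaging raises the combined bound from 0 to 0.5:

```python
# Node averaging on a single node's unary parameters in two trees.
theta_T, theta_T2 = [0.0, 1.0], [4.0, 0.0]

before = 0.5 * min(theta_T) + 0.5 * min(theta_T2)
avg = [(a + b) / 2 for a, b in zip(theta_T, theta_T2)]
after = 0.5 * min(avg) + 0.5 * min(avg)

print(before, "->", after)   # 0.0 -> 0.5: the lower bound did not decrease
assert after >= before
```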

Page 49: Algorithms for MAP estimation in Markov Random Fields

TRW-S algorithm

• Particular order of averaging and BP operations

• Lower bound guaranteed not to decrease

• There exists a limit point that satisfies the weak tree agreement condition

• Efficiency?

Page 50: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

Repeat:
1. Pick node p
2. Run BP on all trees containing p   (inefficient?)
3. "Average" node p

Page 51: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

• Key observation: the node averaging operation preserves messages oriented towards this node

• Reuse previously passed messages!

• Need a special choice of trees:
  – Pick an ordering of the nodes
  – Trees: monotonic chains

[Figure: a 3×3 grid with nodes numbered 1-9 in the chosen order.]

Page 52: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid with the node ordering.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 53: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid as the forward pass advances.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 54: Algorithms for MAP estimation in Markov Random Fields

Memory requirements

• Additional advantage of TRW-S:
  – Needs only half as much memory as standard message passing!
  – A similar observation for bipartite graphs and a parallel schedule was made in [Felzenszwalb & Huttenlocher '04]

[Figure: memory layout of standard message passing vs. TRW-S.]

Page 55: Algorithms for MAP estimation in Markov Random Fields

Experimental results: binary segmentation ("GrabCut")

[Plot: energy (3-6 ×10^5) vs. time (0-400), averaged over 50 instances.]

Page 56: Algorithms for MAP estimation in Markov Random Fields

Experimental results: stereo

[Figure: left image and ground truth; plot of energy (3.6-4.0 ×10^5) vs. time (20-100) for BP and TRW-S.]

Page 57: Algorithms for MAP estimation in Markov Random Fields

Experimental results: stereo

[Plots: energy vs. time (20-140) on two further instances; energy ranges 1.36-1.44 ×10^6 and 1.93-1.94 ×10^7.]

Page 58: Algorithms for MAP estimation in Markov Random Fields

Summary

• MAP estimation algorithms are based on LP relaxation
  – Maximise a lower bound

• Two ways to formulate the lower bound:

  – Via posiforms: leads to the maxflow algorithm
    • Polynomial-time solution
    • But: applicable only to restricted energies (e.g. binary variables)
    • Submodular functions: global minimum
    • Non-submodular functions: part of an optimal solution

  – Via a convex combination of trees: leads to the TRW algorithm
    • Convergence in the limit (for TRW-S)
    • Applicable to arbitrary energy functions

• Graph cuts vs. TRW:
  – Accuracy: similar
  – Generality: TRW is more general
  – Speed: for stereo, TRW is currently 2-5 times slower. But:
    • 3 vs. 50 years of research!
    • More suitable for parallel implementation (GPU? Hardware?)

Page 59: Algorithms for MAP estimation in Markov Random Fields

Discrete vs. continuous functionals

Discrete formulation (graph cuts):

E(x) = ∑_p θ_p(x_p) + ∑_{(p,q)} θ_pq(x_p, x_q)

• Maxflow algorithm
  – Global minimum, polynomial time
• Metrication artefacts?

Continuous formulation (geodesic active contours):

E(C) = ∫_0^{|C|} g(C(s)) ds

• Level sets
  – Numerical stability?
• Geometrically motivated
  – Invariant under rotation

Page 60: Algorithms for MAP estimation in Markov Random Fields

Geo-cuts

• Continuous functional:

E(C) = ∫_0^{|C|} g(C(s)) ds + ∫_{interior(C)} f dV

• Construct a graph such that, for smooth contours C,

E(C) = cost of the corresponding cut

• Class of continuous functionals? [Boykov & Kolmogorov '03], [Kolmogorov & Boykov '05]:
  – Geometric length/area (e.g. Riemannian)
  – Flux of a given vector field
  – Regional term

Page 61: Algorithms for MAP estimation in Markov Random Fields
Page 62: Algorithms for MAP estimation in Markov Random Fields

TRW formulation

maximise  ∑_T ρ^T Φ(θ^T)  over θ = {θ^T}

subject to  ∑_T ρ^T θ^T = θ̄  (the parameter vector of the original energy)

where θ is the collection of all tree parameter vectors θ^T, and ρ^T is a fixed probability distribution on trees T.

Page 63: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

Page 64: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid at the next step of the pass.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 65: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

Page 66: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration

Page 67: Algorithms for MAP estimation in Markov Random Fields

Efficient implementation

[Figure: the 3×3 grid mid-pass; the node being processed and the currently valid messages are highlighted.]

• Algorithm:
  – Forward pass:
    • process nodes in increasing order
    • pass messages from lower-ordered neighbours
  – Backward pass:
    • do the same in reverse order

• Linear running time of one iteration