Random walks and spectral segmentation
CSE 291, Fall 2001 (10/11/2001)
Marina Meila and Jianbo Shi: "Learning Segmentation by Random Walks" / "A Random Walks View of Spectral Segmentation"
Markus Herrgard, UCSD Bioengineering and Bioinformatics
Overview
Introduction: Why random walks?
Review of the Ncut algorithm
Finite Markov chains
Spectral properties of Markov chains
Conductance of a Markov chain
Block-stochastic matrices
Application: Supervised segmentation
Introduction
Why bother with mapping a segmentation problem to a random walk problem?
Utilize strong connections between:
Graph theory
Theory of stochastic processes
Matrix algebra
Applications of random walks
Markov chain Monte Carlo: approximate high-dimensional integration
e.g. in Bayesian inference
How to sample efficiently from a complex distribution?
Randomized algorithms: approximate counting in high-dimensional spaces
How to sample points efficiently inside a convex polytope?
Segmentation as graph partitioning
Consider an image I with a similarity function S_ij defined between all pairs of pixels i, j ∈ I
Represent S as a graph G = (I, S):
Pixels are the nodes of the graph
S_ij is the weight of the edge between nodes i and j
Degree of node i: $d_i = \sum_j S_{ij}$
Volume of a set A ⊆ I: $\mathrm{vol}\,A = \sum_{i \in A} d_i$
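To make the definitions concrete, here is a minimal NumPy sketch; the 1-D pixel features and the Gaussian similarity are illustrative assumptions, not from the talk:

```python
import numpy as np

# Toy "image": six pixels with 1-D features (two well-separated groups).
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# Similarity S_ij: a Gaussian kernel on feature distance (an assumed choice).
S = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)

d = S.sum(axis=1)   # degree of node i: d_i = sum_j S_ij
A = [0, 1, 2]       # a candidate subset A of pixels
volA = d[A].sum()   # volume of A: vol A = sum_{i in A} d_i
```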
Simple example
Data with both distance and color cues
(Figure: data points and the resulting similarity matrix)
The normalized cut criterion
Partitioning of G into A and its complement Ā is found by minimizing the normalized cut criterion:

$\mathrm{Ncut}(A, \bar{A}) = \sum_{i \in A,\, j \in \bar{A}} S_{ij} \left( \frac{1}{\mathrm{vol}\,A} + \frac{1}{\mathrm{vol}\,\bar{A}} \right)$

Produces more balanced partitions than a regular graph cut
An approximate solution can be found through spectral methods
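Continuing the toy sketch above, the criterion can be evaluated directly for any candidate partition (a straightforward transcription of the formula, not the authors' code):

```python
def ncut(S, mask):
    """Ncut(A, A-bar) for a boolean mask selecting the pixels in A."""
    d = S.sum(axis=1)
    cut = S[mask][:, ~mask].sum()   # sum of S_ij over i in A, j in A-bar
    return cut * (1.0 / d[mask].sum() + 1.0 / d[~mask].sum())

mask = np.array([True, True, True, False, False, False])
print(ncut(S, mask))   # small value: the two groups are only weakly connected
```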
The normalized cut algorithm
Define: diagonal matrix D with D_ii = d_i, and the Laplacian of the graph G: L = D − S
Solve the generalized eigenvalue problem: $Lx = \lambda Dx$
Let $x_{\lambda_2}$ be the eigenvector corresponding to the 2nd smallest eigenvalue $\lambda_2$
Partition $x_{\lambda_2}$ into two sets containing roughly equal values → graph partition
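A compact SciPy sketch of these steps; splitting the eigenvector at its median is one common heuristic for "roughly equal values" (an assumption on my part):

```python
from scipy.linalg import eigh

def two_way_ncut(S):
    d = S.sum(axis=1)
    D = np.diag(d)
    # Generalized eigenproblem L x = lambda D x; eigh returns ascending eigenvalues.
    vals, vecs = eigh(D - S, D)
    x2 = vecs[:, 1]              # eigenvector of the 2nd smallest eigenvalue
    return x2 > np.median(x2)    # threshold into two groups

print(two_way_ncut(S))           # recovers the two pixel groups of the toy data
```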
What does this actually mean?
Spectral methods are easy to apply, but notoriously hard to understand intuitively
Some questions:
Why does it work? (see Shi & Malik)
Why this particular eigenvector?
Why would $x_{\lambda_2}$ be piecewise constant?
What if there are more than two segments?
What if $x_{\lambda_2}$ is not piecewise constant? (see Kannan, Vempala & Vetta)
Interlude: Finite Markov chains
Discrete time, finite state random process
State of the system at time t_n: x_n
Probability of being in state i at time t_n given by: $\pi_i^{(n)} = p(x_n = i)$
Probability distribution over all states represented by the column vector $\pi^{(n)}$
Markov property: $p(x_n \mid x_0, \dots, x_{n-1}) = p(x_n \mid x_{n-1})$
Transition matrix
Transition matrix: $P_{ij} = p(x_{n+1} = j \mid x_n = i)$
P is a (row) stochastic matrix:
$P_{ij} \ge 0$
$\sum_j P_{ij} = 1$
If at t_n the distribution is $\pi^{(n)}$, at t_{n+1} the distribution is given by:
$\pi^{(n+1)T} = \pi^{(n)T} P$
Example of a Markov chain
States: Play, Work, Sleep

$P = \begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.1 & 0.7 & 0.2 \\ 0.1 & 0 & 0.9 \end{pmatrix}$

Starting from $\pi^{(0)T} = (1, 0, 0)$:
$\pi^{(1)T} = \pi^{(0)T} P = (0.5, 0.3, 0.2)$
$\pi^{(24)T} = \pi^{(0)T} P^{24} \approx (1/6, 1/6, 2/3)$
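A few lines of NumPy reproduce these numbers (the state order Play, Work, Sleep and the displayed vectors are my reading of the garbled slide):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],    # Play  -> (Play, Work, Sleep)
              [0.1, 0.7, 0.2],    # Work  -> ...
              [0.1, 0.0, 0.9]])   # Sleep -> ...

pi0 = np.array([1.0, 0.0, 0.0])               # start in the first state
print(pi0 @ P)                                # one step: [0.5, 0.3, 0.2]
print(pi0 @ np.linalg.matrix_power(P, 24))    # ~ [1/6, 1/6, 2/3]
```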
Some terminology
Stationary distribution $\pi^\infty$ is given by: $\pi^{\infty T} P = \pi^{\infty T}$
Markov chain is reversible if the detailed balance condition holds: $\pi_i^\infty P_{ij} = \pi_j^\infty P_{ji}$
A reversible finite Markov chain is called a random walk
Spectra of stochastic matrices
For reversible Markov chains the eigenvalues of P are real and the eigenvectors orthogonal
Spectral radius $\rho(P) = 1$ (i.e. $|\lambda| \le 1$)
Right (left) eigenvector corresponding to $\lambda_1 = 1$ is $x_1 = \mathbf{1}$ ($x_1 = \pi^\infty$)
Back to Ncut
How is Ncut related to random walks on graphs?
Transform the similarity matrix S into a stochastic matrix: $P = D^{-1} S$
P_ij is the probability of moving from pixel i to pixel j in the graph representation of the image in one step of a random walk
Relationship to random walks
Spectrum of P: $P x_P = \lambda_P x_P$
The generalized eigenvalue problem in Ncut can be written as:

$(D - S)x = \lambda D x \iff (I - D^{-1}S)x = \lambda x \iff (I - P)x = \lambda x$

How are the spectra related?
Same eigenvectors: $x = x_P$
Eigenvalues: $\lambda = 1 - \lambda_P$
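The correspondence is easy to confirm numerically; a small check on the toy similarity matrix from earlier (the verification code is mine, the identity is from the slide):

```python
from scipy.linalg import eig, eigh

d = S.sum(axis=1)
P = S / d[:, None]                 # P = D^{-1} S, row-stochastic

lam_P = np.sort(eig(P)[0].real)                   # eigenvalues of P (real here)
lam_L = eigh(np.diag(d) - S, np.diag(d))[0]       # generalized eigenvalues of (L, D)

print(np.allclose(np.sort(1.0 - lam_P), lam_L))   # lambda = 1 - lambda_P
```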
Simple example
(Figure: similarity matrix S)
(Figure: transition matrix $P = D^{-1}S$)
Eigenvalues and eigenvectors of P
Why the second eigenvector?
The smallest eigenvalue in Ncut corresponds to the largest eigenvalue of P
The corresponding eigenvector $x_1 = \mathbf{1}$ carries no information about the partitioning
Conductance and the Ncut criterion
Assume that the random walk is started from its stationary distribution $\pi_i^\infty = d_i / \mathrm{vol}\,I$
Using this and $P_{ij} = S_{ij}/d_i$ we can write the probability of moving from A to its complement in one step:

$P_{A\bar{A}} = \frac{\sum_{i \in A,\, j \in \bar{A}} \pi_i^\infty P_{ij}}{\sum_{i \in A} \pi_i^\infty} = \frac{\sum_{i \in A,\, j \in \bar{A}} (d_i/\mathrm{vol}\,I)(S_{ij}/d_i)}{\mathrm{vol}\,A / \mathrm{vol}\,I} = \frac{\sum_{i \in A,\, j \in \bar{A}} S_{ij}}{\mathrm{vol}\,A}$
Interpretation of the Ncut criterion
Alternative representation of the Ncut criterion:

$\mathrm{Ncut}(A, \bar{A}) = \frac{\sum_{i \in A,\, j \in \bar{A}} S_{ij}}{\mathrm{vol}\,A} + \frac{\sum_{i \in A,\, j \in \bar{A}} S_{ij}}{\mathrm{vol}\,\bar{A}} = P_{A\bar{A}} + P_{\bar{A}A}$

Minimum Ncut is equivalent to minimizing the conductance between set A and its complement, i.e. minimizing the probability of moving between set A and its complement
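In code, the escape probability and the identity above read as follows (reusing ncut, S and mask from the earlier sketch):

```python
def escape_prob(S, mask):
    """P_{A,A-bar}: probability of leaving A in one step, at stationarity."""
    d = S.sum(axis=1)
    return S[mask][:, ~mask].sum() / d[mask].sum()

# Ncut(A, A-bar) = P_{A,A-bar} + P_{A-bar,A}
print(escape_prob(S, mask) + escape_prob(S, ~mask), ncut(S, mask))
```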
Block-stochastic matrices
Let Δ = (A_1, A_2, …, A_k) be a partition of I
P is a block-stochastic matrix, or equivalently the Markov chain is aggregatable, iff

$P_{is'} = \sum_{j \in A_{s'}} P_{ij} = R_{ss'} \quad \forall i \in A_s,\ s, s' = 1, \dots, k$
Aggregation
A Markov chain defined by P with state space i ∈ I can be aggregated to a Markov chain with a smaller state space A_s ∈ Δ and a transition matrix R
The k eigenvalues of R are the same as the k largest eigenvalues of P
Aggregation can be performed as a linear transformation R = UPV
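A sketch of the linear-transformation view: with V an indicator matrix for the blocks and U an averaging matrix over blocks, R = UPV recovers the block transition probabilities. Uniform averaging within blocks, used here, is one valid choice when P is exactly block-stochastic; the 4-state example is mine:

```python
import numpy as np

def aggregate(P, blocks):
    """R = U P V for a block-stochastic P; blocks is a list of index lists."""
    n, k = P.shape[0], len(blocks)
    V = np.zeros((n, k))
    U = np.zeros((k, n))
    for s, idx in enumerate(blocks):
        V[idx, s] = 1.0               # column s indicates membership in A_s
        U[s, idx] = 1.0 / len(idx)    # average the rows belonging to A_s
    return U @ P @ V

# A 4-state chain that is exactly block-stochastic with blocks {0,1} and {2,3}.
P = np.array([[0.50, 0.30, 0.10, 0.10],
              [0.40, 0.40, 0.20, 0.00],
              [0.05, 0.05, 0.60, 0.30],
              [0.10, 0.00, 0.50, 0.40]])
print(aggregate(P, [[0, 1], [2, 3]]))   # R = [[0.8, 0.2], [0.1, 0.9]]
```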
Aggregation example
(Figure: transition matrix P)
(Figure: aggregated transition matrix R)
Why piecewise constant eigenvectors?
If P is block-stochastic with k blocks then its first k eigenvectors are piecewise constant
Ncut is exact for block-stochastic matrices, in addition to block-diagonal matrices
Ncut groups pixels by the similarity of their transition probabilities to subsets of I
Block-stochastic matrix example
(Figure: transition matrix P)
(Figure: piecewise constant eigenvector x)
The modified Ncut algorithm
Finds k segments in one pass
Requires that the k eigenvalues of R are larger than the other n−k spurious eigenvalues of P
1. Compute the eigenvalues of P
2. Select the k largest eigenvectors
3. Use k-means to obtain a segmentation based on the k eigenvectors
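A sketch of the k-way procedure with NumPy and scikit-learn (taking real parts and the k-means settings are implementation assumptions):

```python
import numpy as np
from scipy.linalg import eig
from sklearn.cluster import KMeans

def modified_ncut(S, k):
    d = S.sum(axis=1)
    P = S / d[:, None]                   # P = D^{-1} S
    vals, vecs = eig(P)
    order = np.argsort(-vals.real)       # largest eigenvalues first
    X = vecs[:, order[:k]].real          # embed each pixel by the k top eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)

print(modified_ncut(S, k=2))             # labels for the toy data from earlier
```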
Supervised image segmentation
Training data: based on a human-segmented image, define target probabilities

$P^*_{ij} = \begin{cases} 1/|A| & \text{for } j \in A \\ 0 & \text{for } j \notin A \end{cases} \quad i \in A$

Features: different criteria $f^q_{ij}$, q = 1, …, Q, that measure similarity between pixels i and j
Supervised segmentation criterion
Model: parametrized similarity function

$S_{ij}(\theta_1, \dots, \theta_Q) = \exp\Big(\sum_q \theta_q f^q_{ij}\Big)$

Optimization criterion: minimize the Kullback-Leibler divergence between the target transition matrix P* and $P(\theta) = D^{-1} S(\theta)$
Corresponds to maximizing the cross-entropy:

$J = \frac{1}{|I|} \sum_{i \in I} \sum_{j \in I} P^*_{ij} \log P_{ij}(\theta)$
Supervised segmentation algorithm
This can be done by using gradient ascent in θ:

$\theta_q^{(n+1)} = \theta_q^{(n)} + \eta \left. \frac{\partial J}{\partial \theta_q} \right|_{\theta^{(n)}}$

where

$\frac{\partial J}{\partial \theta_q} = \frac{1}{|I|} \sum_{ij} \big( P^*_{ij} - P^{(n)}_{ij} \big) f^q_{ij}$
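A minimal sketch of this update loop; the feature layout (an array of Q matrices), learning rate, and iteration count are assumptions:

```python
import numpy as np

def learn_theta(f, P_star, eta=0.1, n_iter=200):
    """Gradient ascent on J; f has shape (Q, n, n), P_star has shape (n, n)."""
    Q, n, _ = f.shape
    theta = np.zeros(Q)
    for _ in range(n_iter):
        S = np.exp(np.einsum('q,qij->ij', theta, f))   # S_ij = exp(sum_q theta_q f^q_ij)
        P = S / S.sum(axis=1, keepdims=True)           # P(theta) = D^{-1} S(theta)
        grad = np.einsum('ij,qij->q', P_star - P, f) / n
        theta += eta * grad                            # ascend the cross-entropy J
    return theta
```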
Toy example
Distance: $f^1_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\|^2$
Color (or intensity): $f^2_{ij} = (c_i - c_j)^2$
Training segmentation 1 (by distance): θ_1 = −1.19, θ_2 = 1.04
Training segmentation 2 (by color): θ_1 = −0.19, θ_2 = −4.55
Toy example results
(Figures: test data, segmentation with parameters from training segmentation 1 (by distance), and segmentation with parameters from training segmentation 2 (by color))
Application: real image segmentation
Cues:
Intervening contour: $f^{IC}_{ij} = \max_{k \in l(i,j)} \mathrm{Edge}(k)$, where l(i,j) is the straight line between pixels i and j
Edge flow: $f^{EF}_{ij}$, a cue based on the orientations of the edges at i and j relative to the line joining them
Training
Testing
Conclusions I
The random walks perspective provides new insights into the Ncut algorithm:
Relating the Ncut algorithm to spectral properties of random walks
Interpreting the Ncut criterion in terms of the conductance of a random walk
Proving that Ncut is exact for block-stochastic matrices
Conclusions II
Is any of this useful in practice?
Supervised segmentation method
Comparing different spectral clustering methods in terms of the underlying random walks
Choosing the kernel to allow for effective clustering (approximately block-stochastic)
New clustering criteria, e.g. bipartite clustering
References
Kemeny JG, Snell JL: Finite Markov Chains. Springer, 1976.
Stewart WJ: Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
Lovász L: Random Walks on Graphs: A Survey.
Jerrum M, Sinclair A: The Markov Chain Monte Carlo Method: An Approach to Approximate Counting and Integration.