Probabilistic Segmentation

Computer Science and Engineering, Indian Institute of Technology Kharagpur

Mixture Model Image Segmentation

Probability of generating a pixel measurement vector:
\[ p(x) = \sum_l p(x \mid \theta_l)\, \pi_l \]
The mixture model has the form:
\[ p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l\, p_l(x \mid \theta_l) \]
Component densities:
\[ p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2}\, \det(\Sigma_l)^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_l)^\top \Sigma_l^{-1} (x - \mu_l) \right\} \]
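To make these formulas concrete, here is a minimal Python sketch (assuming NumPy; the function names are illustrative, not from the lecture) that evaluates a Gaussian component density, the mixture density p(x | Θ), and the log of the per-pixel likelihood product used on the following slide.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Component density p_l(x | theta_l) for theta_l = (mu_l, Sigma_l)."""
    d = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def mixture_density(x, alphas, mus, Sigmas):
    """p(x | Theta) = sum_l alpha_l * p_l(x | theta_l)."""
    return sum(a * gaussian_density(x, mu, S)
               for a, mu, S in zip(alphas, mus, Sigmas))

def log_likelihood(X, alphas, mus, Sigmas):
    """Log of the product over all pixel vectors, taken as a sum of logs."""
    return sum(np.log(mixture_density(x, alphas, mus, Sigmas)) for x in X)
```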

Image Segmentation

Likelihood for all observations (data points):
\[ \prod_{j \in \text{observations}} \sum_{l=1}^{g} \alpha_l\, p_l\big(x_j \mid \theta_l\big) \]

Mixture Model Line Fitting
\[ p(W) = \sum_l \pi_l\, p(W \mid a_l) \]
Likelihood for a set of observations:
\[ \prod_{j \in \text{observations}} \sum_{l=1}^{g} \pi_l\, p_l\big(W_j \mid a_l\big) \]

Missing data problems

The complete data log-likelihood:
\[ L_c(x ; u) = \log \prod_j p_c\big(x_j ; u\big) = \sum_j \log\big(p_c(x_j ; u)\big) \]
The incomplete data space:
\[ p_i(y ; u) = \int_{\{x \mid f(x) = y\}} p_c(x ; u)\, d\eta \]
where $\eta$ measures volume on the space of $x$ such that $f(x) = y$.

Missing data problems

The incomplete data likelihood:
\[ \prod_{j \in \text{observations}} p_i\big(y_j ; u\big) \]
\[ L_i(y ; u) = \log \prod_j p_i\big(y_j ; u\big) = \sum_j \log\big(p_i(y_j ; u)\big) = \sum_j \log \int_{\{x \mid f(x) = y_j\}} p_c(x ; u)\, d\eta \]

EM for mixture models

The complete data is a composition of the incomplete data and the missing data:
\[ x_j = [y_j, z_j] \]
Mixture model:
\[ p(y) = \sum_l \pi_l\, p(y \mid a_l) \]
Complete data log likelihood:
\[ \sum_{j \in \text{observations}} \sum_{l=1}^{g} z_{lj} \log p\big(y_j \mid a_l\big) \]
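A sketch of the complete data log likelihood, assuming Z is a (g × n) zero/one indicator array and `log_component_density` is a hypothetical callable returning log p(y | a_l):

```python
def complete_data_log_likelihood(Y, Z, log_component_density):
    """Sum over observations j and components l of z_lj * log p(y_j | a_l).

    Z[l, j] is the 0/1 missing-data indicator; log_component_density(y, l)
    returns log p(y | a_l).
    """
    g, n = Z.shape
    return sum(Z[l, j] * log_component_density(Y[j], l)
               for j in range(n) for l in range(g))
```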

EM

E-step: Compute the expected value of $z_j$ for each $j$, i.e. compute $z_j^{(s)}$. This results in $x^s = [y, z^s]$.
M-step: Maximize the complete data log-likelihood with respect to $u$:
\[ u^{s+1} = \arg\max_u L_c(x^s ; u) = \arg\max_u L_c([y, z^s] ; u) \]

EM in General Case

Expected value of the complete data log-likelihood:
\[ Q\big(u ; u^{(s)}\big) = \int L_c(x ; u)\, p\big(x \mid u^{(s)}, y\big)\, dx \]
We maximize with respect to $u$ to get:
\[ u^{s+1} = \arg\max_u Q\big(u ; u^{(s)}\big) \]
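The alternation on this and the previous slide can be phrased as a short driver loop. A minimal sketch, assuming problem-specific `e_step` and `m_step` callables (both hypothetical placeholders) and monitoring the likelihood for convergence:

```python
def em(y, u0, e_step, m_step, max_iters=100, tol=1e-6):
    """Generic EM driver: alternate the E- and M-steps until convergence."""
    u, prev_ll = u0, None
    for _ in range(max_iters):
        z = e_step(y, u)        # E-step: z^(s) = E[z | y, u^(s)]
        u, ll = m_step(y, z)    # M-step: u^(s+1) = argmax_u Lc([y, z^(s)]; u)
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break               # likelihood has stopped improving
        prev_ll = ll
    return u
```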

Image Segmentation

WHAT IS MISSING DATA? An $(n \times g)$ matrix $I$ of indicator variables.
Expectation step:
\begin{align*}
E(I_{lm}) = \bar{I}_{lm} &= 1 \cdot P(l\text{th pixel comes from } m\text{th blob}) \\
&\quad + 0 \cdot P(l\text{th pixel does not come from } m\text{th blob}) \\
&= P(l\text{th pixel comes from } m\text{th blob})
\end{align*}
We get:
\[ \bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m\big(x_l \mid \theta_m^{(s)}\big)}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\big(x_l \mid \theta_k^{(s)}\big)} \]
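A sketch of this E-step in Python, reusing the `gaussian_density` helper from the earlier sketch; the array layout (pixels in rows, segments in columns) is an assumption:

```python
import numpy as np

def e_step(X, alphas, mus, Sigmas):
    """Responsibilities I_bar[l, m] = alpha_m p_m(x_l) / sum_k alpha_k p_k(x_l)."""
    n, g = X.shape[0], len(alphas)
    I_bar = np.empty((n, g))
    for m in range(g):
        for l in range(n):
            # numerator: alpha_m^(s) * p_m(x_l | theta_m^(s))
            I_bar[l, m] = alphas[m] * gaussian_density(X[l], mus[m], Sigmas[m])
    # divide by the sum over all g segments
    return I_bar / I_bar.sum(axis=1, keepdims=True)
```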

Image Segmentation

COMPLETE DATA LOG-LIKELIHOOD:
\[ L_c\big([x, I_{lm}] ; \Theta^{(s)}\big) = \sum_{l \in \text{all pixels}} \sum_{m=1}^{g} I_{lm} \log p(x_l \mid \theta_m) \]
Maximization step:
\[ \Theta^{(s+1)} = \arg\max_\Theta L_c\big([x, I_{lm}] ; \Theta^{(s)}\big) \]

Image Segmentation

Maximization step:
\[ \alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big) \]
\[ \mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l\, p\big(m \mid x_l, \Theta^{(s)}\big)}{\sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big)} \]
\[ \Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big) \left\{ \big(x_l - \mu_m^{(s)}\big) \big(x_l - \mu_m^{(s)}\big)^\top \right\}}{\sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big)} \]
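A corresponding M-step sketch. Note one deliberate variant: the slide's covariance update uses the previous mean μ_m^(s), while this sketch plugs in the freshly updated mean, which is also common in practice:

```python
import numpy as np

def m_step(X, I_bar):
    """Re-estimate alpha_m, mu_m, Sigma_m from responsibilities p(m | x_l, Theta)."""
    n, d = X.shape
    weights = I_bar.sum(axis=0)                    # sum_l p(m | x_l, Theta^(s))
    alphas = weights / n                           # alpha_m^(s+1)
    mus = (I_bar.T @ X) / weights[:, None]         # mu_m^(s+1)
    Sigmas = []
    for m in range(I_bar.shape[1]):
        diff = X - mus[m]                          # uses the updated mean (variant)
        Sigmas.append((I_bar[:, m, None] * diff).T @ diff / weights[m])
    return alphas, mus, Sigmas
```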

How EM works for Image Segmentation

E-step:
\[ \bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m\big(x_l \mid \theta_m^{(s)}\big)}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\big(x_l \mid \theta_k^{(s)}\big)} \]
For each pixel compute the value $\alpha_m^{(s)} p_m\big(x_l \mid \theta_m^{(s)}\big)$ for each segment $m$.
For each pixel compute the sum $\sum_{k=1}^{K} \alpha_k^{(s)} p_k\big(x_l \mid \theta_k^{(s)}\big)$, i.e. perform the summation over all $K$ segments.
Divide the former by the latter.
M-step:
Compute $\alpha_m^{(s+1)}$, $\mu_m^{(s+1)}$, $\Sigma_m^{(s+1)}$.
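Putting the two steps together gives the whole segmentation loop; a sketch assuming the `e_step` and `m_step` helpers above and a fixed iteration budget:

```python
def segment_em(X, alphas, mus, Sigmas, iters=50):
    """Run EM on pixel feature vectors X (n x d) and return a hard segmentation."""
    for _ in range(iters):
        I_bar = e_step(X, alphas, mus, Sigmas)    # expected indicators per pixel
        alphas, mus, Sigmas = m_step(X, I_bar)    # re-estimate each segment
    labels = e_step(X, alphas, mus, Sigmas).argmax(axis=1)  # hard assignment
    return alphas, mus, Sigmas, labels
```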

Line Fitting Expectation Maximization

WHAT IS MISSING DATA? An $(n \times g)$ matrix $\mathcal{M}$ of indicator variables, whose $(k, l)$th entry is
\[ m_{k,l} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases} \]
\[ \sum_l P\big(m_{kl} = 1 \mid \text{point } k,\ \text{line } l\text{'s parameters}\big) = 1. \]
HOW TO FORMULATE LIKELIHOOD?
\[ \exp\left(-\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2}\right) \]
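A sketch of the corresponding E-step for line fitting, under the assumption that each line is parameterized as (a, b, c) with a² + b² = 1, so |ax + by + c| is the point-to-line distance:

```python
import numpy as np

def line_responsibilities(points, lines, sigma):
    """P(m_kl = 1 | point k, line l's parameters), each row normalized to sum to 1."""
    n, g = len(points), len(lines)
    W = np.empty((n, g))
    for l, (a, b, c) in enumerate(lines):
        dist = a * points[:, 0] + b * points[:, 1] + c  # signed point-to-line distance
        W[:, l] = np.exp(-dist ** 2 / (2 * sigma ** 2))
    return W / W.sum(axis=1, keepdims=True)             # enforce sum_l P(...) = 1
```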

Motion Segmentation EM

WHAT IS MISSING DATA? It is the motion field to which each pixel belongs. The indicator variable $V_{xy,l}$ is the $(xy, l)$th entry of $V$:
\[ V_{xy,l} = \begin{cases} 1 & \text{if the } xy\text{th pixel belongs to the } l\text{th motion field} \\ 0 & \text{otherwise} \end{cases} \]
HOW TO FORMULATE LIKELIHOOD?
\[ L(V, \Theta) = -\sum_{xy,l} V_{xy,l} \frac{\big(I_1(x, y) - I_2(x + m_1(x, y ; \theta_l),\ y + m_2(x, y ; \theta_l))\big)^2}{2\sigma^2} \]
where $\Theta = (\theta_1, \theta_2, \ldots, \theta_g)$. The E-step computes $P\{V_{xy,l} = 1 ; I_1, I_2, \Theta\}$.

Motion Segmentation EM

HOW TO FORMULATE LIKELIHOOD? The E-step needs $P\{V_{xy,l} = 1 ; I_1, I_2, \Theta\}$. A common choice is the affine motion model:
\[ \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y ; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix} \]
where $\theta_l = (a_{11}, a_{12}, \ldots, a_{23})$. This yields a layered representation.
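A small sketch of the affine motion model; the parameter ordering inside θ_l is an assumption:

```python
def affine_motion(x, y, theta):
    """(m1, m2)(x, y; theta_l) = A (x, y)^T + t for the affine model."""
    a11, a12, a13, a21, a22, a23 = theta  # assumed ordering of theta_l
    m1 = a11 * x + a12 * y + a13
    m2 = a21 * x + a22 * y + a23
    return m1, m2
```

The motion-compensated residual I1(x, y) − I2(x + m1, y + m2) then plugs into the likelihood on the previous slide.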

Identifying Outliers EM

We construct an explicit model of the outliers:
\[ (1 - \lambda)\, P(\text{measurements} \mid \text{model}) + \lambda\, P(\text{outliers}) \]
Here $\lambda \in [0, 1]$ models the frequency with which the outliers occur, and $P(\text{outliers})$ is the probability model for the outliers.
WHAT IS MISSING DATA? A variable that indicates which component generated each point.
Complete data likelihood:
\[ \prod_j \big((1 - \lambda)\, P(\text{measurement}_j \mid \text{model}) + \lambda\, P(\text{measurement}_j \mid \text{outliers})\big) \]
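In EM terms, the E-step computes the posterior probability that each measurement is an outlier, and λ can then be re-estimated as the mean of those posteriors. A sketch assuming hypothetical density callables `p_model` and `p_outlier`:

```python
import numpy as np

def outlier_posteriors(measurements, lam, p_model, p_outlier):
    """Posterior probability that each measurement came from the outlier component."""
    w_in = (1 - lam) * np.array([p_model(m) for m in measurements])
    w_out = lam * np.array([p_outlier(m) for m in measurements])
    return w_out / (w_in + w_out)

def update_lambda(posteriors):
    """M-step for lambda: the expected fraction of outliers."""
    return posteriors.mean()
```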

Background Subtraction EM

For each pixel we get a series of observations over the successive frames. The source of these observations is a mixture model with two components: the background and the noise (foreground). The background can be modeled as a Gaussian; the noise can come from a uniform source. Any pixel that belongs to the noise component is not background.

Difficulties with Expectation Maximization

Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.

Model Selection

Should we consider minimizing the negative of the log likelihood? We should have a penalty term which increases as the number of components increases.
An Information Criterion (AIC):
\[ -2 L(x ; \Theta^*) + 2p \]
where $p$ is the number of free parameters.
Bayesian Information Criterion (BIC):
\[ -L(D ; \theta^*) + \frac{p}{2} \log N \]
where $p$ is the number of free parameters.
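To select the number of components g, one fits the model for each candidate g and keeps the one minimizing the criterion. A sketch using the slide's definitions (note the BIC here follows the slide's −L + (p/2) log N form rather than the also-common −2L + p log N):

```python
import numpy as np

def aic(log_lik, p):
    """An Information Criterion: -2 L(x; Theta*) + 2p (smaller is better)."""
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, N):
    """Bayesian Information Criterion in the slide's form: -L + (p/2) log N."""
    return -log_lik + 0.5 * p * np.log(N)
```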

Bayesian Information Criterion (BIC)
\[ P(M \mid D) = \frac{P(D \mid M)}{P(D)}\, P(M) = \frac{\int P(D \mid M, \theta)\, P(\theta)\, d\theta}{P(D)}\, P(M) \]
Maximizing the posterior $P(M \mid D)$ yields:
\[ -L(D ; \theta^*) + \frac{p}{2} \log N \]
where $p$ is the number of free parameters.

Minimum Description Length (MDL) Criterion

It yields a selection criterion which is the same as BIC:
\[ -L(D ; \theta^*) + \frac{p}{2} \log N \]
where $p$ is the number of free parameters.