Probabilistic Segmentation

Computer Science and Engineering, Indian Institute of Technology Kharagpur

Mixture Model Image Segmentation

Probability of generating a pixel measurement vector:
\[ p(x) = \sum_l p(x \mid \theta_l)\, \pi_l \]
The mixture model has the form:
\[ p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l\, p_l(x \mid \theta_l) \]
Component densities:
\[ p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2}\, \det(\Sigma_l)^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_l)^\top \Sigma_l^{-1} (x - \mu_l) \right\} \]
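To make these formulas concrete, here is a minimal Python sketch (assuming NumPy; the function names are illustrative, not from the lecture) that evaluates a Gaussian component density, the mixture density p(x | Θ), and the log of the per-pixel likelihood product used on the following slide.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Component density p_l(x | theta_l) for theta_l = (mu_l, Sigma_l)."""
    d = x.shape[0]
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def mixture_density(x, alphas, mus, Sigmas):
    """p(x | Theta) = sum_l alpha_l * p_l(x | theta_l)."""
    return sum(a * gaussian_density(x, mu, S)
               for a, mu, S in zip(alphas, mus, Sigmas))

def log_likelihood(X, alphas, mus, Sigmas):
    """Log of the product over all pixel vectors, taken as a sum of logs."""
    return sum(np.log(mixture_density(x, alphas, mus, Sigmas)) for x in X)
```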

Image Segmentation

Likelihood for all observations (data points):
\[ \prod_{j \in \text{observations}} \sum_{l=1}^{g} \alpha_l\, p_l\big(x_j \mid \theta_l\big) \]

Mixture Model Line Fitting
\[ p(W) = \sum_l \pi_l\, p(W \mid a_l) \]
Likelihood for a set of observations:
\[ \prod_{j \in \text{observations}} \sum_{l=1}^{g} \pi_l\, p_l\big(W_j \mid a_l\big) \]

Missing data problems

The complete data log-likelihood:
\[ L_c(x ; u) = \log \prod_j p_c\big(x_j ; u\big) = \sum_j \log\big(p_c(x_j ; u)\big) \]
The incomplete data space:
\[ p_i(y ; u) = \int_{\{x \mid f(x) = y\}} p_c(x ; u)\, d\eta \]
where $\eta$ measures volume on the space of $x$ such that $f(x) = y$.

Missing data problems

The incomplete data likelihood:
\[ \prod_{j \in \text{observations}} p_i\big(y_j ; u\big) \]
\[ L_i(y ; u) = \log \prod_j p_i\big(y_j ; u\big) = \sum_j \log\big(p_i(y_j ; u)\big) = \sum_j \log \int_{\{x \mid f(x) = y_j\}} p_c(x ; u)\, d\eta \]

EM for mixture models

The complete data is a composition of the incomplete data and the missing data:
\[ x_j = [y_j, z_j] \]
Mixture model:
\[ p(y) = \sum_l \pi_l\, p(y \mid a_l) \]
Complete data log likelihood:
\[ \sum_{j \in \text{observations}} \sum_{l=1}^{g} z_{lj} \log p\big(y_j \mid a_l\big) \]
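A sketch of the complete data log likelihood, assuming Z is a (g × n) zero/one indicator array and `log_component_density` is a hypothetical callable returning log p(y | a_l):

```python
def complete_data_log_likelihood(Y, Z, log_component_density):
    """Sum over observations j and components l of z_lj * log p(y_j | a_l).

    Z[l, j] is the 0/1 missing-data indicator; log_component_density(y, l)
    returns log p(y | a_l).
    """
    g, n = Z.shape
    return sum(Z[l, j] * log_component_density(Y[j], l)
               for j in range(n) for l in range(g))
```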

EM

E-step: Compute the expected value of $z_j$ for each $j$, i.e. compute $z_j^{(s)}$. This results in $x^s = [y, z^s]$.
M-step: Maximize the complete data log-likelihood with respect to $u$:
\[ u^{s+1} = \arg\max_u L_c(x^s ; u) = \arg\max_u L_c([y, z^s] ; u) \]

EM in General Case

Expected value of the complete data log-likelihood:
\[ Q\big(u ; u^{(s)}\big) = \int L_c(x ; u)\, p\big(x \mid u^{(s)}, y\big)\, dx \]
We maximize with respect to $u$ to get:
\[ u^{s+1} = \arg\max_u Q\big(u ; u^{(s)}\big) \]
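The alternation on this and the previous slide can be phrased as a short driver loop. A minimal sketch, assuming problem-specific `e_step` and `m_step` callables (both hypothetical placeholders) and monitoring the likelihood for convergence:

```python
def em(y, u0, e_step, m_step, max_iters=100, tol=1e-6):
    """Generic EM driver: alternate the E- and M-steps until convergence."""
    u, prev_ll = u0, None
    for _ in range(max_iters):
        z = e_step(y, u)        # E-step: z^(s) = E[z | y, u^(s)]
        u, ll = m_step(y, z)    # M-step: u^(s+1) = argmax_u Lc([y, z^(s)]; u)
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break               # likelihood has stopped improving
        prev_ll = ll
    return u
```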

Image Segmentation

WHAT IS MISSING DATA? An $(n \times g)$ matrix $I$ of indicator variables.
Expectation step:
\begin{align*}
E(I_{lm}) = \bar{I}_{lm} &= 1 \cdot P(l\text{th pixel comes from } m\text{th blob}) \\
&\quad + 0 \cdot P(l\text{th pixel does not come from } m\text{th blob}) \\
&= P(l\text{th pixel comes from } m\text{th blob})
\end{align*}
We get:
\[ \bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m\big(x_l \mid \theta_m^{(s)}\big)}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\big(x_l \mid \theta_k^{(s)}\big)} \]
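A sketch of this E-step in Python, reusing the `gaussian_density` helper from the earlier sketch; the array layout (pixels in rows, segments in columns) is an assumption:

```python
import numpy as np

def e_step(X, alphas, mus, Sigmas):
    """Responsibilities I_bar[l, m] = alpha_m p_m(x_l) / sum_k alpha_k p_k(x_l)."""
    n, g = X.shape[0], len(alphas)
    I_bar = np.empty((n, g))
    for m in range(g):
        for l in range(n):
            # numerator: alpha_m^(s) * p_m(x_l | theta_m^(s))
            I_bar[l, m] = alphas[m] * gaussian_density(X[l], mus[m], Sigmas[m])
    # divide by the sum over all g segments
    return I_bar / I_bar.sum(axis=1, keepdims=True)
```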

Image Segmentation

COMPLETE DATA LOG-LIKELIHOOD:
\[ L_c\big([x, I_{lm}] ; \Theta^{(s)}\big) = \sum_{l \in \text{all pixels}} \sum_{m=1}^{g} I_{lm} \log p(x_l \mid \theta_m) \]
Maximization step:
\[ \Theta^{(s+1)} = \arg\max_\Theta L_c\big([x, I_{lm}] ; \Theta^{(s)}\big) \]

Image Segmentation

Maximization step:
\[ \alpha_m^{(s+1)} = \frac{1}{n} \sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big) \]
\[ \mu_m^{(s+1)} = \frac{\sum_{l=1}^{n} x_l\, p\big(m \mid x_l, \Theta^{(s)}\big)}{\sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big)} \]
\[ \Sigma_m^{(s+1)} = \frac{\sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big) \left\{ \big(x_l - \mu_m^{(s)}\big) \big(x_l - \mu_m^{(s)}\big)^\top \right\}}{\sum_{l=1}^{n} p\big(m \mid x_l, \Theta^{(s)}\big)} \]
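A corresponding M-step sketch. Note one deliberate variant: the slide's covariance update uses the previous mean μ_m^(s), while this sketch plugs in the freshly updated mean, which is also common in practice:

```python
import numpy as np

def m_step(X, I_bar):
    """Re-estimate alpha_m, mu_m, Sigma_m from responsibilities p(m | x_l, Theta)."""
    n, d = X.shape
    weights = I_bar.sum(axis=0)                    # sum_l p(m | x_l, Theta^(s))
    alphas = weights / n                           # alpha_m^(s+1)
    mus = (I_bar.T @ X) / weights[:, None]         # mu_m^(s+1)
    Sigmas = []
    for m in range(I_bar.shape[1]):
        diff = X - mus[m]                          # uses the updated mean (variant)
        Sigmas.append((I_bar[:, m, None] * diff).T @ diff / weights[m])
    return alphas, mus, Sigmas
```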

How EM works for Image Segmentation

E-step:
\[ \bar{I}_{lm} = \frac{\alpha_m^{(s)}\, p_m\big(x_l \mid \theta_m^{(s)}\big)}{\sum_{k=1}^{K} \alpha_k^{(s)}\, p_k\big(x_l \mid \theta_k^{(s)}\big)} \]
For each pixel compute the value $\alpha_m^{(s)} p_m\big(x_l \mid \theta_m^{(s)}\big)$ for each segment $m$.
For each pixel compute the sum $\sum_{k=1}^{K} \alpha_k^{(s)} p_k\big(x_l \mid \theta_k^{(s)}\big)$, i.e. perform the summation over all $K$ segments.
Divide the former by the latter.
M-step:
Compute $\alpha_m^{(s+1)}$, $\mu_m^{(s+1)}$, $\Sigma_m^{(s+1)}$.
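Putting the two steps together gives the whole segmentation loop; a sketch assuming the `e_step` and `m_step` helpers above and a fixed iteration budget:

```python
def segment_em(X, alphas, mus, Sigmas, iters=50):
    """Run EM on pixel feature vectors X (n x d) and return a hard segmentation."""
    for _ in range(iters):
        I_bar = e_step(X, alphas, mus, Sigmas)    # expected indicators per pixel
        alphas, mus, Sigmas = m_step(X, I_bar)    # re-estimate each segment
    labels = e_step(X, alphas, mus, Sigmas).argmax(axis=1)  # hard assignment
    return alphas, mus, Sigmas, labels
```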

Line Fitting Expectation Maximization

WHAT IS MISSING DATA? An $(n \times g)$ matrix $\mathcal{M}$ of indicator variables, whose $(k, l)$th entry is
\[ m_{k,l} = \begin{cases} 1 & \text{if point } k \text{ is drawn from line } l \\ 0 & \text{otherwise} \end{cases} \]
\[ \sum_l P\big(m_{kl} = 1 \mid \text{point } k,\ \text{line } l\text{'s parameters}\big) = 1. \]
HOW TO FORMULATE LIKELIHOOD?
\[ \exp\left(-\frac{(\text{distance from point } k \text{ to line } l)^2}{2\sigma^2}\right) \]
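A sketch of the corresponding E-step for line fitting, under the assumption that each line is parameterized as (a, b, c) with a² + b² = 1, so |ax + by + c| is the point-to-line distance:

```python
import numpy as np

def line_responsibilities(points, lines, sigma):
    """P(m_kl = 1 | point k, line l's parameters), each row normalized to sum to 1."""
    n, g = len(points), len(lines)
    W = np.empty((n, g))
    for l, (a, b, c) in enumerate(lines):
        dist = a * points[:, 0] + b * points[:, 1] + c  # signed point-to-line distance
        W[:, l] = np.exp(-dist ** 2 / (2 * sigma ** 2))
    return W / W.sum(axis=1, keepdims=True)             # enforce sum_l P(...) = 1
```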

Motion Segmentation EM

WHAT IS MISSING DATA? It is the motion field to which each pixel belongs. The indicator variable $V_{xy,l}$ is the $(xy, l)$th entry of $V$:
\[ V_{xy,l} = \begin{cases} 1 & \text{if the } xy\text{th pixel belongs to the } l\text{th motion field} \\ 0 & \text{otherwise} \end{cases} \]
HOW TO FORMULATE LIKELIHOOD?
\[ L(V, \Theta) = -\sum_{xy,l} V_{xy,l} \frac{\big(I_1(x, y) - I_2(x + m_1(x, y ; \theta_l),\ y + m_2(x, y ; \theta_l))\big)^2}{2\sigma^2} \]
where $\Theta = (\theta_1, \theta_2, \ldots, \theta_g)$. The E-step computes $P\{V_{xy,l} = 1 ; I_1, I_2, \Theta\}$.

Motion Segmentation EM

HOW TO FORMULATE LIKELIHOOD? The E-step needs $P\{V_{xy,l} = 1 ; I_1, I_2, \Theta\}$. A common choice is the affine motion model:
\[ \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}(x, y ; \theta_l) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix} \]
where $\theta_l = (a_{11}, a_{12}, \ldots, a_{23})$. This yields a layered representation.
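A small sketch of the affine motion model; the parameter ordering inside θ_l is an assumption:

```python
def affine_motion(x, y, theta):
    """(m1, m2)(x, y; theta_l) = A (x, y)^T + t for the affine model."""
    a11, a12, a13, a21, a22, a23 = theta  # assumed ordering of theta_l
    m1 = a11 * x + a12 * y + a13
    m2 = a21 * x + a22 * y + a23
    return m1, m2
```

The motion-compensated residual I1(x, y) − I2(x + m1, y + m2) then plugs into the likelihood on the previous slide.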

Identifying Outliers EM

We construct an explicit model of the outliers:
\[ (1 - \lambda)\, P(\text{measurements} \mid \text{model}) + \lambda\, P(\text{outliers}) \]
Here $\lambda \in [0, 1]$ models the frequency with which the outliers occur, and $P(\text{outliers})$ is the probability model for the outliers.
WHAT IS MISSING DATA? A variable that indicates which component generated each point.
Complete data likelihood:
\[ \prod_j \big((1 - \lambda)\, P(\text{measurement}_j \mid \text{model}) + \lambda\, P(\text{measurement}_j \mid \text{outliers})\big) \]
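In EM terms, the E-step computes the posterior probability that each measurement is an outlier, and λ can then be re-estimated as the mean of those posteriors. A sketch assuming hypothetical density callables `p_model` and `p_outlier`:

```python
import numpy as np

def outlier_posteriors(measurements, lam, p_model, p_outlier):
    """Posterior probability that each measurement came from the outlier component."""
    w_in = (1 - lam) * np.array([p_model(m) for m in measurements])
    w_out = lam * np.array([p_outlier(m) for m in measurements])
    return w_out / (w_in + w_out)

def update_lambda(posteriors):
    """M-step for lambda: the expected fraction of outliers."""
    return posteriors.mean()
```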

Background Subtraction EM

For each pixel we get a series of observations over the successive frames. The source of these observations is a mixture model with two components: the background and the noise (foreground). The background can be modeled as a Gaussian; the noise can come from a uniform source. Any pixel that belongs to the noise component is not background.

Difficulties with Expectation Maximization

Local minima.
Proper initialization.
Extremely small expected weights.
Parameters converging to the boundaries of the parameter space.

Model Selection

Should we consider minimizing the negative of the log likelihood? We should have a penalty term which increases as the number of components increases.
An Information Criterion (AIC):
\[ -2 L(x ; \Theta^*) + 2p \]
where $p$ is the number of free parameters.
Bayesian Information Criterion (BIC):
\[ -L(D ; \theta^*) + \frac{p}{2} \log N \]
where $p$ is the number of free parameters.
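To select the number of components g, one fits the model for each candidate g and keeps the one minimizing the criterion. A sketch using the slide's definitions (note the BIC here follows the slide's −L + (p/2) log N form rather than the also-common −2L + p log N):

```python
import numpy as np

def aic(log_lik, p):
    """An Information Criterion: -2 L(x; Theta*) + 2p (smaller is better)."""
    return -2.0 * log_lik + 2.0 * p

def bic(log_lik, p, N):
    """Bayesian Information Criterion in the slide's form: -L + (p/2) log N."""
    return -log_lik + 0.5 * p * np.log(N)
```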

Bayesian Information Criterion (BIC)
\[ P(M \mid D) = \frac{P(D \mid M)}{P(D)}\, P(M) = \frac{\int P(D \mid M, \theta)\, P(\theta)\, d\theta}{P(D)}\, P(M) \]
Maximizing the posterior $P(M \mid D)$ yields:
\[ -L(D ; \theta^*) + \frac{p}{2} \log N \]
where $p$ is the number of free parameters.

Minimum Description Length (MDL) Criterion

It yields a selection criterion which is the same as BIC:
\[ -L(D ; \theta^*) + \frac{p}{2} \log N \]
where $p$ is the number of free parameters.