MSRI Introductory Workshop / Jan. 28, 2005

Learning and Image Segmentation

Joachim M. Buhmann
Institute for Computational Science
ETH Zürich
Thanks to ...
Former PhD students:
Volker Roth, Thomas Zöller, Jan Puzicha, Thomas Hofmann

Current PhD students:
Bernd Fischer, Tilman Lange, Björn Ommer, Peter Orbanz
Roadmap of this Seminar Talk
Short survey of clustering techniques for segmentation
The learning problem in segmentation
Cluster validation by stability analysis
Structure preserving embedding with examples from image segmentation in remote sensing
What is Segmentation?
Image segmentation problem: decompose an image into segments, i.e., regions containing similar features (pixels, frequencies, …).
Example: Segments might be regions of the image depicting the same object.
Semantics Problem: How should we infer objects from segments?
Problem Formalization
Data: Object set V={o1,…,on} represented by data X (e.g. proximities, coordinates, histograms,…)
Clusters: C1,…, Ck partition object set
Encoding clusterings: use an assignment function m: V → {1, …, k} determined by a clustering algorithm.
Clustering solutions are defined by instances of the assignment function m:
α(X)_i = m(i)  ⇔  o_i ∈ C_{m(i)}
Grouping/Segmentation Principles

Connectedness criterion
– Single Linkage
– Path Based Clustering

Compactness criterion
– K-Means Clustering
– Pairwise Clustering, Average Association
– Max-Cut, Average Cut
– Normalized Cut
Problems in Clustering
Clustering algorithms impose structure on data
a) Inappropriate model order b) Inappropriate model type.
Learning Issues in Classification
1. Train a classifier on i.i.d. training data
2. Test trained classifier on new data
=> Good classifiers generalize well !
Control complexity by regularization (cross-validation): underfitting ↔ optimal ↔ overfitting.
How to Generalize in Clustering?
Example: k-means
1. Train a set of prototypes y on training data
2. Cluster new data w.r.t. the trained prototypes using the nearest neighbor rule
How can we control generalization?
Approximate the prototypes by probabilistic assignments, e.g., deterministic annealing
R^km(m; Y, X) = Σ_{i≤n} ‖x_i − y_{m(i)}‖²
m^test = argmin_m R^km(m; Y^train, X^test)
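The train-and-transfer scheme above can be sketched as follows. This is a minimal illustration; the function names, initialization, and iteration count are my own choices, not from the talk:

```python
import numpy as np

def kmeans_train(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternately assign points to the nearest
    prototype and recompute prototypes as cluster means."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        m = np.argmin(((X[:, None, :] - Y[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(m == c):
                Y[c] = X[m == c].mean(axis=0)
    return Y, m

def kmeans_transfer(X_test, Y_train):
    """Generalization step from the slide: cluster new data with the
    trained prototypes using the nearest-neighbor rule,
    m_test = argmin_m R^km(m; Y_train, X_test)."""
    return np.argmin(((X_test[:, None, :] - Y_train[None]) ** 2).sum(-1), axis=1)
```

The transfer step involves no re-estimation: the prototypes Y^train are the only quantity carried over to the new data.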
K-means Grouping of LANDSAT-Data
Data: vectorial, x_i ∈ R^6
Number of clusters: k = 13
Prototypes: y_1, …, y_k ∈ R^6
R^km(m; Y, X) = Σ_{i≤n} ‖x_i − y_{m(i)}‖²
Spatially Smooth k-Means
Costs = k means costs + smoothness penalty
Learning: minimize costs to determine “smooth” centroids
Generalization: What should we transfer? Centroids? Centroids + corrections?
R^skm(m; Y, X) = Σ_{i≤n} ( ‖x_i − y_{m(i)}‖² + λ Σ_{j∈N(i)} 1{m(i) ≠ m(j)} )
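Evaluating the smooth k-means cost is straightforward once a neighborhood system N(i) is fixed. A minimal sketch, assuming a generic adjacency-list encoding of the neighborhoods (the grid structure itself is not specified on the slide):

```python
import numpy as np

def smooth_kmeans_cost(X, Y, m, neighbors, lam=1.0):
    """Cost of a labeling m under the spatially smooth k-means
    objective: squared distance to the assigned centroid plus
    lambda for every neighboring site carrying a different label.
    `neighbors[i]` lists the sites in N(i); a 4-connected image
    grid would be a typical choice."""
    data_term = ((X - Y[m]) ** 2).sum()
    smooth_term = sum(
        lam
        for i, nbrs in enumerate(neighbors)
        for j in nbrs
        if m[i] != m[j]
    )
    return data_term + smooth_term
```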
Color Segmentation by Mixtures

Color image encoded by histograms n_{j|i} for all sites i.
Model: mixture of Gaussians.
To be learned: assignments m(i) and mixing coefficients p_{·|m(i)}.
Output: segmented image.
Cost Function of Parametric Distributional Clustering (PDC)
Local color information is encoded by histograms n_{j|i} (i = site index, j = color-bin index).

Cost function = negative log-likelihood:
R^PDC(m, q) = (1/n) Σ_{i≤n} [ D_KL( n_{·|i} ‖ q_{·|m(i)} ) − log p_{m(i)} ],
with class-conditional distributions q_{·|ν} modeled as a mixture of Gaussians.
Extension of PDC Segmentation
Local smoothness of labels
Learning procedure: train mixture parameters and mixing coefficients on training images
Generalization:
R^sPDC(m, q; {n_{·|i}}) = (1/n) Σ_{i≤n} [ D_KL( n_{·|i} ‖ q_{·|m(i)} ) − log p_{m(i)} ]
                        + λ (1/n) Σ_{i≤n} Σ_{j∈N(i)} 1{m(i) ≠ m(j)}

m^test = argmin_m R^sPDC(m; q^train; {n^test_{·|i}})
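The smoothed PDC cost can be sketched directly from the formula above. Variable names follow the slide; the neighborhood encoding and the small epsilon guarding the logarithms are my own assumptions:

```python
import numpy as np

def spdc_cost(m, q, p, hist, neighbors, lam=1.0, eps=1e-12):
    """Smoothed PDC negative log-likelihood (a sketch): KL divergence
    between each site's color histogram n_{.|i} and its segment's
    prototype distribution q_{.|m(i)}, minus the log mixing
    coefficient, plus a penalty for neighboring sites with
    different labels.  hist: (n, bins), q: (k, bins), p: (k,)."""
    n = len(hist)
    kl = np.sum(hist * (np.log(hist + eps) - np.log(q[m] + eps)), axis=1)
    data = (kl - np.log(p[m] + eps)).mean()
    smooth = np.mean([sum(m[i] != m[j] for j in neighbors[i])
                      for i in range(n)])
    return data + lam * smooth
```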
Remote Sensing Imagery
Polarimetric SAR (synthetic aperture radar) image, L-band.
Original image | sampled image | segmentation (16 clusters; colors indicate cluster assignments)
Generalization Experiments
segmentation 1 | segmentation 2 | transfer 1→2
Generalization Experiments
segmentation 1 | segmentation 2 | transfer 1→2
How to Learn a Segmentation using NCut
NCut: minimize the cut while maximizing the association.
What can we learn from one NCut solution to segment a second image better? What should be transferred?
NCut(A, B) = cut(A, B)/assoc(A, V) + cut(B, A)/assoc(B, V)
Roadmap

Short survey of clustering techniques for segmentation
The learning problem in segmentation
Cluster validation by stability analysis
Structure preserving embedding with examples from image segmentation in remote sensing
Validation Methods ...

• ... are procedures and concepts for the quantitative and objective assessment of clustering solutions.
• ... evaluate a specific quality measure.
• ... can be external (comparison with ground truth) or internal.
• ... can be used for model selection.

Important question: what is the appropriate number of clusters k for my data?
Stability-based Validation
• Many validation methods incorporate a structural bias!
• What can be done if no additional prior knowledge is available?
Main idea:
Stability: solutions on two data sets from the same source should be similar.
Stable Solution

first set | second set

Conclusion: If the model (order) is appropriate for the data, then groupings on different data sets from the same source are similar with high probability.
Unstable Solution

Requirement: an algorithm generalizes well if a solution on one data set is similar to one on a second data set (empirical classification risk).

first set | second set

Conclusion: If the model (order) is not appropriate for the data, then groupings on different data sets from the same source are dissimilar with high probability.
Two-Sample Scenario

General procedure:
1. Draw two data sets X, X' from the same source.
2. Cluster both data sets using algorithm α.
3. Compute the agreement of the two solutions.

Stability := expected agreement of solutions

• In practical applications only one data set X is available (i.e., one image):
– Estimate the expected agreement by resampling.
– Cluster the entire data set with the optimal k.
Measuring disagreement
Two labelings on one data set:
disagreement := fraction of differently labeled objects
Three problems:
1. Clustering solutions are labelings of disjoint sets.
2. A labeling is unique only up to a permutation from S_k.
3. The fraction of differently labeled points is sensitive to model complexity:
   50% disagreement at k = 2 is totally random; 50% at k = 10 is often acceptable.
Labeling of Disjoint Sets
Extend the solution from set X to X' by
(i) training a predictor on X,
(ii) predicting labels on X',
(iii) comparing the two clusterings on X'.
• Choose predictor according to cluster principle.
Predictor Selection

• Nearest-neighbor predictor: assign X'_i to the cluster of its nearest neighbor in X,
  X_{i*} = argmin_{X_j ∈ X} d(X'_i, X_j).
• Replacement predictor for cost-based clustering principles: label X'_i from the second data set with the label of the most similar object in the first data set, where similarity is measured in terms of costs.

An optimal predictor will not produce trivial stability (caused by common fluctuations).
Breaking Permutation Symmetry

A labeling is unique only up to a permutation from S_k.
• Solution: stability index S := expected minimal disagreement over all permutations in S_k.
• The minimizing permutation is found by the Hungarian method in O(k³).
Normalization for Different k Values

• Stability costs are scale-sensitive to k.
• Solution: normalize by the maximal stability costs, those of the random predictor ρ:
  S_k(α) ≤ 1 − 1/k,   S_k(ρ) → 1 − 1/k as n → ∞.
• Normalization: S ↦ S_k(α) / S_k(ρ).
Complete Stability Measure

• Disagreement rate
• Permutation symmetry breaking
• Expectation w.r.t. two samples from the same source
• Normalization by S(ρ)
• Estimate E_{X,X'} by resampling

S = (1/S(ρ)) E_{X,X'} [ min_{π ∈ S_k} (1/n) Σ_{i=1}^n 1{ α(X')_i ≠ π ∘ g(X'_i; X, α(X)) } ]
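The inner minimization over S_k can be sketched as follows. For illustration the permutations are enumerated explicitly, which is only feasible for small k; the slides use the Hungarian method, which finds the optimal matching in O(k³) and should be preferred in practice:

```python
from itertools import permutations

def disagreement(labels_a, labels_b, k):
    """Minimal fraction of differently labeled objects over all
    label permutations pi in S_k, i.e. the min over S_k inside
    the stability index.  labels_a, labels_b: two labelings of
    the SAME objects (e.g. after extending one solution to the
    second data set with a predictor)."""
    n = len(labels_a)
    best = n
    for pi in permutations(range(k)):
        # relabel solution A by pi and count mismatches against B
        mismatches = sum(int(pi[a] != b) for a, b in zip(labels_a, labels_b))
        best = min(best, mismatches)
    return best / n
```

Two labelings describing the same partition under different label names yield disagreement 0, as required.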
Results on Toy Data
Clustering of Microarray Data
(dataset from Golub et al., Science, Oct. 1999, pp.531-537)
Task: find groups of different leukemia tumour samples (two- and three-class classifications are known).
Problem: the number of groups is unknown a priori.
Result: the 3-means solution recovers 91% of the known sample classifications.
Via stability analysis with k-means, the estimated number of groups is 3.
3-means grouping of the Golub et al. data set and estimated instability.
Roadmap

Short survey of clustering techniques for segmentation
The learning problem in segmentation
Cluster validation by stability analysis
Structure preserving embedding with examples from image segmentation in remote sensing
Structure Preserving Embedding
1. MDS: embedding should reproduce the dissimilarities as Euclidean distances
2. Manifold learning: ISOMAP, LLE, Hessian eigen-map, PSOMs, …
3. Structure preserving embedding: embed proximity data such that the cluster structure is preserved
Pairwise Clustering for Segmentation

Dissimilarities D_ij between sites are computed from empirical feature distributions via test statistics (χ², Kolmogorov–Smirnov, Jensen–Shannon divergence).

Quality of segmentation: compactness,
R^pc(m; D) = Σ_{ν≤k} (1/|V_ν|) Σ_{i,j ∈ V_ν} D_ij,  with V_ν := {i ∈ V : m(i) = ν}.
Measuring Connectedness

Effective pairwise dissimilarity is the largest weight on the optimal path between two objects.

Objects in the same cluster: there exists a path with small dissimilarities.
Objects in different clusters: all paths contain at least one high dissimilarity.
Clustering Based on Connectedness

Minimize the objective function
R^pbc(m; D) = Σ_{ν≤k} |V_ν| D̄_ν(m)

with mean dissimilarity
D̄_ν(m) = (1/|V_ν|²) Σ_{i,j ∈ V_ν} D^eff_ij(m)

and effective pairwise dissimilarity
D^eff_ij(m) = min_{p ∈ P_ij(m)} max_{1 ≤ h < |p|} D_{p_h, p_{h+1}},

where P_ij(m) is the set of all paths from o_i to o_j within a cluster.
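The minimax path distances D^eff can be computed for all pairs at once with a Floyd-Warshall-style recursion in O(n³). A sketch (not from the talk): here paths may pass through all objects; restricting P_ij(m) to one cluster amounts to applying the same routine to that cluster's submatrix of D:

```python
import numpy as np

def effective_dissimilarity(D):
    """All-pairs minimax path distances: D_eff[i, j] is the smallest,
    over all paths from i to j, of the largest edge weight on the
    path.  Dynamic-programming update: going through an intermediate
    node k, the bottleneck of the detour is the larger of its two
    legs; keep whichever alternative is better."""
    D_eff = D.astype(float).copy()
    n = len(D)
    for k in range(n):
        detour = np.maximum(D_eff[:, k, None], D_eff[None, k, :])
        D_eff = np.minimum(D_eff, detour)
    return D_eff
```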
Optimization of PBC
Simulated annealing, iterated conditional modes, and multiscale optimization require too many evaluations of the objective function.
Instead: agglomerative optimization combined with resampling with replacement.
Automatic selection of model complexity, i.e., the number of segments, by fluctuation analysis.
Color Segmentation

Input image | histogram clustering | path-based clustering | cluster assignment probabilities
Images of the Corel Database

Hand segmentation | PBC assignment probabilities
Comparison on Corel Database
Test on 700 human segmentations of 100 different images of the Corel database.

Human/Human: mean* 7%
Human/Normalized Cut: mean* 22%
Human/Path-Based Clustering: mean 13.5%
Human/most stable Path-Based Clustering: mean 12.4%

*[Martin et al., ICCV '01]
Pairwise Clustering as k-means
Clustering in feature space: decompose the centered distance matrix D as
D_ij = S_ii + S_jj − 2 S_ij  with arbitrary diagonal elements S_ii.
Compute the eigenvalues of S: λ_1 ≥ λ_2 ≥ … ≥ λ_n.
If λ_n ≥ 0, then S is a Mercer kernel: S_ij = φ_i · φ_j.
Otherwise update S_ij ← S_ij + |λ_n| δ_ij; the smallest eigenvalue then becomes 0.
Transformed distances: D'_ij = D_ij + 2|λ_n| (1 − δ_ij).
Clusterings and their statistics remain invariant: R^pc(m; D') = R^pc(m; D) + const.
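The constant-shift construction above can be sketched in a few lines; the centering matrix used here is one standard choice for fixing the arbitrary diagonal of S and is my assumption, not stated on the slide:

```python
import numpy as np

def constant_shift_embedding(D):
    """Turn a symmetric dissimilarity matrix D into a positive
    semidefinite Mercer kernel S (a sketch): center D to obtain a
    candidate kernel, then shift the diagonal by |lambda_n| if the
    smallest eigenvalue is negative.  Pairwise-clustering costs
    change only by a constant, so clusterings are preserved."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering projection
    S = -0.5 * J @ D @ J                     # centered kernel candidate
    lam_min = np.linalg.eigvalsh(S)[0]       # smallest eigenvalue
    if lam_min < 0:
        S = S + abs(lam_min) * np.eye(n)     # diagonal shift makes S psd
    return S
```

The resulting S can be fed to any kernel method, e.g. kernel k-means on its eigenvector embedding.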
Example of Path-Based Embedding

Input data set (gray value encodes the distance from the ellipse) and embedding of the effective dissimilarities; gray values are kept fixed.
PRODOM Protein Sequences

1500 domain sequences from PRODOM.
Eigenvectors of the shifted matrix S computed from only 10% of all sequences.
Embeddings shown in 1, 2, 10, and 20 dimensions.
Experiment for Fold Prediction
(labeled data: 2000 domains; 373869 unlabeled data)
Solution with 17 Clusters
Approximately 94% correctly identified fold labels
Multidimensional Scaling (Visualisation of Proximity Data)

Travel times between 12 British cities, embedded in the plane by MDS:

     Ab   Br   Ed   Ex   Gl   In   Li   Lo   Ne   No   Ox   St
Ab    0  300  352  217  336  515  138  251  292  206  202  369
Br  300    0  466  238  451  638  271   88  349  198  122  483
Ed  352  466    0  431   47  180  229  401  139  310  378  115
Ex  217  238  431    0  415  595  236  189  371  211  157  448
Gl  336  451   47  415    0  190  214  386  169  295  362  108
In  515  638  180  595  190    0  393  565  316  474  542  299
Li  138  271  229  236  214  393    0  206  180  130  183  246
Lo  251   88  401  189  389  565  206    0  284  133   67  418
Ne  292  349  139  371  169  316  180  284    0  165  268  202
No  206  198  310  211  295  474  130  133  165    0  117  327
Ox  202  122  378  157  362  542  183   67  268  117    0  394
St  369  483  155  448  108  299  246  418  202  327  394    0

(Ab = Aberystwyth, Br = Brighton, Ed = Edinburgh, Ex = Exeter, Gl = Glasgow, In = Inverness, Li = Liverpool, Lo = London, Ne = Newcastle, No = Nottingham, Ox = Oxford, St = Strathclyde)

Optimization problem: approximate the travel times by distances in the plane,
R^stress(X; D) = Σ_{i,j ≤ n} w_ij ( D_ij − ‖x_i − x_j‖ )².
MDS “Dimensionality Stress”
Evolution of ring structure with increasing dimensionality d of the object space. The points are Gaussian distributed in dimension (a) d=5, (b) d=30 and (c) d=100.
Visualization of “Face Space”
Similarities between faces are calculated by elastic graph matching.
Embedding by Sammon's mapping.
Summary & Perspectives
Learning and generalization in vision refers to the general problem of robust optimization!
There exist challenges for unsupervised learning in vision which are conceptually (much) harder than in supervised learning.
Question of statistical vs. computational complexity.
"Neural" Network Architecture

Complete bipartite graph between feature and membership neurons; lateral inhibition between the membership neurons (π(i) = 1?, π(i) = 2?, …).
Stimuli from/to neighboring sites.
Same basic structure for each image site; lateral excitation & inhibition for smoothing.
Feature inputs n_{j|i}; weights w^(i)_{j,ν} ∝ log q_{j|ν}.
Neural Dynamics in a Site

Basis: leaky integrate-and-fire (I&F) neuron model. Spike generators emit spikes with probability ∝ n_{j|i}; the firing rate is determined by integration, and the winner neuron determines the class membership.

ds_k/dt = − s_k(t)/τ + Σ_{l ∈ Γ_k} w_lk(t) y_l(t)   (activation)
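The membrane dynamics above can be simulated with a simple Euler discretization. A sketch under assumed parameter values (τ, dt, and the Bernoulli spike model are my choices, not from the talk):

```python
import numpy as np

def leaky_if_step(s, y, w, tau=10.0, dt=0.1):
    """One Euler step of the leaky integrate-and-fire dynamics
    ds_k/dt = -s_k/tau + sum_l w_lk * y_l, for all membership
    neurons at once.  s: membrane potentials (k,), y: presynaptic
    spikes (l,), w: weight matrix (l, k)."""
    return s + dt * (-s / tau + y @ w)
```

Integrating Bernoulli spike trains through this leak, the neuron receiving the strongest weighted input accumulates the highest potential and wins the class-membership competition.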
Segmentation by I&F-NN

Network dynamics | differences | Bayes inference
Dynamics of I&F NN
De-Synchronization of I&F-NN
Average activity pattern of class-membership neurons (k=5)