MSRI Introductory Workshop / Jan. 28, 2005

Learning and Image Segmentation

Joachim M. Buhmann
Institute for Computational Science
ETH Zürich
Thanks to ...
Former PhD students:
Volker Roth, Thomas Zöller, Jan Puzicha, Thomas Hofmann

Current PhD students:
Bernd Fischer, Tilman Lange, Björn Ommer, Peter Orbanz
Roadmap of this Seminar Talk
Short survey of clustering techniques for segmentation
The learning problem in segmentation
Cluster validation by stability analysis
Structure preserving embedding with examples from image segmentation in remote sensing
What is Segmentation?
Image segmentation problem: decompose an image into segments, i.e., regions containing similar features (pixels, frequencies, …).
Example: Segments might be regions of the image depicting the same object.
Semantics Problem: How should we infer objects from segments?
Problem Formalization
Data: Object set V={o1,…,on} represented by data X (e.g. proximities, coordinates, histograms,…)
Clusters: C1,…, Ck partition object set
Encoding clusterings: use an assignment function m: V → {1, …, k} determined by a clustering algorithm.
Clustering solutions are defined by instances of the assignment function m:
α(X)_i = m(i)  ⇔  o_i ∈ C_{m(i)}
Grouping/Segmentation Principles

Connectedness criterion
– Single Linkage
– Path Based Clustering

Compactness criterion
– K-Means Clustering
– Pairwise Clustering, Average Association
– Max-Cut, Average Cut
– Normalized Cut
Problems in Clustering
Clustering algorithms impose structure on data
a) Inappropriate model order b) Inappropriate model type.
Learning Issues in Classification
1. Train a classifier on i.i.d. training data
2. Test trained classifier on new data
=> Good classifiers generalize well !
Control complexity by regularization (cross-validation): underfitting ↔ optimal ↔ overfitting.
How to Generalize in Clustering?
Example: k-means
1. Train a set of prototypes y on training data
2. Cluster new data w.r.t. the trained prototypes using the nearest neighbor rule
How can we control generalization?
Approximate the prototypes by probabilistic assignments, e.g., deterministic annealing
R^km(m; Y, X) = Σ_{i≤n} ‖x_i − y_{m(i)}‖²
m^test = argmin_m R^km(m; Y^train, X^test)
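The train-and-transfer scheme above can be sketched as follows. This is a minimal illustration; the function names, initialization, and iteration count are my own choices, not from the talk:

```python
import numpy as np

def kmeans_train(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternately assign points to the nearest
    prototype and recompute prototypes as cluster means."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        m = np.argmin(((X[:, None, :] - Y[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(m == c):
                Y[c] = X[m == c].mean(axis=0)
    return Y, m

def kmeans_transfer(X_test, Y_train):
    """Generalization step from the slide: cluster new data with the
    trained prototypes using the nearest-neighbor rule,
    m_test = argmin_m R^km(m; Y_train, X_test)."""
    return np.argmin(((X_test[:, None, :] - Y_train[None]) ** 2).sum(-1), axis=1)
```

The transfer step involves no re-estimation: the prototypes Y^train are the only quantity carried over to the new data.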
K-means Grouping of LANDSAT-Data
Data: vectorial, x_i ∈ R^6
Number of clusters: k = 13
Prototypes: y_1, …, y_k ∈ R^6
R^km(m; Y, X) = Σ_{i≤n} ‖x_i − y_{m(i)}‖²
Spatially Smooth k-Means
Costs = k means costs + smoothness penalty
Learning: minimize costs to determine “smooth” centroids
Generalization: What should we transfer? Centroids? Centroids + corrections?
R^skm(m; Y, X) = Σ_{i≤n} ( ‖x_i − y_{m(i)}‖² + λ Σ_{j∈N(i)} 1{m(i) ≠ m(j)} )
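Evaluating the smooth k-means cost is straightforward once a neighborhood system N(i) is fixed. A minimal sketch, assuming a generic adjacency-list encoding of the neighborhoods (the grid structure itself is not specified on the slide):

```python
import numpy as np

def smooth_kmeans_cost(X, Y, m, neighbors, lam=1.0):
    """Cost of a labeling m under the spatially smooth k-means
    objective: squared distance to the assigned centroid plus
    lambda for every neighboring site carrying a different label.
    `neighbors[i]` lists the sites in N(i); a 4-connected image
    grid would be a typical choice."""
    data_term = ((X - Y[m]) ** 2).sum()
    smooth_term = sum(
        lam
        for i, nbrs in enumerate(neighbors)
        for j in nbrs
        if m[i] != m[j]
    )
    return data_term + smooth_term
```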
Color Segmentation by Mixtures

Color image encoded by histograms n_{j|i} for all sites i.
Model: mixture of Gaussians.
To be learned: assignments m(i) and mixing coefficients p_{·|m(i)}.
Output: segmented image.
Cost Function of Parametric Distributional Clustering (PDC)
Local color information is encoded by histograms n_{j|i} (i = site index, j = color-bin index).

Cost function = negative log-likelihood:
R^PDC(m, q) = (1/n) Σ_{i≤n} [ D_KL( n_{·|i} ‖ q_{·|m(i)} ) − log p_{m(i)} ],
with class-conditional distributions q_{·|ν} modeled as a mixture of Gaussians.
Extension of PDC Segmentation
Local smoothness of labels
Learning procedure: train mixture parameters and mixing coefficients on training images
Generalization:
R^sPDC(m, q; {n_{·|i}}) = (1/n) Σ_{i≤n} [ D_KL( n_{·|i} ‖ q_{·|m(i)} ) − log p_{m(i)} ]
                        + λ (1/n) Σ_{i≤n} Σ_{j∈N(i)} 1{m(i) ≠ m(j)}

m^test = argmin_m R^sPDC(m; q^train; {n^test_{·|i}})
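The smoothed PDC cost can be sketched directly from the formula above. Variable names follow the slide; the neighborhood encoding and the small epsilon guarding the logarithms are my own assumptions:

```python
import numpy as np

def spdc_cost(m, q, p, hist, neighbors, lam=1.0, eps=1e-12):
    """Smoothed PDC negative log-likelihood (a sketch): KL divergence
    between each site's color histogram n_{.|i} and its segment's
    prototype distribution q_{.|m(i)}, minus the log mixing
    coefficient, plus a penalty for neighboring sites with
    different labels.  hist: (n, bins), q: (k, bins), p: (k,)."""
    n = len(hist)
    kl = np.sum(hist * (np.log(hist + eps) - np.log(q[m] + eps)), axis=1)
    data = (kl - np.log(p[m] + eps)).mean()
    smooth = np.mean([sum(m[i] != m[j] for j in neighbors[i])
                      for i in range(n)])
    return data + lam * smooth
```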
Remote Sensing Imagery
Polarimetric SAR (synthetic aperture radar) image, L-band.
Original image | sampled image | segmentation (16 clusters; colors indicate cluster assignments)
Generalization Experiments
segmentation 1 | segmentation 2 | transfer 1→2
Generalization Experiments
segmentation 1 | segmentation 2 | transfer 1→2
How to Learn a Segmentation using NCut
NCut: minimize the cut while maximizing the association.
What can we learn from one NCut solution to segment a second image better? What should be transferred?
NCut(A, B) = cut(A, B)/assoc(A, V) + cut(B, A)/assoc(B, V)
Roadmap

Short survey of clustering techniques for segmentation
The learning problem in segmentation
Cluster validation by stability analysis
Structure preserving embedding with examples from image segmentation in remote sensing
Validation Methods ...

• ... are procedures and concepts for the quantitative and objective assessment of clustering solutions.
• ... evaluate a specific quality measure.
• ... can be external (comparison with ground truth) or internal.
• ... can be used for model selection.

Important question: what is the appropriate number of clusters k for my data?
Stability-based Validation
• Many validation methods incorporate a structural bias!
• What can be done if no additional prior knowledge is available?
Main idea:
Stability: solutions on two data sets from the same source should be similar.
Stable Solution

first set | second set

Conclusion: If the model (order) is appropriate for the data, then groupings on different data sets from the same source are similar with high probability.
Unstable Solution

Requirement: an algorithm generalizes well if a solution on one data set is similar to one on a second data set (empirical classification risk).

first set | second set

Conclusion: If the model (order) is not appropriate for the data, then groupings on different data sets from the same source are dissimilar with high probability.
Two-Sample Scenario

General procedure:
1. Draw two data sets X, X' from the same source.
2. Cluster both data sets using algorithm α.
3. Compute the agreement of the two solutions.

Stability := expected agreement of solutions

• In practical applications only one data set X is available (i.e., one image):
– Estimate the expected agreement by resampling.
– Cluster the entire data set with the optimal k.
Measuring disagreement
Two labelings on one data set:
disagreement := fraction of differently labeled objects
Three problems:
1. Clustering solutions are labelings of disjoint sets.
2. A labeling is unique only up to a permutation from S_k.
3. The fraction of differently labeled points is sensitive to model complexity:
   50% disagreement at k = 2 is totally random; 50% at k = 10 is often acceptable.
Labeling of Disjoint Sets
Extend the solution from set X to X' by
(i) training a predictor on X,
(ii) predicting labels on X',
(iii) comparing the two clusterings on X'.
• Choose predictor according to cluster principle.
Predictor Selection

• Nearest-neighbor predictor: assign X'_i to the cluster of its nearest neighbor in X,
  X_{i*} = argmin_{X_j ∈ X} d(X'_i, X_j).
• Replacement predictor for cost-based clustering principles: label X'_i from the second data set with the label of the most similar object in the first data set, where similarity is measured in terms of costs.

An optimal predictor will not produce trivial stability (caused by common fluctuations).
Breaking Permutation Symmetry

A labeling is unique only up to a permutation from S_k.
• Solution: stability index S := expected minimal disagreement over all permutations in S_k.
• The minimizing permutation is found by the Hungarian method in O(k³).
Normalization for Different k Values

• Stability costs are scale-sensitive to k.
• Solution: normalize by the maximal stability costs, those of the random predictor ρ:
  S_k(α) ≤ 1 − 1/k,   S_k(ρ) → 1 − 1/k as n → ∞.
• Normalization: S ↦ S_k(α) / S_k(ρ).
Complete Stability Measure

• Disagreement rate
• Permutation symmetry breaking
• Expectation w.r.t. two samples from the same source
• Normalization by S(ρ)
• Estimate E_{X,X'} by resampling

S = (1/S(ρ)) E_{X,X'} [ min_{π ∈ S_k} (1/n) Σ_{i=1}^n 1{ α(X')_i ≠ π ∘ g(X'_i; X, α(X)) } ]
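The inner minimization over S_k can be sketched as follows. For illustration the permutations are enumerated explicitly, which is only feasible for small k; the slides use the Hungarian method, which finds the optimal matching in O(k³) and should be preferred in practice:

```python
from itertools import permutations

def disagreement(labels_a, labels_b, k):
    """Minimal fraction of differently labeled objects over all
    label permutations pi in S_k, i.e. the min over S_k inside
    the stability index.  labels_a, labels_b: two labelings of
    the SAME objects (e.g. after extending one solution to the
    second data set with a predictor)."""
    n = len(labels_a)
    best = n
    for pi in permutations(range(k)):
        # relabel solution A by pi and count mismatches against B
        mismatches = sum(int(pi[a] != b) for a, b in zip(labels_a, labels_b))
        best = min(best, mismatches)
    return best / n
```

Two labelings describing the same partition under different label names yield disagreement 0, as required.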
Results on Toy Data
Clustering of Microarray Data
(dataset from Golub et al., Science, Oct. 1999, pp.531-537)
Task: find groups of different leukemia tumour samples (two- and three-class classifications are known).
Problem: the number of groups is unknown a priori.
Result: the 3-means solution recovers 91% of the known sample classifications.
Via stability analysis with k-means, the estimated number of groups is 3.
3-means grouping of the Golub et al. data set and estimated instability.
Roadmap

Short survey of clustering techniques for segmentation
The learning problem in segmentation
Cluster validation by stability analysis
Structure preserving embedding with examples from image segmentation in remote sensing
Structure Preserving Embedding
1. MDS: embedding should reproduce the dissimilarities as Euclidean distances
2. Manifold learning: ISOMAP, LLE, Hessian eigen-map, PSOMs, …
3. Structure preserving embedding: embed proximity data such that the cluster structure is preserved
Pairwise Clustering for Segmentation

Dissimilarities D_ij between sites are computed from empirical feature distributions via test statistics (χ², Kolmogorov–Smirnov, Jensen–Shannon divergence).

Quality of segmentation: compactness,
R^pc(m; D) = Σ_{ν≤k} (1/|V_ν|) Σ_{i,j ∈ V_ν} D_ij,  with V_ν := {i ∈ V : m(i) = ν}.
Measuring Connectedness

Effective pairwise dissimilarity is the largest weight on the optimal path between two objects.

Objects in the same cluster: there exists a path with small dissimilarities.
Objects in different clusters: all paths contain at least one high dissimilarity.
Clustering Based on Connectedness

Minimize the objective function
R^pbc(m; D) = Σ_{ν≤k} |V_ν| D̄_ν(m)

with mean dissimilarity
D̄_ν(m) = (1/|V_ν|²) Σ_{i,j ∈ V_ν} D^eff_ij(m)

and effective pairwise dissimilarity
D^eff_ij(m) = min_{p ∈ P_ij(m)} max_{1 ≤ h < |p|} D_{p_h, p_{h+1}},

where P_ij(m) is the set of all paths from o_i to o_j within a cluster.
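The minimax path distances D^eff can be computed for all pairs at once with a Floyd-Warshall-style recursion in O(n³). A sketch (not from the talk): here paths may pass through all objects; restricting P_ij(m) to one cluster amounts to applying the same routine to that cluster's submatrix of D:

```python
import numpy as np

def effective_dissimilarity(D):
    """All-pairs minimax path distances: D_eff[i, j] is the smallest,
    over all paths from i to j, of the largest edge weight on the
    path.  Dynamic-programming update: going through an intermediate
    node k, the bottleneck of the detour is the larger of its two
    legs; keep whichever alternative is better."""
    D_eff = D.astype(float).copy()
    n = len(D)
    for k in range(n):
        detour = np.maximum(D_eff[:, k, None], D_eff[None, k, :])
        D_eff = np.minimum(D_eff, detour)
    return D_eff
```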
Optimization of PBC
Simulated annealing, iterated conditional modes, and multiscale optimization require too many evaluations of the objective function.
Instead: agglomerative optimization combined with resampling with replacement.
Automatic selection of model complexity, i.e., the number of segments, by fluctuation analysis.
Color Segmentation

Input image | histogram clustering | path-based clustering | cluster assignment probabilities
Images of the Corel Database

Hand segmentation | PBC assignment probabilities
Comparison on Corel Database
Test on 700 human segmentations of 100 different images of the Corel database.

Human/Human: mean* 7%
Human/Normalized Cut: mean* 22%
Human/Path-Based Clustering: mean 13.5%
Human/most stable Path-Based Clustering: mean 12.4%

*[Martin et al., ICCV '01]
Pairwise Clustering as k-means
Clustering in feature space: decompose the centered distance matrix D as
D_ij = S_ii + S_jj − 2 S_ij  with arbitrary diagonal elements S_ii.
Compute the eigenvalues of S: λ_1 ≥ λ_2 ≥ … ≥ λ_n.
If λ_n ≥ 0, then S is a Mercer kernel: S_ij = φ_i · φ_j.
Otherwise update S_ij ← S_ij + |λ_n| δ_ij; the smallest eigenvalue then becomes 0.
Transformed distances: D'_ij = D_ij + 2|λ_n| (1 − δ_ij).
Clusterings and their statistics remain invariant: R^pc(m; D') = R^pc(m; D) + const.
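The constant-shift construction above can be sketched in a few lines; the centering matrix used here is one standard choice for fixing the arbitrary diagonal of S and is my assumption, not stated on the slide:

```python
import numpy as np

def constant_shift_embedding(D):
    """Turn a symmetric dissimilarity matrix D into a positive
    semidefinite Mercer kernel S (a sketch): center D to obtain a
    candidate kernel, then shift the diagonal by |lambda_n| if the
    smallest eigenvalue is negative.  Pairwise-clustering costs
    change only by a constant, so clusterings are preserved."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n      # centering projection
    S = -0.5 * J @ D @ J                     # centered kernel candidate
    lam_min = np.linalg.eigvalsh(S)[0]       # smallest eigenvalue
    if lam_min < 0:
        S = S + abs(lam_min) * np.eye(n)     # diagonal shift makes S psd
    return S
```

The resulting S can be fed to any kernel method, e.g. kernel k-means on its eigenvector embedding.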
Example of Path-Based Embedding

Input data set (gray value encodes the distance from the ellipse) and embedding of the effective dissimilarities; gray values are kept fixed.
PRODOM Protein Sequences

1500 domain sequences from PRODOM.
Eigenvectors of the shifted matrix S computed from only 10% of all sequences.
Embeddings shown in 1, 2, 10, and 20 dimensions.
Experiment for Fold Prediction
(labeled data: 2000 domains; 373869 unlabeled data)
Solution with 17 Clusters
Approximately 94% correctly identified fold labels
Multidimensional Scaling (Visualisation of Proximity Data)

Travel times between 12 British cities, embedded in the plane by MDS:

     Ab   Br   Ed   Ex   Gl   In   Li   Lo   Ne   No   Ox   St
Ab    0  300  352  217  336  515  138  251  292  206  202  369
Br  300    0  466  238  451  638  271   88  349  198  122  483
Ed  352  466    0  431   47  180  229  401  139  310  378  115
Ex  217  238  431    0  415  595  236  189  371  211  157  448
Gl  336  451   47  415    0  190  214  386  169  295  362  108
In  515  638  180  595  190    0  393  565  316  474  542  299
Li  138  271  229  236  214  393    0  206  180  130  183  246
Lo  251   88  401  189  389  565  206    0  284  133   67  418
Ne  292  349  139  371  169  316  180  284    0  165  268  202
No  206  198  310  211  295  474  130  133  165    0  117  327
Ox  202  122  378  157  362  542  183   67  268  117    0  394
St  369  483  155  448  108  299  246  418  202  327  394    0

(Ab = Aberystwyth, Br = Brighton, Ed = Edinburgh, Ex = Exeter, Gl = Glasgow, In = Inverness, Li = Liverpool, Lo = London, Ne = Newcastle, No = Nottingham, Ox = Oxford, St = Strathclyde)

Optimization problem: approximate the travel times by distances in the plane,
R^stress(X; D) = Σ_{i,j ≤ n} w_ij ( D_ij − ‖x_i − x_j‖ )².
MDS “Dimensionality Stress”
Evolution of ring structure with increasing dimensionality d of the object space. The points are Gaussian distributed in dimension (a) d=5, (b) d=30 and (c) d=100.
Visualization of “Face Space”
Similarities between faces are calculated by elastic graph matching.
Embedding by Sammon's mapping.
Summary & Perspectives
Learning and generalization in vision refers to the general problem of robust optimization!
There exist challenges for unsupervised learning in vision which are conceptually (much) harder than in supervised learning.
Question of statistical vs. computational complexity.
"Neural" Network Architecture

Complete bipartite graph between feature and membership neurons; lateral inhibition between the membership neurons (π(i) = 1?, π(i) = 2?, …).
Stimuli from/to neighboring sites.
Same basic structure for each image site; lateral excitation & inhibition for smoothing.
Feature inputs n_{j|i}; weights w^(i)_{j,ν} ∝ log q_{j|ν}.
Neural Dynamics in a Site

Basis: leaky integrate-and-fire (I&F) neuron model. Spike generators emit spikes with probability ∝ n_{j|i}; the firing rate is determined by integration, and the winner neuron determines the class membership.

ds_k/dt = − s_k(t)/τ + Σ_{l ∈ Γ_k} w_lk(t) y_l(t)   (activation)
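The membrane dynamics above can be simulated with a simple Euler discretization. A sketch under assumed parameter values (τ, dt, and the Bernoulli spike model are my choices, not from the talk):

```python
import numpy as np

def leaky_if_step(s, y, w, tau=10.0, dt=0.1):
    """One Euler step of the leaky integrate-and-fire dynamics
    ds_k/dt = -s_k/tau + sum_l w_lk * y_l, for all membership
    neurons at once.  s: membrane potentials (k,), y: presynaptic
    spikes (l,), w: weight matrix (l, k)."""
    return s + dt * (-s / tau + y @ w)
```

Integrating Bernoulli spike trains through this leak, the neuron receiving the strongest weighted input accumulates the highest potential and wins the class-membership competition.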
Segmentation by I&F-NN

Network dynamics | differences | Bayes inference
Dynamics of I&F NN
De-Synchronization of I&F-NN
Average activity pattern of class-membership neurons (k=5)