12. Clustering. Foundations of Machine Learning, CentraleSupélec, Fall 2016. Benoît Playe, Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech, chloe-agathe.azencott@mines-paristech.fr

12. Clustering - Chloé-Agathe Azencott, cazencott.info/dotclear/public/lectures/ma2823_2016/...

May 12, 2018


12. Clustering

Foundations of Machine Learning
CentraleSupélec, Fall 2016

Benoît Playe, Chloé-Agathe Azencott
Centre for Computational Biology, Mines ParisTech
chloe-agathe.azencott@mines-paristech.fr

Learning objectives
● Explain what clustering algorithms can be used for.
● Explain and implement three different ways to evaluate clustering algorithms.
● Implement hierarchical clustering and discuss its various flavors.
● Implement k-means clustering and discuss its advantages and drawbacks.


Goals of clustering

Group objects that are similar into clusters: classes that are unknown beforehand.

E.g.
– group genes that are similarly affected by a disease
– group people who share the same interests (marketing purposes)
– group pixels in an image that belong to the same object (image segmentation).

Applications of clustering
● Understand general characteristics of the data
● Visualize the data
● Infer some properties of a data point based on how it relates to other data points

E.g.
– find subtypes of diseases
– visualize protein families
– find categories among images
– find patterns in financial transactions
– detect communities in social networks

Centroids and medoids
● Centroid: mean of the points in the cluster.
● Medoid: point in the cluster that is closest to the centroid.
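
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration that assumes Euclidean distance and points stored as tuples; the function names `centroid` and `medoid` are illustrative.

```python
import math

def centroid(points):
    """Mean of the points, coordinate by coordinate."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / n for j in range(dim))

def medoid(points):
    """Point of the cluster that is closest to the centroid (definition used above)."""
    c = centroid(points)
    return min(points, key=lambda p: math.dist(p, c))

cluster = [(0.0, 0.0), (2.0, 1.0), (0.0, 2.0), (6.0, 5.0)]
print(centroid(cluster))  # (2.0, 2.0)
print(medoid(cluster))    # (2.0, 1.0)
```

Unlike the centroid, the medoid is always one of the data points, which matters when a mean is not meaningful for the objects being clustered.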


Distances and similarities

Similarities
● Kernels define similarities

For a given mapping $\phi: \mathcal{X} \to \mathcal{H}$ from the space of objects $\mathcal{X}$ to some Hilbert space $\mathcal{H}$, the kernel between two objects $x$ and $x'$ is the inner product of their images in the feature space: $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$.


Distances
● Assess how close / far
– data points are from each other
– a data point is from a cluster
– two clusters are from each other
● Distance metric
– symmetry: $d(x, z) = d(z, x)$
– triangle inequality: $d(x, z) \le d(x, u) + d(u, z)$


Distance & similarities
● Transform distances into similarities?

● Transform similarities into distances?

Generalization:

Distances
● L-q norm: $d_q(x, z) = \|x - z\|_q = \left( \sum_{j=1}^{p} |x_j - z_j|^q \right)^{1/q}$
● Pearson's correlation: $\rho(x, z) = \frac{\sum_{j=1}^{p} (x_j - \bar{x})(z_j - \bar{z})}{\sqrt{\sum_{j=1}^{p} (x_j - \bar{x})^2} \sqrt{\sum_{j=1}^{p} (z_j - \bar{z})^2}}$

Measure of the linear correlation between two variables
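
As a quick illustration of the L-q distance, here is a minimal sketch for points given as tuples of p coordinates. The name `lq_distance` is illustrative; q = 1 gives the Manhattan distance and q = 2 the Euclidean distance.

```python
def lq_distance(x, z, q=2.0):
    """L-q distance between two points with the same number of coordinates."""
    if q < 1:
        raise ValueError("q must be >= 1 for the triangle inequality to hold")
    return sum(abs(a - b) ** q for a, b in zip(x, z)) ** (1.0 / q)

x = (0.0, 0.0)
z = (3.0, 4.0)
print(lq_distance(x, z, q=1))  # 7.0
print(lq_distance(x, z, q=2))  # 5.0
```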


Pearson's correlation

What does it correspond to if the features are centered?

Normalized dot product = cosine

[Figure: plot with axes feature 1 (horizontal) and feature 2 (vertical)]

● What's the max value of ρ? What does it mean?
● What's the min value of ρ? What does it mean?

Pearson's correlation
● Max value of ρ: 1
There's a perfect positive linear relationship between the features of x and the features of z.
● Min value of ρ: -1
There's a perfect negative linear relationship between the features of x and the features of z.

Pearson vs Euclidean
● Pearson's coefficient
Profiles of similar shapes will be close to each other, even if they differ in magnitude.
● Euclidean distance
Magnitude is taken into account.
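
The contrast above can be checked on a toy example. The sketch below uses made-up profiles and an illustrative `pearson` helper: a profile, the same profile scaled by 10, and its mirror image.

```python
import math

def pearson(x, z):
    """Pearson's correlation: cosine of the angle between the centered vectors."""
    mx = sum(x) / len(x)
    mz = sum(z) / len(z)
    cx = [a - mx for a in x]
    cz = [b - mz for b in z]
    dot = sum(a * b for a, b in zip(cx, cz))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_z = math.sqrt(sum(b * b for b in cz))
    return dot / (norm_x * norm_z)

profile = [1.0, 3.0, 2.0, 5.0]
scaled = [10.0 * v for v in profile]   # same shape, larger magnitude
flipped = [-v for v in profile]        # opposite shape

print(pearson(profile, scaled))     # close to +1: same shape
print(math.dist(profile, scaled))   # large: magnitudes differ
print(pearson(profile, flipped))    # close to -1: opposite shape
```

With Pearson's coefficient the scaled profile is as similar as possible to the original, while the Euclidean distance treats the two as far apart.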


Evaluating clusters


Evaluating clusters
● Clustering is unsupervised. There is no ground truth.
● How do we evaluate the quality of a clustering algorithm?

1) Based on the shape of the clusters:
Points within the same cluster should be nearby/similar and points far from each other should belong to different clusters.

2) Based on the stability of the clusters:
We should get the same results if we remove some data points, add noise, etc.

3) Based on domain knowledge:
The clusters should “make sense”.

Clusters shape
● Cluster tightness (homogeneity): $T_k$
● Cluster separation: $S_{kl}$
● Davies-Bouldin index
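
A minimal sketch of the Davies-Bouldin index, assuming the usual definitions (they are not spelled out in the text above): tightness $T_k$ is the mean distance of the points of cluster k to its centroid, separation $S_{kl}$ is the distance between the centroids of clusters k and l, and the index averages $\max_{l \neq k} (T_k + T_l) / S_{kl}$ over the clusters. Function names are illustrative; lower values indicate tighter, better separated clusters.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points (tuples)."""
    cents = [centroid(c) for c in clusters]
    # T_k: mean distance of the points of cluster k to its centroid
    tight = [sum(math.dist(p, m) for p in c) / len(c)
             for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # worst (largest) ratio of summed tightness to separation S_il
        total += max((tight[i] + tight[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

far = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
near = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
print(davies_bouldin(far))   # about 0.2
print(davies_bouldin(near))  # larger: the clusters are less separated
```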

Clusters shape
● Silhouette coefficient: $s(x) = \frac{b(x) - a(x)}{\max(a(x), b(x))}$

a(x) = average dissimilarity of x with the other points in the same cluster

b(x) = lowest average dissimilarity of x with the points of any other cluster

– If x is very close to other points in the same cluster, and very different from points in other clusters, s(x) ≈ … ?
– If x is assigned to the wrong cluster, s(x) ≈ … ?

Clusters shape
● Silhouette coefficient
– If x is very close to other points in the same cluster, and very different from points in other clusters, s(x) ≈ +1.
– If x is assigned to the wrong cluster, s(x) ≈ -1.
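
The two limiting cases can be reproduced with a small sketch. It assumes the standard silhouette formula s(x) = (b(x) - a(x)) / max(a(x), b(x)), Euclidean distance as the dissimilarity, and the common convention s(x) = 0 for a point alone in its cluster; the helper names are illustrative.

```python
import math

def mean_dist(x, points):
    return sum(math.dist(x, p) for p in points) / len(points)

def silhouette(x, clusters, label):
    """Silhouette of point x, assigned to clusters[label]."""
    own = list(clusters[label])
    if x in own:
        own.remove(x)          # a(x) is computed over the *other* points
    if not own:
        return 0.0             # convention for singleton clusters
    a = mean_dist(x, own)
    b = min(mean_dist(x, c) for i, c in enumerate(clusters) if i != label)
    return (b - a) / max(a, b)

x = (0.0, 0.0)
good = [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
        [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]]
# same data, but x has been put in the far-away cluster
wrong = [[(0.0, 1.0), (1.0, 0.0)],
         [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (0.0, 0.0)]]

print(silhouette(x, good, 0))    # close to +1
print(silhouette(x, wrong, 1))   # close to -1
```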

Cluster stability
● How many clusters?

Cluster stability
● k=4
● k=2 or 5. What clusters do you expect?

Cluster stability
● k=4
● k=2
● k=5
● Several algorithms, or several runs of the same algorithm, might give you different answers


Cluster stability
● Measuring cluster stability
– Generate perturbed versions of the original dataset (for example by sub-sampling or adding noise).
– Cluster each perturbed data set with the desired algorithm into k clusters.
– Instability measure:
● One can choose the number of clusters that minimizes the instability measure

Domain knowledge
● Do the clusters match natural categories?
– Check with human expertise

Domain knowledge: enrichment analysis
● Example: Ontology
Entities may be grouped, related within a hierarchy, and subdivided according to similarities and differences.
Built by human experts
● E.g.: The Gene Ontology http://geneontology.org/
– Describe genes with a common vocabulary, organized in categories
E.g. cellular process > cell death > programmed cell death > apoptotic process > execution phase of apoptosis


Ontology enrichment analysis
● Enrichment analysis:
Are there more data points from ontology category G in cluster C than expected by chance?
● TANGO [Tanay et al., 2003]
– Assume data points sampled from a hypergeometric distribution
– The probability for the intersection of G and C to contain more than t points is:
$P(|G \cap C| > t) = \sum_{i = t+1}^{\min(|G|, |C|)} \frac{\binom{|G|}{i} \binom{n - |G|}{|C| - i}}{\binom{n}{|C|}}$
Each term is the probability of getting i points from G when drawing |C| points from a total of n samples.
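
The hypergeometric tail probability can be computed directly with binomial coefficients. This is a minimal sketch, not the TANGO implementation; it follows the "more than t" wording above (strict inequality), and the argument names `n`, `g` (size of G) and `c` (size of C) are illustrative.

```python
from math import comb

def hypergeom_pmf(i, n, g, c):
    """P(exactly i of the c drawn points belong to G), with |G| = g among n points."""
    return comb(g, i) * comb(n - g, c - i) / comb(n, c)

def enrichment_pvalue(t, n, g, c):
    """P(|G ∩ C| > t) when C is a random draw of c points out of n."""
    return sum(hypergeom_pmf(i, n, g, c) for i in range(t + 1, min(g, c) + 1))

# 10 points in total, 5 of them in category G, cluster of size 5:
# probability that more than 3 points of the cluster fall in G
p = enrichment_pvalue(t=3, n=10, g=5, c=5)
print(p)  # 26/252, about 0.103
```

A small value suggests that the overlap between the cluster and the category is larger than expected by chance.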


Hierarchical clustering

Hierarchical clustering

Group data over a variety of possible scales, in a multi-level hierarchy.

Construction
● Agglomerative approach (bottom-up)
Start with each element in its own cluster
Iteratively join neighboring clusters.
● Divisive approach (top-down)
Start with all elements in the same cluster
Iteratively separate into smaller clusters.


Dendrogram
● The results of a hierarchical clustering algorithm are presented in a dendrogram.
● Branch length = cluster distance.

How many clusters do I have?

[Figure: dendrogram with clusters labeled 1, 2, 3, 4]

Linkage

How do we decide how to connect/split two clusters?
● Single linkage: smallest distance between a point of one cluster and a point of the other, $d(A, B) = \min_{x \in A, z \in B} d(x, z)$
● Complete linkage: largest such distance, $d(A, B) = \max_{x \in A, z \in B} d(x, z)$

Linkage

How do we decide how to connect/split two clusters?
● Average linkage or UPGMA (Unweighted Pair Group Method with Arithmetic mean): $d(A, B) = \frac{1}{|A| |B|} \sum_{x \in A} \sum_{z \in B} d(x, z)$
● Centroid linkage or UPGMC (Unweighted Pair Group Method using Centroids): $d(A, B) = d(\mu_A, \mu_B)$, where $\mu_A$ and $\mu_B$ are the centroids of A and B
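
The linkage functions can be plugged into a naive agglomerative (bottom-up) procedure. The sketch below assumes Euclidean distance and recomputes every linkage at each step, so it is only meant for tiny examples; all function names are illustrative.

```python
import math
from itertools import product

def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def single(a, b):             # closest pair of points
    return min(math.dist(x, z) for x, z in product(a, b))

def complete(a, b):           # farthest pair of points
    return max(math.dist(x, z) for x, z in product(a, b))

def average(a, b):            # UPGMA: mean over all pairs
    return sum(math.dist(x, z) for x, z in product(a, b)) / (len(a) * len(b))

def centroid_linkage(a, b):   # UPGMC: distance between centroids
    return math.dist(centroid(a), centroid(b))

def agglomerate(points, linkage, n_clusters):
    """Start with one cluster per point and merge the two closest clusters
    (according to `linkage`) until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]       # j > i, so index i is unaffected
    return clusters

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (9.0, 9.0)]
print(agglomerate(points, single, 3))
print(agglomerate(points, complete, 2))
```

Recording the linkage value at each merge would give the branch heights of the dendrogram.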


Draw single linkage, complete linkage, UPGMC.


Example: Gene expression clustering

Breast cancer survival signature [Bergamashi et al. 2011]

[Figure: clustered heatmap of genes (rows) by patients (columns), with clusters labeled 1 and 2 along each axis]

Linkage

How do we decide how to connect/split two clusters?
● Ward
Join the two clusters whose merge gives the smallest increase in within-cluster variance
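
One way to make Ward's criterion concrete is to measure the increase in the within-cluster sum of squared distances to the centroid caused by a merge (the sum of squares is used here instead of the variance, which is an assumption of this sketch). It equals the closed form $\frac{|A||B|}{|A|+|B|} \|\mu_A - \mu_B\|^2$, which the code checks numerically; names are illustrative.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points)

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cost_closed_form(a, b):
    return (len(a) * len(b) / (len(a) + len(b))
            * math.dist(centroid(a), centroid(b)) ** 2)

a = [(0.0, 0.0), (2.0, 0.0)]
b = [(10.0, 0.0)]
print(ward_cost(a, b))              # about 54
print(ward_cost_closed_form(a, b))  # same value, up to rounding
```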

Hierarchical clustering
● Advantages
– No need to pre-define the number of clusters
– Interpretability
● Drawbacks
– Computational complexity

What is the computational complexity of hierarchical clustering?

Hierarchical clustering
● Advantages
– No need to pre-define the number of clusters
– Interpretability
● Drawbacks
– Computational complexity:
E.g. Single/complete linkage (naive): at least O(pn²) to compute all pairwise distances.
– Must decide at which level of the hierarchy to split
– Lack of robustness (unstable)


K-means

K-means clustering
● Minimize the intra-cluster variance: $\min_{C_1, \dots, C_K} \sum_{k=1}^{K} \sum_{x \in C_k} \|x - \mu_k\|^2$, where $\mu_k$ is the centroid of cluster $C_k$
● What will this partition of the space look like?

K-means clustering
● Minimize the intra-cluster variance
● Voronoi tessellation

Lloyd's algorithm
● The k-means objective cannot be easily optimized (finding its global minimum is NP-hard)
● We adopt a greedy strategy.
– Partition the data into K clusters at random
– Compute the centroid of each cluster
– Assign each point to the cluster whose centroid it is closest to
– Repeat the last two steps until cluster membership converges.
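
The steps above can be sketched as follows. This is a minimal illustration assuming Euclidean distance and K no larger than the number of points; the round-robin random partition, the re-seeding of empty clusters, and the `seed` / `max_iter` parameters are choices of this sketch, not part of the slide.

```python
import math
import random

def centroid(points):
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def lloyd(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random partition: shuffle the indices and deal them round-robin,
    # so that no cluster starts empty.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k
    cents = []
    for _ in range(max_iter):
        # Compute the centroid of each cluster.
        cents = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster is re-seeded with a random data point.
            cents.append(centroid(members) if members else rng.choice(points))
        # Assign each point to the cluster with the closest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, cents[c]))
                      for p in points]
        if new_labels == labels:   # membership has converged
            break
        labels = new_labels
    return labels, cents

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, cents = lloyd(points, k=2)
print(labels)
print(cents)
```

On data with a less obvious structure, different seeds can lead to different local optima, so the algorithm is usually run several times.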


K-means
● Advantages
– What is the computational time of k-means?

K-means
● Advantages
– Computational time: O(t · knp), where t is the number of iterations (can be small if there's indeed a cluster structure in the data) and each iteration computes kn distances in p dimensions
– Easily implementable
● Drawbacks
– Need to set up K ahead of time
– What happens when there are outliers?

K-means
● Advantages
– Computational time is linear in the number of points n
– Easily implementable
● Drawbacks
– Need to set up K ahead of time
– Sensitive to noise and outliers
– Stochastic (different solutions with each run)
– The clusters are forced to have convex (roughly spherical) shapes

K-means variants
● K-means++
– Seeding algorithm to initialize clusters with centroids “spread-out” throughout the data.
– Randomized seeding: each new centroid is drawn with probability proportional to its squared distance to the nearest centroid already chosen
● K-medoids
● Kernel k-means
Find clusters in feature space

[Figure: k-means vs. kernel k-means]
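
The k-means++ seeding step can be sketched as below. It follows the standard description of the method (first centroid uniformly at random, each later one with probability proportional to the squared distance to the nearest centroid already chosen) and assumes the data contain at least k distinct points; `kmeanspp_seeds` and its `seed` parameter are illustrative names.

```python
import math
import random

def kmeanspp_seeds(points, k, seed=0):
    rng = random.Random(seed)
    seeds = [rng.choice(points)]          # first centroid: uniform at random
    while len(seeds) < k:
        # squared distance of every point to its nearest chosen centroid
        d2 = [min(math.dist(p, s) ** 2 for s in seeds) for p in points]
        # far-away points are more likely to be picked; already chosen
        # points have weight 0 and cannot be picked again
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = kmeanspp_seeds(points, k=3, seed=1)
print(seeds)
```

The returned points would then replace the random partition as the starting centroids of Lloyd's algorithm.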

More clustering approaches
● Soft clustering
Each point gets a probability of belonging to each cluster.
● Disjunctive clustering
Each point can belong to multiple clusters.
● Density-based clustering
Look for dense regions of the space.
E.g. DBSCAN

Summary
● Clustering: unsupervised approach to group similar data points together.
● Evaluate clustering algorithms based on
– the shape of the clusters
– the stability of the results
– the consistency with domain knowledge.
● Hierarchical clustering
– top-down / bottom-up
– various linkage functions.
● k-means clustering.

Exam: Fri, Dec 16 8am‒11am
● No documents, no calculators, no computer.
● Theoretical, technical, and practical questions
● Exam from last year online
● Short answers!
● How to study
– Homework + last year's exam
– Labs
– Answer the questions on the slides.
● Formulas
– To know: Bayes, how to compute derivatives.
– Everything else will be given. Interpretation is key.

Challenge project
● Detailed instructions on the course website http://cazencott.info/dotclear/public/lectures/ma2823_2016/kaggle-project.pdf
● Deadline for submissions & for the report: Fri, Dec 16 at 23:59
● Report: Practical instructions:
– PDF document
  ● No more than 2 pages
  ● Lastname1_Lastname2.pdf
– Starts with
  ● Full names
  ● Kaggle user names
  ● Kaggle team names.

By email to all three of us!