Clustering UNSUPERVISED LEARNING INTRODUCTION Machine Learning
Page 1: Clustering UNSUPERVISED LEARNING INTRODUCTION Machine Learning.

Clustering

UNSUPERVISED LEARNING INTRODUCTION

Machine Learning

Andrew Ng

Supervised learning

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$

Unsupervised learning

Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

Applications of clustering

Organize computing clusters

Social network analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)

Astronomical data analysis

Market segmentation

Clustering

VARIANT

Clustering Category
• Based on the clustering algorithm, clustering methods are categorized into four major categories:
– Partitional (centroid based)
Try to cluster the data into k clusters.
Example: K-Means, K-Means++, Fuzzy C-Means.
– Hierarchical
• Agglomerative: start with each data point as an individual cluster.
• Divisive: start with the entire data set as a single cluster.
– Distribution based
The clustering model most closely related to statistics is based on distribution models.
Example: EM clustering.
Less popular because it tends to overfit.
– Density based
• In density-based clustering, clusters are defined as areas of higher density than the remainder of the data set.

Based on the data
• Clustering methods are categorized into:
– Numerical data clustering
– Categorical data clustering

Clustering

K-MEANS ALGORITHM

K-means algorithm

Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

$x^{(i)} \in \mathbb{R}^n$ (drop $x_0 = 1$ convention)

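K-means algorithm

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}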
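The loop on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's own code: the function name `kmeans`, the iteration cap `n_iters`, the convergence check, and the handling of empty clusters are all assumptions added here. Cluster indices run from 0 to K-1 in the code rather than 1 to K.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means on an (m, n) array X. Returns (c, mu)."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    # Initialize the centroids as K distinct training examples.
    mu = X[rng.choice(m, size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of the centroid closest to X[i].
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = dists.argmin(axis=1)
        # Move centroid step: mu[k] = mean of the points assigned to cluster k.
        new_mu = mu.copy()
        for k in range(K):
            members = X[c == k]
            if len(members) > 0:  # assumption: keep an empty cluster's centroid as is
                new_mu[k] = members.mean(axis=0)
        if np.allclose(new_mu, mu):  # assumption: stop once the centroids settle
            break
        mu = new_mu
    # Final assignment so that c is consistent with the returned centroids.
    c = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2).argmin(axis=1)
    return c, mu

# Hypothetical toy data: two tight pairs of points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
c, mu = kmeans(X, K=2)
```

On this toy data the two nearby pairs end up in separate clusters; which pair receives index 0 depends on the random initialization.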

K-means for non-separated clusters

T-shirt sizing

[Figure: scatter plot of Weight versus Height]

Clustering

OPTIMIZATION OBJECTIVE

Machine Learning

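K-means optimization objective

$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned

$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)

$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned

Optimization objective:

$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$

$$\min_{c^{(1)}, \ldots, c^{(m)},\ \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$$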
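As a rough illustration, the cost (distortion) can be computed as the average squared distance between each example and the centroid of the cluster it is assigned to. The helper name `distortion` and the 0-based cluster indices are assumptions made for this sketch.

```python
import numpy as np

def distortion(X, c, mu):
    """Average squared distance from each example to its assigned centroid.

    X: (m, n) examples, c: (m,) cluster indices in 0..K-1, mu: (K, n) centroids.
    """
    diffs = X - mu[c]  # row i is X[i] minus the centroid assigned to example i
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

# Hypothetical example: two points, both assigned to one centroid at (1, 0).
X = np.array([[0.0, 0.0], [2.0, 0.0]])
c = np.array([0, 0])
mu = np.array([[1.0, 0.0]])
J = distortion(X, c, mu)  # each point is at squared distance 1, so J is 1.0
```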

Clustering

RANDOM INITIALIZATION

Machine Learning

Random initialization

Should have $K < m$

Randomly pick $K$ training examples.

Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
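One way to sketch this initialization in NumPy is to sample distinct row indices and copy those rows as the starting centroids. The function name `init_centroids` and the explicit check on the number of clusters are assumptions of this example.

```python
import numpy as np

def init_centroids(X, K, rng=None):
    """Pick K distinct training examples from X (shape (m, n)) as centroids."""
    rng = np.random.default_rng() if rng is None else rng
    m = X.shape[0]
    if not K < m:
        raise ValueError("K should be smaller than the number of examples m")
    idx = rng.choice(m, size=K, replace=False)  # K distinct row indices
    return X[idx].astype(float)

# Hypothetical toy data: 5 examples in 2 dimensions.
X = np.arange(10.0).reshape(5, 2)
mu = init_centroids(X, K=3, rng=np.random.default_rng(0))
```

Sampling without replacement avoids picking the same training example twice, which would start two centroids at the same location.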

Local optima

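Random initialization

For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}

Pick clustering that gave lowest cost $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$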
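The restart procedure can be sketched as below: run K-means from several random initializations and keep the run with the lowest distortion. The names `run_kmeans` and `best_of_restarts`, the iteration cap, and the empty-cluster handling are assumptions of this sketch, and the cost is taken to be the average squared distance to the assigned centroid.

```python
import numpy as np

def run_kmeans(X, K, rng, n_iters=100):
    """One K-means run from a random initialization. Returns (c, mu, J)."""
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its old centroid (assumption).
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
    J = float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))  # distortion
    return c, mu, J

def best_of_restarts(X, K, n_restarts=100, seed=0):
    """Keep the clustering with the lowest distortion over n_restarts runs."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        result = run_kmeans(X, K, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

# Hypothetical toy data: three well-separated pairs of points.
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1], [20, 0], [20, 1]], dtype=float)
c, mu, J = best_of_restarts(X, K=3)
```

A single run can stop in a poor local optimum; keeping the lowest-cost run over many initializations makes that outcome less likely.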

Clustering

CHOOSING THE NUMBER OF CLUSTERS

Machine Learning

What is the right value of K?

Choosing the value of K

Elbow method:

[Figure: two plots of the cost function against K (no. of clusters), for K = 1 to 8]
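A possible way to produce the numbers behind such a plot is to record the lowest distortion found for each candidate number of clusters. The function names, the synthetic three-blob data, and the restart count below are assumptions of this sketch; the cost is the average squared distance to the assigned centroid.

```python
import numpy as np

def kmeans_cost(X, K, rng, n_restarts=20, n_iters=100):
    """Lowest distortion found over several random restarts of K-means."""
    best = np.inf
    for _restart in range(n_restarts):
        mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        for _it in range(n_iters):
            c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
            # An empty cluster keeps its old centroid (assumption).
            new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                               for k in range(K)])
            if np.allclose(new_mu, mu):
                break
            mu = new_mu
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        J = float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))
        best = min(best, J)
    return best

def elbow_curve(X, k_values, seed=0, n_restarts=20):
    """Map each candidate K to the best cost found for it."""
    rng = np.random.default_rng(seed)
    return {K: kmeans_cost(X, K, rng, n_restarts=n_restarts) for K in k_values}

# Hypothetical data: three small blobs around (0, 0), (5, 5) and (10, 0).
data_rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([ctr + 0.1 * data_rng.normal(size=(10, 2)) for ctr in centers])
costs = elbow_curve(X, k_values=range(1, 9))
```

Plotting `costs` against the number of clusters gives the curve on the slide; with data like this the cost drops sharply up to three clusters and changes little afterwards, although real data often shows no such clear elbow.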

Choosing the value of K

Sometimes, you’re running K-means to get clusters to use for some later/downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.

E.g. T-shirt sizing

[Figure: two scatter plots of Weight versus Height, each titled "T-shirt sizing"]