MS Clustering Chapters15_to_17_Part5

MS Clustering Chapters15_to_17_Part5. What is it Clustering is the classification of objects into different groups, or more precisely, the partitioning.

Dec 14, 2015


MS Clustering

Chapters15_to_17_Part5


What is it

Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.
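To make "proximity according to some defined distance measure" concrete, here is a minimal Python sketch. It assumes Euclidean distance (other measures work too) and fixed, already-known cluster centers, and hard-assigns each point to its nearest center. The helper names `euclidean` and `partition` and the sample points are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def partition(points, centers):
    """Assign each point to the index of its closest center."""
    return [min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            for p in points]

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centers = [(1.0, 1.5), (8.5, 8.0)]
print(partition(points, centers))  # [0, 0, 1, 1]
```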


We have been doing it

• We have been grouping people, cars, etc. We are just not very good at it when there are too many items to keep track of
• Experts can track five to six dimensions; we may have a data set with many times that number
• We can most likely see only the obvious groups; it is difficult for us to see the hidden ones, or the combined ones


An Example

You can group your customers (for a bike store) into several groups based on:
• Gender
• Income
• Age
• Etc.

There may be other attributes as well, such as whether they play games.
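Before customers like these can be clustered by distance, every attribute has to become a number. The sketch below shows one hedged way to do that, assuming hypothetical fields `gender`, `income`, `age` and `plays_games`: categories become 0/1 flags, and numeric columns are min-max scaled to [0, 1] so that income (in dollars) does not swamp age (in years) in the distance.

```python
customers = [  # hypothetical bike-store customers
    {"gender": "F", "income": 40000, "age": 25, "plays_games": True},
    {"gender": "M", "income": 90000, "age": 52, "plays_games": False},
    {"gender": "F", "income": 85000, "age": 47, "plays_games": False},
    {"gender": "M", "income": 35000, "age": 22, "plays_games": True},
]

def min_max(values):
    """Rescale numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def to_vectors(rows):
    """Turn each customer into (is_female, income, age, plays_games), all in [0, 1]."""
    incomes = min_max([r["income"] for r in rows])
    ages = min_max([r["age"] for r in rows])
    return [
        (1.0 if r["gender"] == "F" else 0.0, inc, age, 1.0 if r["plays_games"] else 0.0)
        for r, inc, age in zip(rows, incomes, ages)
    ]

vectors = to_vectors(customers)
for v in vectors:
    print(tuple(round(x, 3) for x in v))
```

Scaling is a design choice: without it, any distance-based grouping would effectively cluster on income alone.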


Principles of Clustering

Guessing and trying (MS):
• Setting the initial clusters
• Training with data
• Calibrating your clusters
• Training again
• Repeating until the clusters converge or the process is going nowhere

The clustering methodology is very sensitive to the starting points and can converge to a local solution that may not be the optimal global solution.
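The guess / train / calibrate / repeat loop can be sketched as plain K-means, one of the methods the Microsoft algorithm offers (its EM variants follow the same loop with soft assignments). This is an illustrative sketch, not Microsoft's implementation; `kmeans`, `sse`, `seed` and `max_iter` are hypothetical names, and `max_iter` stands in for "going nowhere".

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(group):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(group) for col in zip(*group))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (centers, labels, iterations) for plain K-means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # guess: k random cases as starting centers
    labels = None
    for iteration in range(max_iter):            # give up if it is going nowhere
        # Train: assign every case to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new_labels == labels:                 # converged: no case switched clusters
            return centers, labels, iteration
        labels = new_labels
        # Calibrate: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                          # an empty cluster keeps its old center
                centers[j] = mean(members)
    return centers, labels, max_iter

def sse(points, centers, labels):
    """Total squared distance from each case to its cluster center (lower is better)."""
    return sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))

data = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1), (4.5, 4.4)]
for seed in range(3):
    centers, labels, its = kmeans(data, k=3, seed=seed)
    print(seed, labels, round(sse(data, centers, labels), 3), its)
```

Comparing the `sse` values across seeds is a quick way to see the sensitivity to starting points: a higher total for some seed means that run settled in a worse local solution.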


Soft and hard clustering

• Hard: one case belongs to exactly one cluster
• Soft: one case can belong to several clusters
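The difference can be sketched in a few lines. This assumes spherical Gaussian clusters with one shared `sigma` and equal weights, which is a simplification (EM estimates these from the data); `soft_memberships` and `hard_membership` are hypothetical names. Soft clustering keeps a weight per cluster; hard clustering keeps only the largest one.

```python
import math

def soft_memberships(point, centers, sigma=1.0):
    """Soft clustering: one weight per cluster, summing to 1.
    Assumes spherical Gaussian clusters with a shared sigma and equal priors."""
    d2 = [sum((x - c) ** 2 for x, c in zip(point, ctr)) for ctr in centers]
    closest = min(d2)  # subtract the smallest distance so exp() cannot underflow to all zeros
    scores = [math.exp(-(d - closest) / (2 * sigma ** 2)) for d in d2]
    total = sum(scores)
    return [s / total for s in scores]

def hard_membership(point, centers, sigma=1.0):
    """Hard clustering: keep only the single most likely cluster."""
    weights = soft_memberships(point, centers, sigma)
    return weights.index(max(weights))

centers = [(0.0, 0.0), (4.0, 0.0)]
print(soft_memberships((2.0, 0.0), centers))  # [0.5, 0.5]
print(soft_memberships((1.0, 0.0), centers))  # mostly cluster 0
print(hard_membership((1.0, 0.0), centers))   # 0
```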


Scalable clustering

Ideally, data points that will not change their cluster do not need to be considered again.

In Microsoft's implementation, the algorithm reads the first 50,000 cases. If the model does not converge on those, it processes the next 50,000, rather than reading in and processing all 100,000 cases at once.
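The read-a-sample-then-check idea can be sketched as an incremental K-means that keeps only running sums and counts (sufficient statistics) per cluster and stops reading data once the centers stop moving. This is a simplified illustration under that assumption, not Microsoft's actual scalable algorithm; `scalable_kmeans`, `chunks`, `sample_size` and `tol` are hypothetical names.

```python
import math
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` cases."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def nearest(p, centers):
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))

def scalable_kmeans(stream, centers, sample_size=50_000, tol=1e-3):
    """Process one sample at a time; stop early once the centers settle."""
    centers = [tuple(c) for c in centers]
    sums = [[0.0] * len(c) for c in centers]  # running sum of member vectors per cluster
    counts = [0] * len(centers)               # running member count per cluster
    chunks_read = 0
    for block in chunks(stream, sample_size):
        chunks_read += 1
        for p in block:
            j = nearest(p, centers)
            counts[j] += 1
            sums[j] = [s + x for s, x in zip(sums[j], p)]
        new_centers = [
            tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
            for j in range(len(centers))
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # converged: the remaining data is never read
            break
    return centers, chunks_read

data = [(0.0, 0.0), (10.0, 10.0)] * 1000  # 2,000 cases
centers, read = scalable_kmeans(data, [(1.0, 1.0), (9.0, 9.0)], sample_size=100)
print(centers, read)  # [(0.0, 0.0), (10.0, 10.0)] 2
```

Here only 2 of the 20 samples are read, which is the saving the slide describes.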


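A few interesting parameters

Clustering_Method
• Which method to use, 1 to 4 (1 = scalable EM, 2 = non-scalable EM, 3 = scalable K-means, 4 = non-scalable K-means)

Cluster_Count
• The number of clusters to find
• 0 makes the algorithm guess a good number

Minimum_Support
• The minimum case count for a cluster; a cluster with fewer cases is considered empty

Stopping_Tolerance
• Controls convergence: training stops when the number of cases switching clusters falls below this tolerance

Sample_Size
• The number of cases read per pass, for scalable clustering

Cluster_Seed
• The random seed that decides where the initial clusters are placed

Maximum_Input_Attributes
• The number of input attributes allowed before automatic feature selection kicks in; automatic feature selection then keeps the most popular attributes

Maximum_States
• The maximum number of possible values (states) per attribute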
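In SQL Server Analysis Services these parameters are passed in the DMX `USING Microsoft_Clustering (...)` clause of a `CREATE MINING MODEL` statement. The helper `using_clause` below is hypothetical: it only validates the parameters listed above (in their upper-case DMX spelling) and formats the clause as a string; it does not connect to Analysis Services.

```python
CLUSTERING_METHODS = {
    1: "Scalable EM",
    2: "Non-scalable EM",
    3: "Scalable K-Means",
    4: "Non-scalable K-Means",
}

# Only the parameters discussed on the slide; anything else is rejected.
KNOWN_PARAMETERS = {
    "CLUSTERING_METHOD", "CLUSTER_COUNT", "MINIMUM_SUPPORT",
    "STOPPING_TOLERANCE", "SAMPLE_SIZE", "CLUSTER_SEED",
    "MAXIMUM_INPUT_ATTRIBUTES", "MAXIMUM_STATES",
}

def using_clause(**params):
    """Build a DMX `USING Microsoft_Clustering (...)` clause from keyword arguments."""
    names = {k.upper(): v for k, v in params.items()}
    unknown = set(names) - KNOWN_PARAMETERS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    method = names.get("CLUSTERING_METHOD")
    if method is not None and method not in CLUSTERING_METHODS:
        raise ValueError("CLUSTERING_METHOD must be 1, 2, 3 or 4")
    if names.get("CLUSTER_COUNT", 1) < 0:
        raise ValueError("CLUSTER_COUNT must be >= 0 (0 lets the algorithm guess)")
    body = ", ".join(f"{k} = {v}" for k, v in names.items())
    return f"USING Microsoft_Clustering ({body})"

print(using_clause(clustering_method=3, cluster_count=0, sample_size=50000))
# USING Microsoft_Clustering (CLUSTERING_METHOD = 3, CLUSTER_COUNT = 0, SAMPLE_SIZE = 50000)
```

The returned string goes at the end of a `CREATE MINING MODEL` statement, after the column list.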


Understanding The Results

Comprehending the results can be difficult because you have to look at them from several directions:
• High-level overview
• Look into a cluster
• Determine how a cluster differs from a nearby one


High-level overview: Cluster Profiles view (shows a lot of information at once)

• Gives some sense of who/what is in each cluster
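A profile boils each cluster down to a few summary numbers that can be compared against the whole population. The sketch below assumes numeric attributes and reports only cluster size and per-attribute means; the real viewer also shows distributions. `cluster_profiles` and the sample rows are hypothetical.

```python
from collections import defaultdict

def cluster_profiles(rows, labels, attributes):
    """Size and mean of each numeric attribute per cluster, plus a 'Population' column."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    groups["Population"] = list(rows)  # the whole data set, for comparison
    profiles = {}
    for name, members in groups.items():
        profile = {"size": len(members)}
        for a in attributes:
            profile[a] = sum(r[a] for r in members) / len(members)
        profiles[name] = profile
    return profiles

rows = [{"age": 20, "income": 30}, {"age": 30, "income": 50}, {"age": 60, "income": 90}]
labels = [0, 0, 1]
for name, profile in cluster_profiles(rows, labels, ["age", "income"]).items():
    print(name, profile)
```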


High-level overview: Cluster Diagram view

• Gives some sense of the relationships among clusters
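One way to approximate "relationships among clusters" is to rank cluster pairs by how close their centers are. This sketch uses Euclidean distance between centroids as a stand-in; the measure the viewer actually uses for link strength is not described here, and `cluster_links` and the centroids are hypothetical.

```python
import math
from itertools import combinations

def cluster_links(centroids):
    """Return (a, b, distance) for every cluster pair, most similar (closest) first."""
    pairs = [(math.dist(centroids[a], centroids[b]), a, b)
             for a, b in combinations(sorted(centroids), 2)]
    return [(a, b, round(d, 3)) for d, a, b in sorted(pairs)]

centroids = {"Cluster 1": (1.0, 1.0), "Cluster 2": (2.0, 1.0), "Cluster 3": (9.0, 9.0)}
for a, b, d in cluster_links(centroids):
    print(a, "<->", b, d)
# Cluster 1 <-> Cluster 2 1.0
# Cluster 2 <-> Cluster 3 10.63
# Cluster 1 <-> Cluster 3 11.314
```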


Look into a cluster

The Cluster Characteristics view
• Shows which attributes go together
• Note that an attribute may rank high in a cluster simply because it ranks high in all clusters. In that case, it is not that interesting.
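The caveat above can be checked numerically by comparing how often an attribute occurs inside a cluster with how often it occurs overall. This sketch assumes true/false attributes and uses the ratio of the two (lift) as the measure; `attribute_lift` and the attribute names are hypothetical. A lift near 1 flags an attribute that is common everywhere.

```python
def attribute_lift(rows, labels, cluster, attribute):
    """Return (P(attr | cluster), P(attr) overall, lift = their ratio)."""
    inside = [r[attribute] for r, lab in zip(rows, labels) if lab == cluster]
    p_cluster = sum(inside) / len(inside)
    p_all = sum(r[attribute] for r in rows) / len(rows)
    lift = p_cluster / p_all if p_all else 0.0
    return p_cluster, p_all, lift

rows = [
    {"owns_helmet": True, "plays_games": True},
    {"owns_helmet": True, "plays_games": True},
    {"owns_helmet": True, "plays_games": False},
    {"owns_helmet": True, "plays_games": False},
]
labels = [0, 0, 1, 1]
print(attribute_lift(rows, labels, 0, "owns_helmet"))  # (1.0, 1.0, 1.0): common everywhere
print(attribute_lift(rows, labels, 0, "plays_games"))  # (1.0, 0.5, 2.0): distinctive
```

Both attributes rank at the top of cluster 0 (probability 1.0), but only `plays_games` actually sets the cluster apart.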


Cluster Characteristics view


Look outside a cluster: Cluster Discrimination view, comparing a cluster with its complement (all other cases)

• Shows you which attributes are important in distinguishing the cluster
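Comparing a cluster with its complement can be sketched by taking, for each true/false attribute, the difference between its frequency inside the cluster and outside it, then sorting by the size of that difference. The scoring rule is an assumption for illustration, not the viewer's exact formula; `discriminate` and the attribute names are hypothetical, and both sides are assumed non-empty.

```python
def discriminate(rows, labels, cluster, attributes):
    """Rank attributes by how differently they occur in `cluster` vs. its complement.
    Positive scores favor the cluster, negative scores favor the complement."""
    inside = [r for r, lab in zip(rows, labels) if lab == cluster]
    outside = [r for r, lab in zip(rows, labels) if lab != cluster]
    scores = []
    for a in attributes:
        p_in = sum(r[a] for r in inside) / len(inside)
        p_out = sum(r[a] for r in outside) / len(outside)
        scores.append((a, round(p_in - p_out, 3)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)

rows = [
    {"plays_games": True, "owns_helmet": True, "high_income": False},
    {"plays_games": True, "owns_helmet": True, "high_income": True},
    {"plays_games": False, "owns_helmet": True, "high_income": True},
    {"plays_games": False, "owns_helmet": True, "high_income": True},
]
labels = [0, 0, 1, 1]
print(discriminate(rows, labels, 0, ["plays_games", "owns_helmet", "high_income"]))
# [('plays_games', 1.0), ('high_income', -0.5), ('owns_helmet', 0.0)]
```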