Stability of Cluster Analysis
STATISTIK AUSTRIA
Die Informationsmanager
Matthias Templ & Peter Filzmoser
Vienna University of Technology
Vienna, June 16, 2006
Stability of Cluster Analysis
For real data sets without an obvious grouping structure, the stability of the clusters depends on:
1. Input data - the selection of variables
2. Preparation of the data
3. Distance measure used ∗
4. Clustering method
5. Number of clusters
Changing one parameter may result in completely different cluster results.
∗ if a distance measure must be chosen
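The sensitivity described above can be checked on any data set by changing one setting at a time and cross-tabulating the resulting partitions. The following is a minimal sketch, not taken from the slides: it assumes synthetic Gaussian data without group structure and uses base R's `dist()`, `hclust()` and `cutree()`. The seed, the sample size and the choice of four clusters are illustrative assumptions.

```r
# Illustrative only: synthetic data without obvious grouping structure.
set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)

# Same data and same number of clusters; only one setting changes at a time.
d_euc <- dist(x, method = "euclidean")
d_man <- dist(x, method = "manhattan")

cl_base   <- cutree(hclust(d_euc, method = "average"),  k = 4)
cl_dist   <- cutree(hclust(d_man, method = "average"),  k = 4)  # other distance measure
cl_method <- cutree(hclust(d_euc, method = "complete"), k = 4)  # other clustering method

# Cross-tabulate the partitions: if the result were stable, each row and
# each column would contain exactly one non-zero cell.
table(base = cl_base, distance = cl_dist)
table(base = cl_base, method = cl_method)
```

Counts spread over several cells of a row indicate that observations grouped together under one setting are split apart under the other.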
1. Input Data - Variable Selection
> library(mvoutlier)
> library(cluster)
> data(humus)
> a <- agnes(t(prepare(humus[, -c(1:3)])))
> plot(a, which.plots = 2, col = c(4), col.main = 3, col.sub = 2)
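Because `agnes()` clusters the rows of its input, the `t()` call in the code above makes the variables, not the observations, the objects being clustered. A minimal sketch of the same idea on R's built-in `mtcars` data follows. Here `scale()` is only an assumed stand-in for `prepare()`, whose exact transformation the slides do not state, and `mtcars` replaces `humus` purely for illustration.

```r
library(cluster)

# Stand-in for prepare(): plain column-wise standardisation (an assumption).
x <- scale(mtcars)

# agnes() clusters the rows of its input, so transposing makes the
# variables (the columns of mtcars) the objects being clustered.
a <- agnes(t(x))

a$ac                      # agglomerative coefficient of the variable clustering
plot(a, which.plots = 2)  # dendrogram with one leaf per variable
```

Variables that merge at a low height in such a dendrogram behave similarly across the observations, which is one way to pick groups of related variables for a later analysis.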
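[Figure: dendrogram of the variables in the humus data. Leaf order, left to right: Ag, Bi, Pb, Rb, Tl, Ba, Si, K, Mn, Zn, Al, Be, La, Y, Th, U, Cr, V, Fe, Sc, As, Co, Cu, Ni, Mo, Cd, P, B, Mg, Na, pH, Ca, Sr, Hg, S, N, C, H, LOI, Sb, Cond. Height axis ticks: 5, 10, 15, 20, 25, 30, 35.]
Dendrogram of agnes(x = t(prepare(humus[, -c(1:3)])))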