Multivariate Analysis, Clustering, and Classification Jessi Cisewski Yale University Astrostatistics Summer School 2017 1

Source: astrostatistics.psu.edu/su17/17Lectures/stat_learning2017.pdf

May 28, 2018


Multivariate Analysis, Clustering, and Classification

Jessi Cisewski
Yale University

Astrostatistics Summer School 2017


Multivariate Analysis

Statistical analysis of data in which each observation has more than one variable measured.

Examples:

1 Measurements on a star: luminosity, color, environment, metallicity, number of exoplanets

2 Functions such as light curves and spectra

3 Images


Common goals

1 Describe the p-dimensional distribution

Multivariate means, variances, and covariances

Multivariate probability distributions

2 Reduce the number of variables without losing significant information

Linear functions of variables (principal components)

3 Investigate dependence between variables

4 Statistical inference

Confidence regions, multivariate regression, hypothesis testing

5 Clustering and Classification


Organizing the data

p = number of variables

n = number of observations

$x_{ij}$ = $i$th observation of the $j$th variable

                         Variables
Observations     1        2      · · ·     p
     1          x_11     x_12    · · ·    x_1p
     2          x_21     x_22    · · ·    x_2p
     ⋮            ⋮        ⋮                 ⋮
     n          x_n1     x_n2    · · ·    x_np


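Data matrix:

$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

This can also be written as n rows or as p columns:

$$\mathbf{X} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \vdots \\ \mathbf{x}_n^T \end{pmatrix} = [\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_p]$$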


Descriptive statistics

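Sample mean of the $j$th variable

$$\bar{x}_{\cdot j} = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$

Sample mean vector

$$\bar{\mathbf{x}} = \begin{pmatrix} \bar{x}_{\cdot 1} \\ \bar{x}_{\cdot 2} \\ \vdots \\ \bar{x}_{\cdot p} \end{pmatrix}$$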


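Sample covariance of variables $i$ and $j$

$$s_{ij} = s_{ji} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{ki} - \bar{x}_{\cdot i})(x_{kj} - \bar{x}_{\cdot j})$$

Sample variance of the $j$th variable is $s_{jj}$

Sample covariance matrix

$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$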


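Sample correlation coefficient of variables $i$ and $j$

$$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\, s_{jj}}}$$

$r_{ii} = 1$ and $r_{ij} = r_{ji}$

Sample correlation matrix

$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$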
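The sample mean vector, covariance matrix, and correlation matrix map onto base R functions. The following is a minimal sketch on simulated data; the object names (X, xbar, S, R_mat, and the "manual" versions) are illustrative and not from the lecture.

```r
set.seed(1)
n <- 100; p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # rows = observations, columns = variables

xbar  <- colMeans(X)   # sample mean vector
S     <- cov(X)        # sample covariance matrix (divisor n - 1)
R_mat <- cor(X)        # sample correlation matrix

# The same quantities computed from the definitions
Xc       <- sweep(X, 2, xbar)                   # subtract each column mean
S_manual <- crossprod(Xc) / (n - 1)             # t(Xc) %*% Xc / (n - 1)
R_manual <- S / sqrt(outer(diag(S), diag(S)))   # r_ij = s_ij / sqrt(s_ii * s_jj)
```

Base R also provides cov2cor(S) for the last step.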


Caution

Correlations measure the strength of the linear relationships between variables, and are meaningful only if such linear relationships are valid.
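As a small illustration of this caution, here is a sketch (with made-up data) in which y is completely determined by x, yet the correlation is essentially zero because the relationship is not linear.

```r
x <- seq(-3, 3, length.out = 201)  # grid symmetric around 0
y <- x^2                           # y is a deterministic function of x
r <- cor(x, y)                     # essentially zero despite perfect dependence
```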


Multivariate Probability Distributions

Goal: make probability statements about random vectors

p-dimensional random vector:

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

where $X_1, \ldots, X_p$ are random variables.

$\mathbf{X}$ is a continuous random vector if $X_1, \ldots, X_p$ are all continuous random variables

We’ll focus on continuous random vectors


Three important properties of X’s probability density function, f

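1 $f(\mathbf{x}) \geq 0$ for all $\mathbf{x} \in \mathbb{R}^p$ (or wherever the x's take values)

2 The total area below the function $f$ is 1: $\int_{\mathbb{R}^p} f(\mathbf{x})\, d\mathbf{x} = 1$

3 For all $t_1, t_2, \ldots, t_p$,

$$P(X_1 \leq t_1, X_2 \leq t_2, \ldots, X_p \leq t_p) = \int_{-\infty}^{t_1} \int_{-\infty}^{t_2} \cdots \int_{-\infty}^{t_p} f(\mathbf{x})\, d\mathbf{x}$$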


Recall: an expected value is an average over the entire population

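Mean vector:

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$

where

$$\mu_i = E(X_i) = \int_{\mathbb{R}^p} x_i f(\mathbf{x})\, d\mathbf{x}$$

is the mean of the $i$th component of $\mathbf{X}$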


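Covariance between $X_i$ and $X_j$:

$$\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = E(X_i X_j) - \mu_i \mu_j$$

And the variance of each $X_i$ is:

$$\sigma_{ii} = E[(X_i - \mu_i)^2] = E(X_i^2) - \mu_i^2$$

Covariance matrix of $\mathbf{X}$:

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$

$$\Sigma = E[(\mathbf{X} - \mu)(\mathbf{X} - \mu)^T] = E(\mathbf{X}\mathbf{X}^T) - \mu\mu^T$$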
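Expanding the product and using linearity of expectation with $E(\mathbf{X}) = \mu$ gives the matrix form of the covariance identity:

$$E[(\mathbf{X} - \mu)(\mathbf{X} - \mu)^T] = E(\mathbf{X}\mathbf{X}^T) - E(\mathbf{X})\mu^T - \mu E(\mathbf{X})^T + \mu\mu^T = E(\mathbf{X}\mathbf{X}^T) - \mu\mu^T$$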

Going forward, let’s assume Σ is nonsingular


Multivariate Normal Distribution

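Consider the following random vector whose possible values range over all of $\mathbb{R}^p$:

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$

$\mathbf{X}$ has a multivariate normal distribution if it has a pdf of the form

$$f(\mathbf{X}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{X} - \mu)^T \Sigma^{-1} (\mathbf{X} - \mu) \right]$$

$\mathbf{X} \sim N_p(\mu, \Sigma)$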


Image: http://en.wikipedia.org


Special case: diagonal Σ

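$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_p^2 \end{pmatrix}$$

In this case, $X_1, \ldots, X_p$ are mutually independent and normally distributed with density

$$f(\mathbf{x}) = \prod_{j=1}^{p} \left\{ \frac{1}{(2\pi\sigma_j^2)^{1/2}} \exp\left[ -\frac{1}{2\sigma_j^2} (x_j - \mu_j)^2 \right] \right\}$$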
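The multivariate normal density and its diagonal special case can be checked numerically. Below is a minimal sketch in base R; the function name dmvn and the test values are assumptions made for illustration (packages such as mvtnorm offer a production version).

```r
# Multivariate normal pdf evaluated at a single point x
dmvn <- function(x, mu, Sigma) {
  p    <- length(mu)
  dev  <- x - mu
  quad <- drop(t(dev) %*% solve(Sigma, dev))  # (x - mu)^T Sigma^{-1} (x - mu)
  exp(-0.5 * quad) / ((2 * pi)^(p / 2) * sqrt(det(Sigma)))
}

mu   <- c(1, -2, 0.5)
sig2 <- c(1, 4, 0.25)   # variances on the diagonal of Sigma
x0   <- c(0.3, -1, 1)

f_joint   <- dmvn(x0, mu, diag(sig2))                      # general formula
f_product <- prod(dnorm(x0, mean = mu, sd = sqrt(sig2)))   # product of univariate normals
```

With a diagonal covariance matrix the two values agree, which is the independence statement above.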


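χ² Distribution

$\mathbf{X} \sim N_p(\mu, \Sigma)$ with pdf

$$f(\mathbf{X}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{X} - \mu)^T \Sigma^{-1} (\mathbf{X} - \mu) \right]$$

Then $(\mathbf{X} - \mu)^T \Sigma^{-1} (\mathbf{X} - \mu) \sim \chi^2_p$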
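This result can be checked by simulation. The sketch below draws from a multivariate normal using a Cholesky factor and compares the squared Mahalanobis distances with the chi-square distribution; the particular mu, Sigma, and sample size are arbitrary choices for illustration.

```r
set.seed(42)
p <- 3; n <- 5000
mu    <- c(0, 1, 2)
Sigma <- matrix(c(2.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.5), nrow = p)

# If Z has iid N(0, 1) entries and Sigma = U^T U, the rows of Z U + mu are N_p(mu, Sigma)
U <- chol(Sigma)
X <- matrix(rnorm(n * p), n, p) %*% U
X <- sweep(X, 2, mu, "+")

d2 <- mahalanobis(X, center = mu, cov = Sigma)  # (x - mu)^T Sigma^{-1} (x - mu) for each row

mean_d2  <- mean(d2)                            # a chi-square with p df has mean p
coverage <- mean(d2 <= qchisq(0.95, df = p))    # should be close to 0.95
```

Plotting sorted d2 against chi-square quantiles is one common informal check of multivariate normality.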


Tests for multivariate normality

If the data contain a substantial number of outliers, this is evidence against the hypothesis of multivariate normality

If one variable is not normally distributed, then the full set of variables does not have a multivariate normal distribution

A possible resolution is to transform the original variables to produce new variables which are normally distributed

Example: Box-Cox transformations

When datasets arise from a multivariate normal distribution, we can perform accurate inference on the mean vector and covariance matrix


Variables (random vector): X ∼ Np(µ,Σ)

The parameters µ and Σ are unknown

Data (measurements): x1, x2, . . . , xn

Goal: estimate µ and Σ

$\bar{\mathbf{x}}$ is an unbiased estimator of µ (and the MLE of µ)

The MLE of Σ is $\frac{n-1}{n} S$ → this is not unbiased (i.e. it is biased)

The sample covariance matrix, $S$, is an unbiased estimator of Σ
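The two covariance estimators differ only in the divisor. A short sketch on simulated data (variable names are illustrative):

```r
set.seed(7)
n <- 25
X <- cbind(rnorm(n), rnorm(n, sd = 2))

S         <- cov(X)                     # unbiased estimator, divisor n - 1
Xc        <- sweep(X, 2, colMeans(X))   # centered data
Sigma_mle <- crossprod(Xc) / n          # MLE under normality, divisor n
```

For large n the factor (n - 1)/n is close to 1, so the distinction matters mainly for small samples.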


Principal Components Analysis (PCA)

Data: X = p-dimensional random vector with covariance matrix Σ

PCA is an unsupervised approach to learning about X

Principal components find directions of variability in X

Can be used for visualization, dimension reduction, regression, etc.
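A minimal PCA sketch with base R's prcomp(), on simulated data in which two of three variables are strongly related (the data and object names are assumptions for illustration). The variances of the principal components are the eigenvalues of the sample covariance matrix.

```r
set.seed(3)
n <- 200
z <- rnorm(n)
X <- cbind(z + rnorm(n, sd = 0.2),      # variable 1 follows z
           2 * z + rnorm(n, sd = 0.2),  # variable 2 follows z as well
           rnorm(n))                    # variable 3 is unrelated noise

pc  <- prcomp(X, center = TRUE, scale. = FALSE)
eig <- eigen(cov(X))

var_explained <- pc$sdev^2 / sum(pc$sdev^2)  # proportion of variance per component
scores        <- pc$x[, 1:2]                 # data projected onto the first two PCs
```

Setting scale. = TRUE would instead work with the correlation matrix, which is often preferred when variables have different units.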

Image: http://cogsci.ucd.ie


Statistical Learning
Learning from data

1 Unsupervised learning

2 Supervised learning

3 ...


*One* schematic for addressing problems in machine learning...

Image: Andy’s Computer Vision and Machine Learning Blog http://peekaboo-vision.blogspot.com


Clustering

Find subtypes or groups that are not defined a priori based on measurements

−→ “Unsupervised learning” or “Learning without labels”

Classification

Use a priori group labels in analysis to assign new observations to a particular group or class

−→ “Supervised learning” or “Learning with labels”

∗ Some content and notation used throughout are derived from notes by Rebecca Nugent (CMU), Ryan Tibshirani (CMU), and textbooks Hastie et al. (2009) and James et al. (2013).


Clustering and Classification


Sample Data

“data1”

ng <- 50        # observations per group
set.seed(321)   # ensures same dataset
# Groups 1 and 2: narrow vertical strips centered at x = -3 and x = 3
g1 <- cbind(rnorm(ng, -3, .1), rnorm(ng, 1, 1))
g2 <- cbind(rnorm(ng, 3, .1), rnorm(ng, 1, 1))
# Groups 3 and 4: one Gaussian cloud split by radius into an inner disk and an outer ring
gtemp <- cbind(rnorm(ng*2, 0, .75), rnorm(ng*2, 0, .75))
rad1 <- sqrt(gtemp[,1]^2 + gtemp[,2]^2)
g3 <- gtemp[order(rad1)[1:ng], ]
g4 <- gtemp[order(rad1)[(ng+1):(2*ng)], ]
# Group 5: noisy parabola below the other groups
g5.1 <- seq(-2.75, 2.75, length.out = ng)
g5.2 <- (g5.1/2)^2 - 4
g5 <- cbind(g5.1, g5.2 + rnorm(ng, 0, .5))
# Stack the groups and record the true labels
data1 <- rbind(g1, g2, g3, g4, g5)
labels1 <- rep(1:5, each = ng)

This dataset will be used to illustrate clustering and classification methodologies throughout the lecture.


Good references

An Introduction to Statistical Learning (James et al. 2013)

−→ good introduction to get started using methods/not technical

The Elements of Statistical Learning (Hastie et al. 2009)

−→ very thorough and technical coverage of statistical learning

All of Statistics (Wasserman 2004)

−→ great overview of statistics


CLUSTERING
Grouping of similar objects (unsupervised learning)
−→ members of the same cluster are “close” in some sense

Pattern recognition

Data segmentation


Clustering in Astronomy

→ Use properties of GRBs (e.g. location in the sky, arrival time, duration, fluence, spectral hardness) to find subtypes/classes of events

Image: Mukherjee et al. (1998)

→ Regions with an excess of gamma rays that correspond to locations of dwarf galaxies could be evidence of particle dark matter

Image: http://science.hq.nasa.gov


Clustering set-up

Notation: given vectors $X = \{X_1, X_2, \ldots, X_n\}$, each in $\mathbb{R}^p$

−→ $n$ observations in $p$-dimensional space
−→ Variables/features/attributes indexed by $j = 1, \ldots, p$: $j$th variable is $X_j$
−→ Observations indexed by $i = 1, \ldots, n$: $i$th observation is $X_i$

Want to learn properties about the joint distribution P(X) of these vectors: organize, summarize, categorize, explain

No direct measure of success (e.g. no notion of a misclassification rate) −→ Successful if true structure is captured...

Image: http://20102250.blogspot.com


General goals of clustering

Partition observations such that

1 Observations within a cluster are similar

“Compactness Property”

2 Observations in different clusters are dissimilar

“Closeness Property”

* Typically want compact clusters that are well-separated *


Dissimilarity Measure

Characterizes degree of “closeness”

Dissimilarity matrix $D = \{d_{ii'}\}$ such that $d_{ii} = 0$ and

$$d^{j}_{ii'} = d(x_{ij}, x_{i'j})$$

Some examples of $d^{j}_{ii'}$ are $(x_{ij} - x_{i'j})^2$ or $|x_{ij} - x_{i'j}|$

$$D_{ii'} = D(X_i, X_{i'}) = \sum_{j=1}^{p} w_j \cdot d^{j}_{ii'} \quad \text{where} \quad \sum_{j=1}^{p} w_j = 1$$


Dissimilarity Measure: within cluster variation

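$$\text{Total cluster variability} = \frac{1}{2} \sum_{i=1}^{n} \sum_{i'=1}^{n} D_{ii'} = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \left( \sum_{C(i')=k} D_{ii'} + \sum_{C(i') \neq k} D_{ii'} \right)$$

where $C(i) = k$ is the assignment of observation $i$ to cluster $k$

Total within cluster variability: $\frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} D_{ii'}$

Total between cluster variability: $\frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i') \neq k} D_{ii'}$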
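The decomposition of total variability into within-cluster and between-cluster parts can be verified directly from a dissimilarity matrix. The sketch below uses squared Euclidean distance (equal weights on all variables, up to a constant factor) and an assumed cluster assignment C; the data are simulated for illustration.

```r
set.seed(11)
x <- rbind(matrix(rnorm(20, 0), ncol = 2),
           matrix(rnorm(20, 3), ncol = 2))
C <- rep(1:2, each = 10)          # assumed cluster assignment C(i)

D    <- as.matrix(dist(x))^2      # D_ii' = squared Euclidean distance, with D_ii = 0
same <- outer(C, C, "==")         # TRUE where C(i) = C(i')

total   <- sum(D) / 2
within  <- sum(D[same]) / 2
between <- sum(D[!same]) / 2
```

Because the total is fixed by the data, minimizing the within-cluster part is equivalent to maximizing the between-cluster part.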


Clustering methods

1 Combinatorial algorithms
    1 K-means clustering
    2 Hierarchical clustering

2 Mixture modeling/Statistical clustering (parametric)

3 Mode seeking/Bump Hunting/Statistical clustering (nonparametric)


K-means Clustering

Image: http://www.holehouse.org/mlclass/13_Clustering.html


K-means clustering

Main idea: partition observations in K separate clusters that do not overlap


K-means clustering - procedure

Goal: minimize total within-cluster scatter using $D_{ii'} = \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = \|X_i - X_{i'}\|^2$.

Then the within-cluster scatter is written as

$$\frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \|X_i - X_{i'}\|^2 = \sum_{k=1}^{K} |C_k| \sum_{C(i)=k} \|X_i - \bar{X}_k\|^2$$

$|C_k|$ = number of observations in cluster $C_k$

$\bar{X}_k = (\bar{X}^k_1, \ldots, \bar{X}^k_p)$
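The identity between the pairwise form and the centroid form of the within-cluster scatter can be confirmed numerically. A minimal sketch with simulated data and an arbitrary assignment C (both are assumptions for illustration):

```r
set.seed(5)
x <- matrix(rnorm(60), ncol = 2)
C <- rep(1:3, each = 10)     # arbitrary assignment of 30 observations to 3 clusters

lhs <- 0   # pairwise form: (1/2) * sum over pairs in the same cluster
rhs <- 0   # centroid form: |C_k| * sum of squared distances to the cluster mean
for (k in unique(C)) {
  xk  <- x[C == k, , drop = FALSE]
  lhs <- lhs + sum(as.matrix(dist(xk))^2) / 2
  rhs <- rhs + nrow(xk) * sum(sweep(xk, 2, colMeans(xk))^2)
}
```

The centroid form is what makes the alternating assign-and-recenter procedure natural: for a fixed assignment, the cluster mean minimizes the inner sum.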


K-means clustering - recipe

Pick K (number of clusters)

Select K centers

Alternate between the following:
    1 Assign each observation to closest center
    2 Recalculate centers

R: kmeans(data, K, nstart = 20)
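The recipe can be written out in a few lines. The following is a teaching sketch only (the function name simple_kmeans and the simulated data are assumptions); it uses one random start and is not how kmeans() is implemented internally, which offers several refined algorithms.

```r
simple_kmeans <- function(x, K, max_iter = 100) {
  x <- as.matrix(x)
  centers <- x[sample(nrow(x), K), , drop = FALSE]  # select K observations as starting centers
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 1: squared distance from every observation to every center, then closest center
    d2 <- sapply(seq_len(K), function(k) rowSums(sweep(x, 2, centers[k, ])^2))
    new_cluster <- max.col(-d2, ties.method = "first")
    if (all(new_cluster == cluster)) break           # assignments stable: stop
    cluster <- new_cluster
    # Step 2: recalculate each center as the mean of its assigned observations
    for (k in seq_len(K)) {
      if (any(cluster == k)) {
        centers[k, ] <- colMeans(x[cluster == k, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(2)
x <- rbind(matrix(rnorm(60, 0, 0.3), ncol = 2),
           matrix(rnorm(60, 10, 0.3), ncol = 2))
fit <- simple_kmeans(x, K = 2)
```

In practice, kmeans(x, K, nstart = 20) repeats this kind of search from multiple starting sets and keeps the best result.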


Simple code: K-means

set.seed(321)
# Three horizontal bands: wide in X1 (sd 1), narrow in X2 (sd 0.1)
g1 <- cbind(rnorm(50, 0, 1), rnorm(50, -1, .1))
g2 <- cbind(rnorm(50, 0, 1), rnorm(50, 1, .1))
g3 <- cbind(rnorm(50, 0, 1), rnorm(50, 3, .1))
data.points <- rbind(g1, g2, g3)
# K-means with K = 3 and 20 random starts
km.out <- kmeans(data.points, 3, nstart = 20)
# Color observations by assigned cluster
plot(data.points, pch = 16, xlab = "X1", ylab = "X2", col = km.out$cluster)
# Mark the fitted cluster centers
points(km.out$centers[,1], km.out$centers[,2], pch = 4, lwd = 4, col = "blue", cex = 2)

−→ Try moving the clusters closer together (in terms of X2) and see what happens with the K-means clusters.


K-means clustering - determining K

Choose the K that has the last “significant” reduction in the within-groups sum of squares (i.e. find the “elbow”)

set.seed(123)
number.k <- 1:15
wss <- numeric(length(number.k))   # total within-cluster sum of squares for each K
for (ii in number.k) {
  km0 <- kmeans(data1, ii, nstart = 20)
  wss[ii] <- km0$tot.withinss
}
plot(number.k, wss, xlab = "K", ylab = "WSS", type = "b", pch = 19)


K-means clustering - tips

Can be unstable; solution depends on starting set of centers
−→ finds local optima, but want global optima
−→ start with different centers: kmeans(x, K, nstart = 20)

Cluster assignments are strict −→ no notion of degree or strength of cluster membership

Possible lack of interpretability of centers
−→ centers are averages:

- OK for clustering things like apartments based on price, square footage, distance to nearest grocery store

- But what if observations are images of faces?

Images: http://cdn1.thefamouspeople.com, http://www.notablebiographies.com, http://mrnussbaum.com, http://3.bp.blogspot.com


K-means clustering - tips, continued.

Influenced by outliers

−→ use medoids - the observation in the data set whose average distance to all other observations is minimal

−→ using medoids is more computationally intensive

−→ centers are actual observations, leading to better interpretability

R: pam(data, K) (“partitioning around medoids”) in cluster package
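A short comparison of kmeans() and pam() on data with one extreme outlier. The data are simulated for illustration; the cluster package is a recommended package that normally ships with R.

```r
library(cluster)   # provides pam()

set.seed(9)
x <- rbind(matrix(rnorm(40, 0, 0.5), ncol = 2),
           matrix(rnorm(40, 5, 0.5), ncol = 2),
           c(30, 30))                      # one extreme outlier

km <- kmeans(x, centers = 2, nstart = 20)
pm <- pam(x, k = 2)

km$centers   # averages, which an outlier can pull away from the bulk of the data
pm$medoids   # medoids are actual rows of x
pm$id.med    # row indices of the medoid observations
```

On data like this, K-means may spend a whole cluster on the outlier, while the medoid-based solution tends to be less affected; the exact behavior depends on the data.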


Hierarchical Clustering

Image: http://en.wikipedia.org/wiki/Hierarchical_clustering


Hierarchical vs. Flat Partitions

1 Flat partitioning (e.g. K-means clustering)

    - partitions data into K clusters; K determined by user
    - no sense of the relationships among the clusters

2 Hierarchical partitioning

    - Generates a hierarchy of partitions; user selects the partition
    - $P_1$ = 1 cluster, . . ., $P_n$ = n clusters (agglomerative clustering)
    - Partition $P_i$ is the union of one or more clusters from Partition $P_{i+1}$


Hierarchical clustering - recipe

Define a dissimilarity $d_{kk'} = d(C_k, C_{k'})$ between clusters $C_k$ and $C_{k'}$ as a function of a distance between points in the clusters

1 Start with every observation in its own cluster

2 Find min $d(C_k, C_{k'})$ −→ merge $C_k$ and $C_{k'}$ (Minimum is across all pairs of clusters)

3 Repeat until only one cluster remains

R: hclust(distances)


Hierarchical clustering - distances

1 Friends-of-friends = Single-linkage clustering: intergroup distance is smallest possible distance

$$d(C_k, C_{k'}) = \min_{x \in C_k,\, y \in C_{k'}} d(x, y)$$

2 Complete-linkage clustering: intergroup distance is largest possible distance

$$d(C_k, C_{k'}) = \max_{x \in C_k,\, y \in C_{k'}} d(x, y)$$

3 Average-linkage clustering: average intergroup distance

$$d(C_k, C_{k'}) = \mathrm{Ave}_{x \in C_k,\, y \in C_{k'}}\, d(x, y)$$

4 Ward’s clustering

$$d(C_k, C_{k'}) = \frac{2\,(|C_k| \cdot |C_{k'}|)}{|C_k| + |C_{k'}|} \, \|\bar{X}_{C_k} - \bar{X}_{C_{k'}}\|^2$$
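The four intergroup distances can be computed by hand for two small clusters and compared with what hclust() does. The helper cluster_dist and the toy points below are assumptions for illustration.

```r
cluster_dist <- function(a, b, type = c("single", "complete", "average", "ward")) {
  type <- match.arg(type)
  na <- nrow(a); nb <- nrow(b)
  # all pairwise Euclidean distances d(x, y) with x in a and y in b
  d <- as.matrix(dist(rbind(a, b)))[seq_len(na), na + seq_len(nb), drop = FALSE]
  switch(type,
         single   = min(d),
         complete = max(d),
         average  = mean(d),
         ward     = 2 * na * nb / (na + nb) * sum((colMeans(a) - colMeans(b))^2))
}

a <- rbind(c(0, 0), c(1, 0))   # cluster C_k
b <- rbind(c(4, 0), c(6, 0))   # cluster C_k'

d_single   <- cluster_dist(a, b, "single")
d_complete <- cluster_dist(a, b, "complete")
d_average  <- cluster_dist(a, b, "average")
d_ward     <- cluster_dist(a, b, "ward")

# hclust() selects the linkage through its method argument
hc     <- hclust(dist(rbind(a, b)), method = "single")
groups <- cutree(hc, k = 2)    # cut the hierarchy into 2 clusters
```

Other values of method include "complete", "average", "ward.D", and "ward.D2"; the merge heights reported for the Ward variants are scaled differently from the formula above.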


Single-linkage clustering


data1


data1 (K = 4 )


Clustering Recap

1 Algorithmic clustering (no statistical assumptions)
    1 K-means
    2 Hierarchical linkage

2 Statistical clustering
    1 Parametric - associates a specific model with the density (e.g. Gaussian, Poisson) −→ parameters associated with each cluster
    2 Nonparametric - looks at contours of the density to find cluster information (e.g. kernel density estimate)


Mixture Modeling/Parametric Statistical Clustering

Image: Li and Henning (2011)


Model-based clustering (parametric)

Assumes that each subgroup/cluster/component in the population has its own density

$$p(X) = \sum_{k=1}^{K} \pi_k\, p_k(X; \theta_k)$$

where $\sum_{k=1}^{K} \pi_k = 1$ and $0 \leq \pi_k \leq 1$.

−→ Entire dataset is modeled by a mixture of these distributions.

$p(x) = 0.5\,\phi(x; 4, 1) + 0.5\,\phi(x; 0, 1)$ (middle plot)
$p(x) = 0.75\,\phi(x; 4, 1) + 0.25\,\phi(x; 0, 1)$ (right plot)
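The two example mixtures can be evaluated with a few lines of R. The function name dmix is an assumption for illustration; the weights, means, and standard deviations are the ones given above.

```r
# Mixture density: sum over components of w[k] * phi(x; mu[k], sigma[k])
dmix <- function(x, w, mu, sigma) {
  dens <- numeric(length(x))
  for (k in seq_along(w)) {
    dens <- dens + w[k] * dnorm(x, mean = mu[k], sd = sigma[k])
  }
  dens
}

grid     <- seq(-4, 8, length.out = 400)
p_middle <- dmix(grid, w = c(0.5, 0.5),   mu = c(4, 0), sigma = c(1, 1))
p_right  <- dmix(grid, w = c(0.75, 0.25), mu = c(4, 0), sigma = c(1, 1))

# A valid mixture density integrates to 1
area <- integrate(dmix, lower = -10, upper = 14,
                  w = c(0.75, 0.25), mu = c(4, 0), sigma = c(1, 1))$value
```

Calling plot(grid, p_middle, type = "l") and the same for p_right shows how the weights change the relative heights of the two bumps.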


Model-based clustering: Gaussian example

Suppose there are K clusters - each cluster is modeled by a particular distribution (e.g. a Gaussian distribution with parameters $\mu_k$, $\Sigma_k$)

The density of each cluster $k$ is

$$p_k(X) = \phi(X \mid \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^p |\Sigma_k|}} \exp\left( -\frac{(X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k)}{2} \right)$$

Letting $\pi_k$ be the weight of cluster $k$, the mixture density is

$$p(X) = \sum_{k=1}^{K} \pi_k\, p_k(X) = \sum_{k=1}^{K} \pi_k\, \phi(X \mid \mu_k, \Sigma_k)$$


Model-based clustering: advantages

1 Well-studied statistical inference techniques available

2 Flexibility in choosing the component distributions

3 Obtain a density estimate for each cluster

4 Soft classification is available


Model-based clustering: fitting the model

Suppose there are K clusters - each cluster is modeled by a particular distribution (e.g. a Gaussian distribution with parameters $\mu_k$, $\Sigma_k$)

Expectation - Maximization (EM) algorithm

Finds maximum likelihood estimates from incomplete data (e.g. missing cluster labels).

Alternates between expectation and maximization steps:

E-step: compute conditional expectation of the cluster labels

M-step: maximize the likelihood and estimate parameters given the current labels; update parameter estimates


Model-based clustering: EM algorithm

Observations {X1,X2, . . . ,Xn} are incomplete (i.e. no labels)

Complete observations: $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ (where $Y_i$ are the labels)

The collections of parameters, $\theta$, are $(\pi_k, \mu_k, \Sigma_k)$ for $k = 1, \ldots, K$

The log-likelihood function is

$$l(X_1, \ldots, X_n \mid \theta) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{K} \pi_k\, \phi(X_i \mid \mu_k, \Sigma_k) \right)$$

$l(X_1, \ldots, X_n \mid \theta)$ is the objective function of the EM algorithm. Numerical difficulty comes from the sum inside the log.

Goal: get MLEs for (πk , µk ,Σk), k = 1, . . . ,K
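To make the E-step and M-step concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture. The function name em_gmm_1d, the quantile-based starting values, and the fixed number of iterations are all assumptions for illustration; packages such as mclust provide full implementations for the multivariate case.

```r
em_gmm_1d <- function(x, K = 2, n_iter = 200) {
  n <- length(x)
  # Crude starting values (an assumption; real code would try several starts)
  mu    <- as.numeric(quantile(x, probs = seq(0.25, 0.75, length.out = K)))
  sigma <- rep(sd(x), K)
  pi_k  <- rep(1 / K, K)
  loglik <- numeric(n_iter)
  for (it in seq_len(n_iter)) {
    # E-step: responsibilities = conditional expectation of the cluster labels
    dens <- sapply(seq_len(K), function(k) pi_k[k] * dnorm(x, mu[k], sigma[k]))  # n x K
    loglik[it] <- sum(log(rowSums(dens)))   # log-likelihood at the current parameters
    resp <- dens / rowSums(dens)
    # M-step: weighted maximum likelihood updates
    nk    <- colSums(resp)
    pi_k  <- nk / n
    mu    <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / nk)
  }
  list(pi = pi_k, mu = mu, sigma = sigma, loglik = loglik)
}

set.seed(123)
x   <- c(rnorm(500, 0, 1), rnorm(500, 4, 1))   # true mixture: 0.5 N(0,1) + 0.5 N(4,1)
fit <- em_gmm_1d(x, K = 2)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a useful debugging check. The responsibilities also give the soft classification mentioned earlier.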


Nonparametric Clustering

Image: Feigelson et al. (2011)


Nonparametric clustering

Associate groups with high frequency areas −→ groups in the population correspond to modes of the density $p(x)$

Goal: find the modes of the density $p(x)$, or of its estimate $\hat{p}(x)$. Assign observations to the “domain of attraction” of a mode.


* NP clustering is very dependent on the density estimate $\hat{p}(x)$
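A one-dimensional sketch of mode-based clustering with a kernel density estimate. The data, the 10% height threshold for ignoring minor bumps, and the variable names are assumptions for illustration. In one dimension the domain of attraction of a mode is the interval between the neighboring valleys of the estimated density.

```r
set.seed(8)
x <- c(rnorm(1000, 0, 1), rnorm(1000, 5, 1))

d <- density(x)   # Gaussian kernel density estimate with the default bandwidth

# Local maxima of the estimated density: slope changes from positive to negative
peak_idx <- which(diff(sign(diff(d$y))) == -2) + 1
peak_idx <- peak_idx[d$y[peak_idx] > 0.1 * max(d$y)]   # ignore minor bumps (assumed threshold)
modes    <- d$x[peak_idx]

# Valley (lowest estimated density) between each pair of consecutive modes
cuts <- vapply(seq_len(max(length(peak_idx) - 1, 0)), function(m) {
  rng <- peak_idx[m]:peak_idx[m + 1]
  d$x[rng[which.min(d$y[rng])]]
}, numeric(1))

cluster <- findInterval(x, cuts) + 1   # assign each observation to a mode's interval
```

Rerunning with density(x, adjust = 0.3) or adjust = 3 changes the bandwidth and can change the number of modes found, which illustrates the dependence on the density estimate.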


Final comment on selecting K

Slide from Ryan Tibshirani’s lecture, which was from George Casella’s CMU seminar on 1/16/2011.


Concluding Remarks: clustering

Clustering - unsupervised/no labels → find structure
    1 K-means
    2 Agglomerative hierarchical clustering
    3 Parametric/Nonparametric


CLASSIFICATION
Build a model, classifier, etc. to separate data into known groups/classes (supervised learning)

Response variable is not continuous −→ want to predict labels

1 Bayes classifiers

2 K Nearest Neighbors (KNN) classifiers

3 Logistic regression

4 Linear Discriminant Analysis (LDA)

5 Support Vector Machines (SVM)

6 Tree classifiers


Classification in Astronomy

Stars can be classified into spectral types OBAFGKMLTY

Image: https://writescience.files.wordpress.com

Classification of active galactic nuclei (e.g. Starburst, Seyfert (I or II), Quasars, Blazars, BL Lac, OVV, Radio)

Galaxies can be classified into Hubble morphological types (E/S0/S/Irr), clustering environments, and by star formation activity

Image: http://en.wikipedia.org


Classification set-up

Notation: given vectors $X = \{X_1, X_2, \ldots, X_n\}$, each in $\mathbb{R}^p$, and class labels $Y = \{y_1, y_2, \ldots, y_n\}$
−→ the $y_i$’s are qualitative
−→ let $\hat{y}_i$ be the predicted class label for observation $i$
−→ main interest is $P(Y \mid X)$

The classification training error rate is often estimated using a training dataset as

$$\frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$

where $I(\cdot)$ is the indicator function.

The classification test error rate is often estimated using a test dataset, $(x_{test}, y_{test})$, as

$$E\left( I(y_{test} \neq \hat{y}_{test}) \right)$$

−→ good classifiers have small test errors


Bayes Classifiers

Test error is minimized by assigning observations with predictors $x$ to the class that has the largest probability:

$$P(Y = j \mid X = x)$$

for classes $j = 1, \ldots, J$

If there are two classes ($J = 2$), the Bayes decision rule given predictors $x$ is

$$\hat{y}_i = \begin{cases} \text{class 1} & \text{if } P(Y = 1 \mid X = x) > 0.50 \\ \text{class 2} & \text{if } P(Y = 2 \mid X = x) > 0.50 \end{cases}$$

* In general, intractable because the distribution of $Y \mid X$ is unknown.


Image: http://vlm1.uta.edu/~athitsos/nearest_neighbors/


K Nearest Neighbors (KNN)

Main idea: An observation is classified based on the $K$ observations in the training set that are nearest to it

A probability of each class can be estimated by

$$\hat{P}(Y = j \mid X = x) = K^{-1} \sum_{i \in N(x)} I(y_i = j)$$

where $j = 1, \ldots,$ #classes in training set, $N(x)$ indexes the $K$ training observations nearest to $x$, and $I$ = indicator function.

The K = 3 nearest neighbors to the X are within the circle.

The predicted class of X would be blue because there are more blue observations than green among the 3 NN.
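The KNN class-probability estimate can be sketched directly in base R. This is an illustrative sketch, not the `class` package implementation: the toy training points, the helper name `knn_probs`, the query point, and the use of Euclidean distance are all assumptions.

```r
# Toy training set: two predictors, two classes (values made up for illustration)
train  <- rbind(c(0, 0), c(0, 1), c(1, 0), c(3, 3), c(3, 4), c(4, 3))
labels <- factor(c("green", "green", "green", "blue", "blue", "blue"))

# Estimate P(Y = j | X = x) as the fraction of the K nearest neighbors in class j
knn_probs <- function(x, train, labels, K) {
  d    <- sqrt(rowSums(sweep(train, 2, x)^2))  # Euclidean distance to each training point
  nbrs <- order(d)[1:K]                        # indices in N(x)
  table(labels[nbrs]) / K
}

probs <- knn_probs(c(1.8, 1.8), train, labels, K = 3)
pred  <- names(which.max(probs))  # class with the largest estimated probability
```

Here two of the three nearest neighbors of the query point are green, so the estimated probabilities are 2/3 (green) and 1/3 (blue) and the prediction is green.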


KNN: data1

R: knn(training.set, test.set, training.set.labels, K) in the class package


KNN: data1 decision boundary

Using all observations of data1, K = 1


Linear Classifiers

−→ Decision boundary is linear

Logistic regression

Linear Discriminant Analysis

Image: http://fouryears.eu/2009/02/


Logistic Regression

Predicting two groups: binary labels

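$$Y = \begin{cases} 1 & \text{with probability } \pi \\ 0 & \text{with probability } 1 - \pi \end{cases}$$

$$E[Y] = P(Y = 1) = \pi$$

We assume the following model for logistic regression

$$\mathrm{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta^T X$$

where $\beta_0 \in \mathbb{R}$, $\beta \in \mathbb{R}^p$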


Logistic Regression, continued.

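$$\mathrm{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta^T X \implies \pi = \frac{e^{\beta_0 + \beta^T X}}{1 + e^{\beta_0 + \beta^T X}}$$

Can fit $\beta_0, \ldots, \beta_p$ via MLE

$$l(\beta_0, \beta) = \sum_{i=1}^{n} \log\left(P(Y = y_i \mid X = x_i)\right) \implies$$

$$(\hat{\beta}_0, \hat{\beta}) = \underset{\beta_0 \in \mathbb{R},\, \beta \in \mathbb{R}^p}{\arg\max} \sum_{i=1}^{n} \left[ y_i \cdot (\beta_0 + \beta^T x_i) - \log\left(1 + e^{\beta_0 + \beta^T x_i}\right) \right]$$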
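The MLE above can be checked numerically by maximizing the log-likelihood directly and comparing with `glm`. A minimal sketch, assuming simulated data (the sample size, seed, and "true" coefficients below are arbitrary choices, not values from the lecture):

```r
set.seed(1)
n   <- 200
x   <- matrix(rnorm(2 * n), ncol = 2)          # simulated predictors
eta <- -0.5 + 1.0 * x[, 1] - 1.0 * x[, 2]      # assumed true linear predictor
y   <- rbinom(n, 1, plogis(eta))               # binary labels with P(Y = 1) = pi

# Negative log-likelihood: -sum[ y_i * eta_i - log(1 + exp(eta_i)) ]
negloglik <- function(b) {
  eta <- b[1] + x %*% b[-1]
  -sum(y * eta - log1p(exp(eta)))
}

fit.optim <- optim(c(0, 0, 0), negloglik, method = "BFGS")  # direct maximization
fit.glm   <- glm(y ~ x[, 1] + x[, 2], family = binomial)    # standard fit

# The two sets of estimates should agree up to optimizer tolerance
max.diff <- max(abs(fit.optim$par - unname(coef(fit.glm))))
```

`glm` reaches the same maximizer through iteratively reweighted least squares, so `max.diff` should be small.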


Logistic Regression for data1

# Recode the labels: classes 2 and 5 become 1, all other classes stay 0
logit.labels <- matrix(0, nrow = length(labels1))
logit.labels[c(which(labels1 == 2), which(labels1 == 5))] <- 1

# Fit the logistic regression on the two columns of data1
logit.fit <- glm(logit.labels ~ data1[,1] + data1[,2], family = binomial)
summary(logit.fit)  # provides details of the fit model

# Fitted probabilities P(Y = 1 | X), thresholded at 0.5 to get predicted classes
logit.fit.probs <- predict(logit.fit, type = "response")
logit.class <- matrix(0, nrow = length(logit.fit.probs))
logit.class[logit.fit.probs >= 0.5] <- 1

# Side-by-side plots of the true and predicted classes
par(mfrow = c(1, 2))
plot(data1, xlim = c(-4, 4), ylim = c(-5, 4), xlab = "X1", ylab = "X2", col = logit.labels + 1,
     pch = logit.labels + 1, lwd = 3, main = "True classes")
plot(data1, xlim = c(-4, 4), ylim = c(-5, 4), xlab = "X1", ylab = "X2", col = logit.class + 1,
     pch = logit.class + 1, lwd = 3, main = "Predicted classes")


Logistic Regression for data1: fit model

$$\mathrm{logit}(\pi^\star) = \hat{\beta}_0 + \hat{\beta}_1 \cdot x_1^\star + \hat{\beta}_2 \cdot x_2^\star = -1.94 + 1.56 \cdot x_1^\star - 1.50 \cdot x_2^\star$$
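The fit model can be turned into a predicted probability by inverting the logit. A small sketch using the coefficients reported above; the new observation (1, -1) and the helper names `predict_prob` and `boundary_x2` are hypothetical, chosen only for illustration:

```r
# Coefficients as reported for the data1 fit
beta0 <- -1.94
beta1 <- 1.56
beta2 <- -1.50

# Inverse logit of the linear predictor gives P(Y = 1 | x1, x2)
predict_prob <- function(x1, x2) plogis(beta0 + beta1 * x1 + beta2 * x2)

# Hypothetical new observation (x1, x2) = (1, -1): logit = -1.94 + 1.56 + 1.50 = 1.12
p.star     <- predict_prob(1, -1)
class.star <- as.numeric(p.star >= 0.5)

# Decision boundary (probability 0.5, i.e. logit = 0) as a line in the (x1, x2) plane
boundary_x2 <- function(x1) -(beta0 + beta1 * x1) / beta2
```

For this point the predicted probability is about 0.75, so it would be assigned to class 1. Points on the line `boundary_x2` have predicted probability exactly 0.5, which is why the decision boundary is linear.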


Multiple classes

What do we do if our response is not binary (i.e. J > 2)?

There are extensions of logistic regression for multiple classes: multiple-class logistic regression

Linear Discriminant Analysis (LDA) is another option
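One common form of the multiple-class extension of logistic regression is the multinomial (softmax) model. The slides do not write it out, so the per-class coefficients $\beta_{0j}, \beta_j$ below are notation assumed here for illustration:

```latex
P(Y = j \mid X = x) =
  \frac{e^{\beta_{0j} + \beta_j^T x}}{\sum_{l=1}^{J} e^{\beta_{0l} + \beta_l^T x}},
  \qquad j = 1, \ldots, J
```

One class is usually taken as the reference by fixing its coefficients at zero. With $J = 2$ and the second class as reference, this reduces to the binary model $\pi = e^{\beta_0 + \beta^T x} / (1 + e^{\beta_0 + \beta^T x})$.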


Discriminant Analysis

Goal: estimate a decision boundary that gives a classification rule

Basic idea: estimate the posterior probabilities of class membership

−→ If an observation is in a particular location, what is the probability it belongs to a particular class?

Bayes' Rule

$$P(Y = j' \mid x) = \frac{\pi_{j'} \cdot p_{j'}(x)}{\sum_{j=1}^{J} \pi_j \cdot p_j(x)}$$

$\pi_j$ = prior probability of class $j$

$p_j(x) = P(X = x \mid Y = j)$

−→ Need a way to estimate $p_j(x)$ in order to do classification


Example using Bayes’ Rule

Consider $X = 1.9$:

$$P(Y = 1 \mid X = 1.9) = \frac{\frac{1}{2} \cdot 0.065}{\frac{1}{2} \cdot 0.065 + \frac{1}{2} \cdot 0.219} \approx 0.229$$

$$P(Y = 1 \mid X = 1.9) = \frac{\frac{2}{3} \cdot 0.065}{\frac{2}{3} \cdot 0.065 + \frac{1}{3} \cdot 0.172} \approx 0.430$$
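The Bayes' Rule arithmetic can be reproduced in a few lines of R. This sketch uses the equal-prior density values 0.065 and 0.219 from the example; the helper name `posterior` and the unequal priors (0.8, 0.2) are assumptions added for illustration:

```r
# Posterior class probabilities from priors pi_j and class densities p_j(x)
posterior <- function(prior, dens) prior * dens / sum(prior * dens)

dens <- c(0.065, 0.219)                      # p_1(1.9) and p_2(1.9) from the example

post.equal <- posterior(c(1/2, 1/2), dens)   # equal priors
post.equal[1]                                # about 0.229

# Hypothetical unequal priors with the same densities (illustration only)
post.unequal <- posterior(c(0.8, 0.2), dens)
```

With equal priors the observation is assigned to class 2. A strong enough prior on class 1 (here 0.8) reverses the decision even though class 2 has the larger density at $x = 1.9$.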


Linear Discriminant Analysis (LDA)

Multivariate Gaussian

$$p_j(X) = \phi(X \mid \mu_j, \Sigma_j) = \frac{1}{\sqrt{(2\pi)^p |\Sigma_j|}} \exp\left(-\frac{(X - \mu_j)^T \Sigma_j^{-1} (X - \mu_j)}{2}\right)$$

LDA assumes all covariance matrices are equal ($\Sigma_1 = \cdots = \Sigma_J$)


LDA continued

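A predictor $x$ is classified in the class $j = 1, \ldots, J$ according to the following:

$$\begin{aligned}
\arg\max_j \{P(Y = j \mid X = x)\} &= \arg\max_j \{\phi(x \mid \mu_j, \Sigma)\, \pi_j\} \\
&= \arg\max_j \{\log\left(\phi(x \mid \mu_j, \Sigma)\, \pi_j\right)\} \\
&\;\;\vdots \\
&= \arg\max_j \left\{-\frac{1}{2}(x - \mu_j)^T \Sigma^{-1} (x - \mu_j) + \log(\pi_j)\right\} \\
&= \arg\max_j \left\{x^T \Sigma^{-1} \mu_j - \frac{1}{2}\mu_j^T \Sigma^{-1} \mu_j + \log(\pi_j)\right\}
\end{aligned}$$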
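The last equality in the LDA derivation follows from expanding the quadratic form. A short worked step, using only the symbols already defined and the symmetry of $\Sigma^{-1}$:

```latex
-\frac{1}{2}(x - \mu_j)^T \Sigma^{-1} (x - \mu_j)
  = -\frac{1}{2} x^T \Sigma^{-1} x
    + x^T \Sigma^{-1} \mu_j
    - \frac{1}{2} \mu_j^T \Sigma^{-1} \mu_j
```

The term $-\frac{1}{2} x^T \Sigma^{-1} x$ is the same for every class $j$ because $\Sigma$ is shared, so it does not affect the argmax and can be dropped. What remains is linear in $x$, which is why the resulting decision boundaries are linear.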


LDA for data1

R: lda(y ~ x) in the MASS package
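The LDA rule can also be computed by hand, which makes the role of each term visible. The following is a base-R sketch and not the `MASS::lda` implementation; the simulated two-class data, the seed, and the helper name `lda_scores` are assumptions for illustration:

```r
set.seed(42)
n <- 100
# Two simulated classes with a common (identity) covariance; means chosen arbitrarily
x <- rbind(matrix(rnorm(2 * n), ncol = 2),
           sweep(matrix(rnorm(2 * n), ncol = 2), 2, c(3, 3), "+"))
y <- rep(1:2, each = n)

classes <- sort(unique(y))
prior   <- as.numeric(table(y)) / length(y)                    # estimates of pi_j
mu      <- t(sapply(classes, function(j) colMeans(x[y == j, ])))  # row j = mu_j
resid   <- x - mu[y, ]                                         # center each class
Sigma   <- crossprod(resid) / (length(y) - length(classes))    # pooled covariance
Sigma.inv <- solve(Sigma)

# delta_j(x) = x' Sigma^{-1} mu_j - 0.5 * mu_j' Sigma^{-1} mu_j + log(pi_j)
# x.new must be a matrix with one row per observation
lda_scores <- function(x.new) {
  sapply(classes, function(j) {
    m <- mu[j, ]
    drop(x.new %*% Sigma.inv %*% m) -
      0.5 * drop(t(m) %*% Sigma.inv %*% m) + log(prior[j])
  })
}

pred        <- classes[apply(lda_scores(x), 1, which.max)]  # argmax over classes
train.error <- mean(pred != y)
```

Each observation is assigned to the class with the largest score, matching the argmax rule in the derivation.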


Quadratic Discriminant Analysis (QDA)

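Assuming a common covariance matrix, $\Sigma$, is not always reasonable.

Allowing for different covariance matrices, $\Sigma_j$, a predictor $x$ is classified in the class $j = 1, \ldots, J$ according to the following:

$$\begin{aligned}
\arg\max_j \{P(Y = j \mid X = x)\} &= \arg\max_j \{\phi(x \mid \mu_j, \Sigma_j)\, \pi_j\} \\
&= \arg\max_j \{\log\left(\phi(x \mid \mu_j, \Sigma_j)\, \pi_j\right)\} \\
&\;\;\vdots \\
&= \arg\max_j \left\{-\frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) + \log(\pi_j)\right\} \\
&= \arg\max_j \left\{-\frac{1}{2}\log|\Sigma_j| - \frac{1}{2} x^T \Sigma_j^{-1} x + x^T \Sigma_j^{-1} \mu_j - \frac{1}{2}\mu_j^T \Sigma_j^{-1} \mu_j + \log(\pi_j)\right\}
\end{aligned}$$

Because $\Sigma_j$ differs across classes, the $-\frac{1}{2}\log|\Sigma_j|$ term from the Gaussian normalizing constant and the quadratic term in $x$ no longer cancel, so the decision boundary is quadratic.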


QDA for data1

R: qda(y ~ x) in the MASS package
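A case where QDA helps is two classes with the same mean but different spreads, which no linear boundary can separate. The following base-R sketch (not the `MASS::qda` implementation) scores each class with its own covariance, including the log-determinant term from the Gaussian density; the simulated data, seed, and helper name `qda_score` are assumptions for illustration:

```r
set.seed(7)
n <- 150
# Class 1: tight cloud at the origin; class 2: wide cloud with the same center
x <- rbind(matrix(rnorm(2 * n, sd = 0.5), ncol = 2),
           matrix(rnorm(2 * n, sd = 3), ncol = 2))
y <- rep(1:2, each = n)
classes <- 1:2

# Score for class j:
#   -0.5 * log|Sigma_j| - 0.5 * (x - mu_j)' Sigma_j^{-1} (x - mu_j) + log(pi_j)
qda_score <- function(x.new, j) {
  xj   <- x[y == j, , drop = FALSE]
  mu   <- colMeans(xj)
  S    <- cov(xj)                              # class-specific covariance
  d    <- sweep(x.new, 2, mu)                  # x - mu_j, row by row
  maha <- rowSums((d %*% solve(S)) * d)        # squared Mahalanobis distances
  logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  -0.5 * logdet - 0.5 * maha + log(nrow(xj) / nrow(x))
}

scores      <- sapply(classes, function(j) qda_score(x, j))
pred        <- classes[apply(scores, 1, which.max)]
train.error <- mean(pred != y)
```

Points near the common center are assigned to the tight class and points far away to the wide class, giving a roughly circular boundary.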


Support Vector Machines


Support Vector Machines (SVM)

Goal: Find the hyperplane that “best” separates the two classes (i.e. maximize the margin between the classes)

Data: $\{x_i, y_i\}$, $i = 1, \ldots, n$, and $y_i \in \{-1, 1\}$

Image: http://en.wikipedia.org

A separating hyperplane does not always exist

−→ incorporate a parameter to penalize misclassifications


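$$\text{minimize} \left(\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i\right) \quad \text{subject to} \quad (X_i \cdot w + b)\, y_i \geq 1 - \xi_i, \;\; \xi_i \geq 0 \;\; \text{for } i = 1, \ldots, n$$

$\xi_i$ captures the degree to which observation $i$ is misclassified

$C$ is a misclassification penalty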
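For a fixed hyperplane $(w, b)$, the smallest slack that satisfies the constraints is $\xi_i = \max(0,\, 1 - y_i (X_i \cdot w + b))$, so the objective can be evaluated directly. A minimal base-R sketch; the toy data, the candidate $(w, b)$, and the helper name `svm_objective` are assumptions for illustration, and this evaluates the objective rather than solving the optimization:

```r
# Toy data with labels in {-1, 1}; all values are made up
x <- rbind(c(2, 2), c(3, 1), c(0.5, 0.5), c(-2, -1), c(-1, -3), c(0.2, 0.4))
y <- c(1, 1, 1, -1, -1, -1)

# Evaluate 0.5 * ||w||^2 + C * sum(xi) at a given hyperplane (w, b)
svm_objective <- function(w, b, C) {
  margins <- drop(y * (x %*% w + b))   # y_i * (x_i . w + b)
  xi      <- pmax(0, 1 - margins)      # smallest feasible slack for each observation
  list(xi = xi, value = 0.5 * sum(w^2) + C * sum(xi))
}

res <- svm_objective(w = c(1, 1), b = 0, C = 1)
```

Here only the last observation has nonzero slack ($\xi_6 = 1.6$). A slack above 1 means the point lies on the wrong side of the hyperplane, while a slack between 0 and 1 means it is inside the margin but still correctly classified. Increasing `C` makes such violations more costly relative to a wide margin.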

Image: http://en.wikipedia.org/wiki/Support_vector_machine


SVM “kernel trick”

Image: http://ccforum.com/content/11/4/r83/figure/f1


SVM for data1 (linear) - A

R: svm(x, y, kernel = "linear", class.weights) in the e1071 package


SVM for data1: fit model (linear) - A


SVM for data1 (linear) - B

R: svm(x, y, kernel = "linear", class.weights) in the e1071 package


SVM for data1: fit model (linear) - B


SVM for data1 (radial basis)

R: svm(x, y, kernel = "radial", class.weights) in the e1071 package


SVM for data1: fit model (radial basis)


Classification Trees

Image: http://astronomy.swin.edu.au

Image: http://dame.dsf.unina.it/dame_td.html


Classification trees

Goal: determine which variables are “best” at separating observations into the labeled groups

1 Predictor space is partitioned into hyper-rectangles

2 Any observations in the hyper-rectangle would be predicted to have the same label

3 Next split is chosen to maximize “purity” of hyper-rectangles

Tree-based methods are not typically the best classification methods based on prediction accuracy, but they are often more easily interpreted (James et al. 2013)

CART = “Classification and Regression Trees”


Classification Tree for data1


Classification Trees - recipe

Recipe

Start at the “top” of the tree and use recursive binary splitting to grow the tree

At each split, determine the classification error rate for the new region (i.e. the fraction of observations that are not of the majority class in their region)

−→ The Gini Index or cross-entropy are better measures of node purity.

R: tree(y ~ x) in the tree package.
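The three node-purity measures mentioned in the recipe can be computed from the class proportions in a region, and a split can be chosen by minimizing the weighted impurity of the two resulting regions. A base-R sketch, not the `tree` package's algorithm; the helper names `node_impurity` and `split_gini` and the toy data are assumptions for illustration:

```r
# Impurity measures for one region, from the labels of the observations in it
node_impurity <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)  # class proportions
  p <- p[p > 0]
  c(error = 1 - max(p), gini = sum(p * (1 - p)), entropy = -sum(p * log(p)))
}

# Hypothetical region with 8 observations of class "A" and 2 of class "B"
imp <- node_impurity(rep(c("A", "B"), times = c(8, 2)))

# One binary split on a single variable: weighted Gini index of the two regions
x1  <- c(1, 2, 3, 4, 5, 6)
lab <- c("A", "A", "A", "B", "B", "A")
split_gini <- function(cut) {
  left  <- lab[x1 < cut]
  right <- lab[x1 >= cut]
  (length(left) * node_impurity(left)[["gini"]] +
     length(right) * node_impurity(right)[["gini"]]) / length(lab)
}
cuts     <- c(1.5, 2.5, 3.5, 4.5, 5.5)   # candidate cut points between observations
best.cut <- cuts[which.min(sapply(cuts, split_gini))]
```

For the example region the error rate is 0.2 and the Gini index is 0.32. The best cut for the toy data is 3.5, which leaves a pure left region. Recursive binary splitting repeats this search within each new region.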


Classification Trees - remarks

Tree pruning - the classification tree may be overfit, or too complex; pruning removes portions of the tree that are not useful for the classification goals of the tree.

Bootstrap aggregation (aka “bagging”) - there is a high variance in classification trees, and bagging (averaging over many trees) provides a means for variance reduction. (Boosting is another approach, but grows trees sequentially rather than using a bootstrapped sample.)

Random forest - similar idea to bagging except it incorporates a step that helps to decorrelate the trees.


Concluding Remarks

Clustering - unsupervised/no labels → find structure

1 K-means / K-medoids

2 Agglomerative hierarchical clustering

3 Parametric/Non-parametric

Classification - supervised/labels → predict classes

1 KNN

2 Logistic regression

3 LDA/QDA

4 Support Vector Machines

5 Tree classifiers

Clustering and classification are useful tools, but one needs to be familiar with the assumptions that go into the methods


Bibliography

Feigelson, E. D., Getman, K. V., Townsley, L. K., Broos, P. S., Povich, M. S., Garmire, G. P., King, R. R., Montmerle, T., Preibisch, T., Smith, N., et al. (2011), “X-ray Star Clusters in the Carina Complex,” The Astrophysical Journal Supplement Series, 194, 9.

Hastie, T., Tibshirani, R., and Friedman, J. (2009), The elements of statistical learning, vol. 2, Springer.

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013), An Introduction to Statistical Learning with Applications in R, vol. 1 of Springer Texts in Statistics, Springer.

Li, H.-b. and Henning, T. (2011), “The alignment of molecular cloud magnetic fields with the spiral arms in M33,” Nature, 479, 499–501.

Mukherjee, S., Feigelson, E. D., Babu, G. J., Murtagh, F., Fraley, C., and Raftery, A. (1998), “Three types of gamma-ray bursts,” The Astrophysical Journal, 508, 314.

Wasserman, L. (2004), All of statistics: a concise course in statistical inference, Springer.
