
Data Mining 2013

Section 8 Classification

Supervised Learning

Classification is a form of supervised learning - where we have prior information on what the result (or class/group) should look like. There are many situations in which classification or discrimination of data is required. For example:

- when you apply for a loan, the lender will look at your credit, employment, and residence history in order to determine if you qualify for a loan;
- if you are suspected of having a particular disease, the doctor will take your temperature and blood pressure, check various other things - probably have some tests done - and then, based on all the data collected, determine the presence or absence of the disease.

In these (and many other) situations the decision is made on the basis of past experience, i.e. how have people with 'similar' histories behaved with respect to paying off their loans, or have people with 'similar' medical characteristics been shown to have the disease.

To determine such things, data can be collected on credit or medical characteristics - along with the results - and analysed to see if it is possible to classify someone as a good or bad risk (for defaulting on the loan or having the disease). We can look at the process as one of finding 'features' in the data that can be used to build a 'classifier'. In so doing we hope to determine which features are important for the classification, and which values of those features lead to success or failure. Once such values are determined based on the known cases, we have a rule or "classifier" that can be used to predict the outcome/result for new cases.

You may recall that in clustering (unsupervised learning) we saw that sometimes the clustering was equivalent to the classification. This was due to the values of the features for clustering being the same as would be needed for classification.

When we do a classification, we need to be concerned with how good or "accurate" our method is. In the medical situation, if a person who has a disease is classified as not having it, the person could die. On the other hand, if a person who does not have the disease is classified as having it, a needless operation could result. Thus the seriousness of the misclassification is something we also wish to consider.

©Mills 2013


Logistic regression

Logistic regression may be used to classify data into one of two groups where

$$Y = \begin{cases} 0 & \text{if we are in group 1} \\ 1 & \text{if we are in group 2} \end{cases}$$

and where $P(Y=1) = p$ (called "success") and $P(Y=0) = q = 1-p$ (called "failure"). $Y$ is binomial$(1,p)$, i.e. $Y$ is binary and discrete. If we use $Y$ as the response variable, we violate the normality assumption associated with ordinary least squares (OLS) linear regression. We note, however, that

$$E(Y) = 0 \cdot P(Y=0) + 1 \cdot P(Y=1) = p$$

Note $p \in [0,1]$ and is continuous. This gives us a hint on how we might proceed. Often we assume that this probability $p$ can be modelled by a sigmoidal curve of the form

$$p = E(Y) = \left[1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}\right]^{-1}$$

We now linearize this logistic response function by modelling the $\log_e$ odds (called the logit), which is the $\log_e$ of the ratio of $p$, the probability of "success", to $q = 1-p$, the probability of "failure", i.e.

$$\eta = \log_e \frac{p}{q} = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$

Since $\eta \in (-\infty, \infty)$, this is now formulated as a linear regression problem.
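As a quick numerical check of the logit and its inverse, base R provides qlogis and plogis (a small sketch, not part of the original notes):

> qlogis(0.8)        # log-odds of p = 0.8: log(0.8/0.2)
[1] 1.386294
> plogis(1.386294)   # the inverse logit recovers the probability
[1] 0.8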

Once estimates for the $\beta$'s are obtained, it is possible to estimate the chance of "success" $p$ as

$$\hat{p} = \left[1 + \exp\left\{-\left(\hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_k X_k\right)\right\}\right]^{-1} = \frac{1}{1 + \exp\left\{-\left(\hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_k X_k\right)\right\}}$$

Once $\hat{p}$ exceeds some threshold (say 50%), we can decide to predict $Y = 1$ (i.e. a "success") for that case.


We can assess how well our model fits by examining deviance. The deviance of a fitted model compares the log-likelihood of the fitted model to the log-likelihood of a model with n parameters that fits the n observations perfectly (i.e. a saturated model).
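For binary 0/1 responses the saturated model fits each observation exactly, so its log-likelihood is 0 and the deviance reduces to minus twice the fitted log-likelihood. A small sketch (using a binomial glm on made-up data; not from the original notes):

> x <- rnorm(50)
> y <- rbinom(50, 1, plogis(x))
> fit <- glm(y ~ x, family = binomial)
> all.equal(deviance(fit), as.numeric(-2*logLik(fit)))
[1] TRUE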

The following function computes the predicted value for p:

> p.hat <- function(object, X) {
+   X <- as.matrix(X)
+   # prepend a column of 1s and multiply by the fitted coefficients
+   term <- cbind(rep(1, dim(X)[1]), X) %*% as.matrix(object$coefficients)
+   1/(1 + exp(-term))
+ }
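A brief usage sketch (hypothetical data and object names, not from the notes; the columns passed to p.hat must be in the same order as the fitted predictors):

> x1 <- rnorm(100); x2 <- rnorm(100)
> y <- rbinom(100, 1, plogis(1 + 2*x1 - x2))
> fit <- glm(y ~ x1 + x2, quasibinomial(link = "logit"))
> p <- p.hat(fit, cbind(x1, x2))              # estimated P(Y = 1) for each case
> table(actual = y, predicted = (p > 0.5)*1)  # threshold at 50% to predict Y = 1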

> drive <- "D:"
> code.dir <- paste(drive, "DATA/Data Mining R-Code", sep = "/")
> data.dir <- paste(drive, "DATA/Data Mining Data", sep = "/")

We read in the required functions. For a couple of functions that have not been seen before, the code is shown below:

> source(paste(code.dir, "MakeSyn.r", sep = "/"))
> source(paste(code.dir, "3DRotations.r", sep = "/"))
> source(paste(code.dir, "Pt_to_Array.r", sep = "/"))
> source(paste(code.dir, "CreateGaussians.r", sep = "/"))
> source(paste(code.dir, "CreateGrid.r", sep = "/"))
> source(paste(code.dir, "confusion_ex.r", sep = "/"))
> source(paste(code.dir, "example_display.r", sep = "/"))

Example: Data coming from two Gaussian clouds where there is no mixing

> # Cloud 1
> sz.1 <- 500
> data.1 <- Make.Gaussian(sz.1, 0, 0, varx = 5, vary = 3, pi/4)
> data.1 <- add.arr.vec(data.1, c(3, 15))
> # Cloud 2
> sz.2 <- 400
> data.2 <- Make.Gaussian(sz.2, 0, 0, varx = 5, vary = 3, pi/4)
> data.2 <- add.arr.vec(data.2, c(10, -5))
> G.1.data <- rbind(data.1, data.2)
> dimnames(G.1.data) <- list(NULL, c("X", "Y"))
> plot(G.1.data, col = c(rep(0, sz.1), rep(1, sz.2)), asp = 1)


Figure 1. Two clouds
Figure 2. Two clouds with separating line


Logistic regression uses the glm function (generalized linear models) that we used earlier for OLS regression. We tell it to do logistic regression with quasibinomial(link = "logit"):

> (log.1 <- glm(c(rep(0, sz.1), rep(1, sz.2)) ~ G.1.data[, 1] + G.1.data[, 2],
+               quasibinomial(link = "logit")))

Call:  glm(formula = c(rep(0, sz.1), rep(1, sz.2)) ~ G.1.data[, 1] + G.1.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
  (Intercept)  G.1.data[, 1]  G.1.data[, 2]
       -46.17          32.66         -28.96

Degrees of Freedom: 899 Total (i.e. Null);  897 Residual
Null Deviance:      1237
Residual Deviance: 7.072e-07    AIC: NA

(Because we are creating the Gaussians on the fly, the coefficients etc. will differ.) We can get more information with:

> summary(log.1)

Call:
glm(formula = c(rep(0, sz.1), rep(1, sz.2)) ~ G.1.data[, 1] +
    G.1.data[, 2], family = quasibinomial(link = "logit"))

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-5.415e-04  -2.107e-08  -2.107e-08   2.107e-08   6.236e-04

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)   -46.172634   0.003226  -14312   <2e-16 ***
G.1.data[, 1]  32.659748   0.001885   17331   <2e-16 ***
G.1.data[, 2] -28.960074   0.001626  -17810   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 7.373047e-13)

    Null deviance: 1.2365e+03  on 899  degrees of freedom
Residual deviance: 7.0719e-07  on 897  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 25

and show the separating line (Figure 2). The boundary is where $\hat{p} = 0.5$, i.e. where $\hat\beta_0 + \hat\beta_1 X + \hat\beta_2 Y = 0$, so the line has intercept $-\hat\beta_0/\hat\beta_2$ and slope $-\hat\beta_1/\hat\beta_2$:

> abline(coef = c(-log.1[[1]][1]/log.1[[1]][3], -log.1[[1]][2]/log.1[[1]][3]))


Because we wish to do several logistic regressions (and other types of classifications), we want a fairly general set of routines for displaying classifications of two-dimensional data.

The code in CreateGrid.r:

> #--------------------------------------
> # Set up the points at which to determine the values
> #--------------------------------------
> f.create.grid <- function(data, numb.xy.pts) {
+   x.min <- floor(min(data[, 1]))
+   x.max <- ceiling(max(data[, 1]))
+   y.min <- floor(min(data[, 2]))
+   y.max <- ceiling(max(data[, 2]))
+   xp <- seq(x.min, x.max, by = ((x.max - x.min)/numb.xy.pts[1]))
+   yp <- seq(y.min, y.max, by = ((y.max - y.min)/numb.xy.pts[2]))
+   zp <- expand.grid(xp, yp)
+   # return a named list so that callers can use res$xp, res$yp, res$zp
+   list(xp = xp, yp = yp, zp = zp)
+ }

f.create.grid takes a two-dimensional dataset, finds the bounding box for 'X' and 'Y', and creates a grid which will cover the entire dataset. The function plot.class.boundary uses such a grid to draw the boundaries between classes:

> plot.class.boundary <- function(train.data, train.class, test.data, test.class,
+                                 model.exp, predict.exp, numb.xy.pts, extra = 0) {
+   numb.classes <- length(unique(train.class))
+   model <- eval(model.exp)
+   res <- f.create.grid(train.data, numb.xy.pts)
+   xp <- res$xp
+   yp <- res$yp
+   points <- res$zp
+   dimnames(points)[[2]] <- dimnames(train.data)[[2]]
+   Z <- eval(predict.exp)
+   points(points[, 1], points[, 2], pch = ".", col = (matrix(Z, length(xp), length(yp)) + 2))
+   for (c in 1:numb.classes) {
+     contour(xp, yp, matrix(as.numeric(Z == (c - 1)), length(xp), length(yp)),
+             add = T, levels = 0.5, labex = 0, col = 1)
+   }
+   return(model)
+ }

plot.class.boundary (in the file example_display.r) is designed to show the boundaries between two or more classes. It takes the training data (together with their corresponding classes) along with expressions for the model and prediction. The model expression (see below) is evaluated to create a model; then the model is applied to predict the class for every point on the grid which has been created to cover the training data. This produces a Z value for each grid point and the contour routine is used to draw curves between the classes. The grid points are displayed in the colour of their predicted class.


Two new functions which appear in plot.class.boundary are eval and expression. eval takes an expression and evaluates it.

As a simple example

> x <- 10
> ex <- expression(x^2)
> eval(ex)
[1] 100

This enables us to pass information to a routine so that it can do computation based on different classification methods that may be used.

In the following, the

    expression(glm(factor(class) ~ data[, 1] + data[, 2], quasibinomial(link = "logit")))

and

    expression((p.hat(model, zp) > 0.5)*1)

allow us to create expressions we can evaluate later. Note that the arguments are valid R statements.
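In the examples that follow, two such expressions are assigned to model.exp and predict.exp before example.display is called. The assignment is not shown explicitly in the notes; presumably it looks like this (using the variable names train.data, train.class and points that exist inside plot.class.boundary and example.display when the expressions are evaluated):

> model.exp <- expression(glm(factor(train.class) ~ train.data[, 1] + train.data[, 2],
+                             quasibinomial(link = "logit")))
> predict.exp <- expression((p.hat(model, points) > 0.5)*1)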


We will also make use of a function that will set up the arguments for the plotting as well as displaying information about the accuracy of the classification.

> example.display <- function(train.data, train.class, test.data, test.class,
+                             numb.xy.pts, title, model.exp, predict.exp, extra = {}) {
+   # Plot the data (coloured numbers)
+   plot.class(train.data, train.class, test.data, test.class, main = title)
+   numb.classes <- length(unique(train.class))
+   model <- plot.class.boundary(train.data, train.class, test.data, test.class,
+                                model.exp, predict.exp, numb.xy.pts, extra = extra)
+   cat("***** The model is *****\n")
+   print(model)
+   cat("\n***** Training *****\n")
+   points <- train.data
+   pred.train.class <- eval(predict.exp)
+   print(confusion.expand(pred.train.class, train.class))
+   # Print the total number of incorrect classifications
+   misclass.train.ind <- which(pred.train.class != train.class)
+   cat("\n***** Misclassified training cases *****\n")
+   if (length(misclass.train.ind) > 0) {
+     tmp <- rbind(train.data[misclass.train.ind, ])
+     rownames(tmp) <- misclass.train.ind
+     # Print the incorrect classes
+     print(tmp)
+     # Display the incorrect classes in blue
+     points(tmp, col = "blue", pch = 24)
+   } else {
+     cat("No misclassified cases\n")
+   }
+   if (length(test.data) > 0) {
+     cat("\n***** Test *****\n")
+     points <- test.data
+     pred.test.class <- eval(predict.exp)
+     ind.tmp <- ((points - 1) != test.class)*(1:length(test.class))
+     print(confusion.expand(pred.test.class, test.class))
+     misclass.test.ind <- which(pred.test.class != test.class)
+     cat("\n***** Misclassified test cases *****\n")
+     if (length(misclass.test.ind) > 0) {
+       tmp <- rbind(test.data[misclass.test.ind, ])
+       rownames(tmp) <- misclass.test.ind
+       # Print the incorrect classes
+       print(tmp)
+       # Display the incorrect classes in blue
+       points(tmp, col = "blue", pch = 25)
+     } else {
+       cat("No misclassified cases\n")
+     }
+   }
+   cat("***********************************\n\n")
+   bringToTop(which = dev.cur(), stay = FALSE)
+   model
+ }


In example.display the data is plotted as numbers with both the colour and the number indicating the class. plot.class.boundary is used and information about the misclassification rate is computed. Points that are misclassified in the training set are displayed as blue triangles (pch 24) and in the test set as blue inverted triangles (pch 25).

Note that this logistic regression can only distinguish between two classes.

We can repeat the example using our new display function:

> lr.1 <- example.display(G.1.data[, 1:2], c(rep(0, sz.1), rep(1, sz.2)), {}, {},
+                         c(100, 100), "Logistic Regression", model.exp, predict.exp)

***** The model is *****
Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
         12.302            4.808           -8.847

Degrees of Freedom: 899 Total (i.e. Null);  897 Residual
Null Deviance:      1237
Residual Deviance: 7.767e-08    AIC: NA

***** Training *****
          true
object       0    1 | Row Sum
   0       500    0 |     500
   1         0  400 |     400
  -------  ---- ---- | -------
  Col Sum  500  400  |     900
attr(,"error")
[1] 0

***** Misclassified training cases *****
No misclassified cases
***********************************
Warning message:
algorithm did not converge in: glm.fit(x = X, y = Y, weights = weights, start = start,
    etastart = etastart,

We can extract details about the model to use for other computations.

> lr.1$coefficients
    (Intercept) train.data[, 1] train.data[, 2]
      12.302346        4.808427       -8.846906
> lr.1$df.null
[1] 899
> lr.1$df.residual
[1] 897
> lr.1$null.deviance
[1] 1236.531
> lr.1$deviance
[1] 7.766595e-08


Figure 3. Logistic regression on two non-overlapping clouds
Figure 4. Logistic regression on two overlapping clouds


Example: Data coming from two Gaussian clouds where there is mixing

> # Cloud 3
> sz.3 <- 500
> data.3 <- Make.Gaussian(sz.3, 0, 0, varx = 5, vary = 3, pi/4)
> data.3 <- add.arr.vec(data.3, c(3, 10))
> # Cloud 4
> sz.4 <- 400
> data.4 <- Make.Gaussian(sz.4, 0, 0, varx = 5, vary = 3, pi/4)
> data.4 <- add.arr.vec(data.4, c(10, 3))
> G.2.data <- rbind(data.3, data.4)
> dimnames(G.2.data) <- list(NULL, c("X", "Y"))
> plot(G.2.data, col = c(rep(2, sz.3), rep(3, sz.4)), asp = 1)
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {},
+                 c(100, 100), "Logistic Regression", model.exp, predict.exp)

***** The model is *****
Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
        -0.4271           0.8071          -0.7772

Degrees of Freedom: 899 Total (i.e. Null);  897 Residual
Null Deviance:      1237
Residual Deviance: 208.2    AIC: NA

***** Training *****
          true
object       0    1 | Row Sum
   0       477   21 |     498
   1        23  379 |     402
  -------  ---- ---- | -------
  Col Sum  500  400  |     900
attr(,"error")
[1] 0.04888889

***** Misclassified training cases *****
              X          Y
48    5.4911686  4.1425085
50   15.4088414  7.8575074
66    4.0186886  1.8454920
75    5.7043209  4.7580690
111  13.4591102 10.7516531
118   5.0495322  3.4891509
155  12.1454569 11.4279235
157   3.7994667  2.9723688
169   6.3458236  3.1246520
217  10.9644485 10.4391524
230   4.4315879  3.8026375
255   5.5082669  3.7672967
270  11.7821399  8.3303834
280   5.0310650  1.9771039
308   7.9011613  7.5800988
330  -0.5432143 -4.5646499
333   7.4261595  6.8082825
350   2.9290850  0.4318286
361   3.8990941  3.0699248
374  10.0981921  8.3180937
387   7.6244109  6.9869607
397   8.8820559  7.8856633
444  11.7238871  5.5299141
510   6.9309741  7.7459358


515   9.3660930 11.7236550
548   8.7247443  9.5905971
554   4.3385308  4.4011452
570   6.0196786  7.1145228
579  12.6711014 13.7630291
617   6.0946587  6.4027005
634   3.3333430  7.5160789
652   7.1157056  8.4394958
682   3.0344006  5.9903613
686   4.4527196  5.4494459
703   4.0413365  4.1797645
707   5.1873916  5.6960949
711   2.3816153  2.2967289
754   2.3008701  5.2234941
758   5.9355376  7.4624286
770   8.4230113 10.2583963
814   6.5557662  6.7154571
815   5.6222459  5.9832852
841  10.0373736  9.9287212
842   8.9475897  9.0953530
***********************************

Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
        -0.4271           0.8071          -0.7772

Degrees of Freedom: 899 Total (i.e. Null);  897 Residual
Null Deviance:      1237
Residual Deviance: 208.2    AIC: NA

Note the 44 misclassified points in Figure 4.

Because logistic regression can only distinguish between two classes, we need to apply it twice to separate three classes.
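The two fits on the next page each separate one cloud from the other two. One way to combine them into a single three-class prediction (not shown in the notes; a rough sketch assuming the two glm fits are kept in hypothetical objects log.a and log.b, fitted to the columns of G.3.data exactly as in the two example.display calls below):

> p.a <- p.hat(log.a, G.3.data[, 1:2])   # P(not cloud 5) from the first separation
> p.b <- p.hat(log.b, G.3.data[, 1:2])   # P(not cloud 4) from the second separation
> pred <- ifelse(p.a < 0.5, 5, ifelse(p.b < 0.5, 4, 3))   # assign each point to a cloud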


Example: Three Gaussian Clouds

> # Cloud 5
> sz.5 <- 600
> data.5 <- Make.Gaussian(sz.5, 0, 0, varx = 6, vary = 2, pi/3)
> data.5 <- add.arr.vec(data.5, c(-10, 15))
> G.3.data <- rbind(data.3, data.4, data.5)
> dimnames(G.3.data) <- list(NULL, c("X", "Y"))
> plot(G.3.data, col = c(rep(3, sz.3), rep(2, sz.4), rep(4, sz.5)), asp = 1)

Figure 5. Three clouds

For the first separating line:

> example.display(G.3.data[, 1:2], c(rep(1, sz.3), rep(1, sz.4), rep(0, sz.5)), {}, {},
+                 c(100, 100), "Logistic Regression", model.exp, predict.exp)

***** The model is *****
Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
         29.705            2.799           -1.308

Degrees of Freedom: 1499 Total (i.e. Null);  1497 Residual
Null Deviance:      2019
Residual Deviance: 11.54    AIC: NA

***** Training *****
          true
object       0    1 | Row Sum
   0       599    0 |     599
   1         1  900 |     901
  -------  ---- ---- | -------
  Col Sum  600  900  |    1500
attr(,"error")
[1] 0.0006666667

***** Misclassified training cases *****
             X        Y
1461 -7.302604 4.480134


***********************************

Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
         29.705            2.799           -1.308

Degrees of Freedom: 1499 Total (i.e. Null);  1497 Residual
Null Deviance:      2019
Residual Deviance: 11.54    AIC: NA

Figure 6. Logistic regression separating one cloud
Figure 7. Logistic regression separating the other cloud

For the second separation:

> example.display(G.3.data[, 1:2], c(rep(1, sz.3), rep(0, sz.4), rep(1, sz.5)), {}, {},
+                 c(100, 100), "Logistic Regression", model.exp, predict.exp)

***** The model is *****
Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
         0.4271          -0.8071           0.7772

Degrees of Freedom: 1499 Total (i.e. Null);  1497 Residual
Null Deviance:      1740
Residual Deviance: 208.2    AIC: NA

***** Training *****
          true
object       0     1 | Row Sum
   0       379    23 |     402
   1        21  1077 |    1098
  -------  ---- ----- | -------
  Col Sum  400  1100  |    1500
attr(,"error")
[1] 0.02933333


***** Misclassified training cases *****
              X          Y
48    5.4911686  4.1425085
50   15.4088414  7.8575074
66    4.0186886  1.8454920
75    5.7043209  4.7580690
111  13.4591102 10.7516531
118   5.0495322  3.4891509
155  12.1454569 11.4279235
157   3.7994667  2.9723688
169   6.3458236  3.1246520
217  10.9644485 10.4391524
230   4.4315879  3.8026375
...
682   3.0344006  5.9903613
686   4.4527196  5.4494459
703   4.0413365  4.1797645
707   5.1873916  5.6960949
711   2.3816153  2.2967289
754   2.3008701  5.2234941
758   5.9355376  7.4624286
770   8.4230113 10.2583963
814   6.5557662  6.7154571
815   5.6222459  5.9832852
841  10.0373736  9.9287212
842   8.9475897  9.0953530
***********************************

Call:  glm(formula = factor(train.class) ~ train.data[, 1] + train.data[, 2],
    family = quasibinomial(link = "logit"))

Coefficients:
    (Intercept)  train.data[, 1]  train.data[, 2]
         0.4271          -0.8071           0.7772

Degrees of Freedom: 1499 Total (i.e. Null);  1497 Residual
Null Deviance:      1740
Residual Deviance: 208.2    AIC: NA


Synthetic Examples

For all our problems in classification we will use some synthetic data sets to show how the different classifiers perform in different situations.

The data sets are created by the function create.synthetic:

> #------------------------------
> # Create synthetic classes
> #------------------------------
> create.synthetic <- function(data.size) {
+   # Set up samples
+   X <- runif(data.size, -1, 1)
+   Y <- runif(data.size, -1, 1)
+   data <- cbind(X, Y)                    # create an array of them
+   # Set the target values
+   tp <- matrix(0, data.size, 8)
+   tp[, 1] <- (X[] >= 0)*1                              # 1 in one half-plane, 0 in the other
+   tp[, 2] <- (X[]*Y[] >= 0)*1                          # 1 in 1st & 3rd quadrants, 0 otherwise
+   tp[, 3] <- (X[] >= 0)*1 + (X[] >= 0 & Y[] >= 0)*1    # 2 in 1st quadrant, 1 in 4th, 0 otherwise
+   tp[, 4] <- (X[] >= -Y[])*1                           # 1 to right of y = -x, 0 otherwise
+   tp[, 5] <- ((X[] >= -Y[]) & (X[] >= Y[]) | (X[] <= -Y[]) & (X[] <= Y[]))*1  # rotated version of 2
+   tp[, 6] <- (X[] >= -Y[])*1 + ((X[] >= -Y[]) & X[] >= Y[])*1                 # rotated version of 3
+   tp[, 7] <- (X[]^2 + Y[]^2 >= 0.25)*1                 # 1 outside circle x^2 + y^2 = 0.25 ("Hole")
+   tp[, 8] <- (Y[] >= X[]^2 - 0.25)*1                   # 1 above the parabola y = x^2 - 0.25
+   return(list(tp = tp, data = data))
+ }

> data.sz <- 500
> res <- create.synthetic(data.sz)
> for (i in 1:8) {
+   plot.class(res$data, res$tp[, i])
+   readline("Press Enter...")
+ }


Figure 8. Synthetic Classes

We will split the data into a training set and a test set. The function to produce those indices NOT in a set is:

> "%w/o%" <- function(x, y) x[!x %in% y]
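A quick illustration of the operator (a small sketch, not in the original notes):

> (1:10) %w/o% c(2, 5, 7)
[1]  1  3  4  6  8  9 10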

We also use the splitting routine:

> #--------------------------------------------
> # Set the indices for the training/test sets
> #--------------------------------------------
> get.train <- function(data.sz, train.sz) {
+   # Take subsets of data for training/test samples
+   # Return the indices
+   train.ind <- sample(data.sz, train.sz)
+   test.ind <- (1:data.sz) %w/o% train.ind
+   list(train = train.ind, test = test.ind)
+ }

Create a random split:

> tt.ind <- get.train(data.sz, 400)

and save the results for replication (the first time):

> # save(tt.ind, file = paste(data.dir, "synTrainTest.Rdata", sep = "/"))

We can then recover the appropriate values:

> load(paste(data.dir, "synTrainTest.Rdata", sep = "/"))
> syn.train.class <- res$tp[tt.ind[[1]], ]


> syn.train.data <- res$data[tt.ind[[1]], ]
> syn.test.class <- res$tp[tt.ind[[2]], ]
> syn.test.data <- res$data[tt.ind[[2]], ]

> f.menu <- function() {
+   while (TRUE) {
+     cat("Enter the number of the data set", "\n")
+     cat("0 to quit\n")
+     resN <- menu(c("1:2", "1/3:2/4", "1:2/3:4", "1:2 rotated", "1/3:2/4 rotated",
+                    "1:2/3:4 rotated", "Hole", "x^2"))
+     if (resN == 0) break
+     example.display(syn.train.data, syn.train.class[, resN],
+                     syn.test.data, syn.test.class[, resN],
+                     c(100, 100), "Logistic Regression", model.exp,
+                     predict.exp)
+   }
+ }

Now run the examples:

> f.menu()

Figure 9. Logistic regression separating half planes
Figure 10. Logistic regression does not work for quarter planes


Figure 11. Logistic regression does not work for three classes
Figure 12. Logistic regression does a good job
Figure 13. Logistic regression does not work well
Figure 14. Logistic regression does not work for three classes


Figure 15. Logistic regression does not work well
Figure 16. Logistic regression does not work well


k-NN (k Nearest Neighbours)

k-nearest neighbour classification is done for a test dataset using a training set. For each point in the test set, the k nearest (in Euclidean distance) training set vectors are found, and the classification of a value in the test set is decided by majority vote of those training set vectors, with ties broken at random. If there are ties for the kth nearest vector, all candidates are included in the vote.
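This is what the knn function in the class package (used later in this section) does; a minimal sketch on toy data (not part of the original notes):

> library(class)
> train <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
> cl <- factor(c("a", "a", "b", "b"))
> knn(train, rbind(c(1, 1), c(4, 5)), cl, k = 3)   # classify two test points by 3-NN vote
[1] a b
Levels: a b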

Demonstration of what k-NN is all about:

We will create 3 classes from random normal data with σ = 1 and centers at (3,3), (5,5), and (5,3):

> #--------------------------------------
> # Create classes from random normal
> # Call - f.create.data(3, c(50, 60, 70), list(c(3, 3), c(5, 5), c(5, 3)))
> #--------------------------------------
> f.create.data <- function(numb.classes, n.in.class, centre.class) {
+   total.n.in.classes <- sum(n.in.class)
+   data <- matrix(0, total.n.in.classes, 3)
+   start <- 1
+   for (i in (1:numb.classes)) {
+     end <- start + n.in.class[i] - 1
+     pos <- start:end
+     data[pos, 1] <- rnorm(n.in.class[i], 0, 1) + centre.class[[i]][1]
+     data[pos, 2] <- rnorm(n.in.class[i], 0, 1) + centre.class[[i]][2]
+     data[pos, 3] <- rep(i, n.in.class[i])
+     start <- end + 1
+   }
+   data
+ }

Draw a circle on the graph:

> draw.circle <- function(r, x0, y0, c) {
+   t <- seq(0, 2*pi, by = .01)
+   x <- x0 + r*cos(t)
+   y <- y0 + r*sin(t)
+   lines(x, y, col = c)
+ }

> my.k.nn <- function(k, data, grid) {
+   a.min <- min(min(grid[[1]]), min(grid[[2]]))
+   a.max <- max(max(grid[[1]]), max(grid[[2]]))
+   plot(data[, 1], data[, 2], xlim = c(a.min, a.max), ylim = c(a.min, a.max),
+        col = data[, 3])
+   dist <- matrix(0, 2, k)
+   numb.classes <- length(unique(data[, 3]))
+   curr.class <- matrix(0, 1, numb.classes)
+   p.class <- matrix(0, 1, dim(grid[[3]])[[1]])
+   #
+   # Draw a circle in white - doesn't show.
+   # Set the x, y, and radius i.e. max.old
+   #


+   draw.circle(0, 0, 0, 'white')
+   x.old <- 0; y.old <- 0; max.old <- 0
+   #
+   for (j in (1:(dim(grid[[3]])[[1]]))) {   ## run through all the test points on the grid
+     for (i in 1:dim(data)[[1]]) {
+       currdist <- (data[i, 1] - grid[[3]][j, 1])*(data[i, 1] - grid[[3]][j, 1]) +
+                   (data[i, 2] - grid[[3]][j, 2])*(data[i, 2] - grid[[3]][j, 2])
+       if (i <= k) {          # Set the first k
+         dist[1, i] <- currdist
+         dist[2, i] <- i
+       }                      # if (i <= k)
+       else {                 ## go through the k list and replace the biggest one if possible
+         max.d <- max(dist[1, ])
+         max.index <- which.max(dist[1, ])
+         if (currdist < max.d) {
+           dist[1, max.index] <- currdist
+           dist[2, max.index] <- i
+         }
+       }
+     }
+     for (cl in (1:numb.classes)) {
+       curr.class[cl] <- sum(data[, 3][dist[2, ]] == cl)
+     }
+     # take a vote
+     v.max <- curr.class == max(curr.class)
+     v <- sum(v.max)                          # v = 1 means no tie
+     v.index <- which(v.max)
+     p.class[j] <- v.index[floor(runif(1)*v) + 1]   # break any tie at random
+     draw.circle(max.old^(.5), x.old, y.old, 'white')
+     x.old <- grid[[3]][j, 1]
+     y.old <- grid[[3]][j, 2]
+     max.old <- max.d
+     draw.circle(max.d^(.5), grid[[3]][j, 1], grid[[3]][j, 2], 'black')
+     points(data[, 1], data[, 2], col = data[, 3])
+     points(grid[[3]][, 1], grid[[3]][, 2], col = p.class, pch = 20)
+     if (abs(grid$zp[j, 2] - 3.8) < 0.0001) {
+       readline("Press return")
+     }
+   }
+   contour(grid[[1]], grid[[2]], matrix(p.class, nrow = length(grid[[1]])),
+           levels = 1:(numb.classes - 1) + .5)
+   points(data[, 1], data[, 2], col = data[, 3])
+   points(grid[[3]][, 1], grid[[3]][, 2], col = p.class, pch = 20)
+   p.class
+ }

> data <- f.create.data(3, c(50, 60, 70), list(c(3, 3), c(5, 5), c(5, 3)))
> grid <- f.create.grid(data, c(10, 10))
> my.class <- my.k.nn(5, data, grid)

> library(class)
> library(mda)


Figure 17. All black in circle. Black predicted.
Figure 18. What will be predicted?
Figure 19. Red was predicted.
Figure 20. Separating lines.

To illustrate, we could try a nearest neighbour method on our two-cloud data. We will determine the classification of the points (-6,3), (-5,3), ..., (20,3) based on their 3 nearest neighbours.

> (nn.3 <- knn(G.1.data[, 1:2], cbind(-6:20, rep(3, 27)), c(rep(0, sz.1), rep(1, sz.2)), k = 3))
 [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1

It turns out that the result is given in the form of a factor, so in order to make use of the information (for colouring points or doing numerical operations) we need to convert to numeric values.

> unclass(nn.3)
 [1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
attr(,"levels")
[1] "0" "1"
> unclass(nn.3)[1] + 2
[1] 3


Consider two classes:

> xa <- c(-2, -1, 1, 2)
> ya <- c(-2, -1, 1, 2)
> class <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0)

To use example.display, we usually need to set up the model (in this case there is no model) and the predictor and run the examples:

> model.exp <- expression({})
> predict.exp <- expression(unclass(knn(train.data, data.frame(points),
+                                       train.class, k = extra)) - 1)

> graphics.off()
> oldpar <- par(mfrow = c(2, 3))
> example.display(expand.grid(xa, ya), class, {}, {}, c(100, 100), "1-NN", model.exp,
+                 predict.exp, 1)
> example.display(expand.grid(xa, ya), class, {}, {}, c(100, 100), "2-NN", model.exp,
+                 predict.exp, 2)
> example.display(expand.grid(xa, ya), class, {}, {}, c(100, 100), "3-NN", model.exp,
+                 predict.exp, 3)
> example.display(expand.grid(xa, ya), class, {}, {}, c(100, 100), "4-NN", model.exp,
+                 predict.exp, 4)
> example.display(expand.grid(xa, ya), class, {}, {}, c(100, 100), "5-NN", model.exp,
+                 predict.exp, 5)
> example.display(expand.grid(xa, ya), class, {}, {}, c(100, 100), "6-NN", model.exp,
+                 predict.exp, 6)
> par(oldpar)


Figure 21. Use of different number of neighbours in 2-class case


We can also look at four classes:

> xb <- c(-2, -1, 1, 2)
> yb <- c(-2, -1, 1, 2)
> class.b <- c(0, 0, 2, 2, 0, 0, 2, 2, 1, 1, 3, 3, 1, 1, 3, 3)

> oldpar <- par(mfrow = c(1, 3))
> example.display(expand.grid(xb, yb), class.b, {}, {}, c(100, 100), "2-NN", model.exp,
+                 predict.exp, 2)
> example.display(expand.grid(xb, yb), class.b, {}, {}, c(100, 100), "3-NN", model.exp,
+                 predict.exp, 3)
> example.display(expand.grid(xb, yb), class.b, {}, {}, c(100, 100), "4-NN", model.exp,
+                 predict.exp, 4)
> par(oldpar)

Figure 22. Use of different number of neighbours in 4-class case

Note the effect of the random selection of the nearest neighbours.
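The randomness comes from knn breaking tied votes at random. A small sketch that makes this visible (not part of the original notes; the test point is equidistant from both training points, so with k = 2 every call faces a 1-1 tie):

> train <- rbind(c(-1, 0), c(1, 0))
> cl <- factor(c(0, 1))
> # repeated calls give class 0 sometimes and class 1 other times
> table(replicate(100, as.character(knn(train, c(0, 0), cl, k = 2))))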


Compare with the logistic regression for the Gaussian clouds (both without mixing and with mixing):

> graphics.off()
> oldpar <- par(mfrow = c(2, 2))
> example.display(G.1.data[, 1:2], c(rep(0, sz.1), rep(1, sz.2)), {}, {},
+                 c(100, 100), "Logistic Regression",
+                 expression(glm(factor(train.class) ~ train.data[, 1] + train.data[, 2],
+                                quasibinomial(link = "logit"))),
+                 expression((p.hat(model, points) > 0.5)*1))
> example.display(G.1.data[, 1:2], c(rep(0, sz.1), rep(1, sz.2)), {}, {}, c(100, 100), "3-NN",
+                 expression({}), expression(unclass(knn(train.data, data.frame(points),
+                                                        train.class, k = extra)) - 1), 3)
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {},
+                 c(100, 100), "Logistic Regression",
+                 expression(glm(factor(train.class) ~ train.data[, 1] + train.data[, 2],
+                                quasibinomial(link = "logit"))),
+                 expression((p.hat(model, points) > 0.5)*1))
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {}, c(100, 100), "3-NN",
+                 expression({}), expression(unclass(knn(train.data, data.frame(points),
+                                                        train.class, k = extra)) - 1), 3)
> par(oldpar)

Figure 23. Comparison of logistic regression and 3 nearest neighbours (no mixing on top)
Figure 24. Comparison of different number of neighbours


When we look at the 3 nearest neighbours for the overlapping clouds, we see that the separating curve is very twisty. This suggests that we are overfitting. Use of more neighbours will tend to smooth the curve.
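One common way to choose k is leave-one-out cross-validation on the training data; the class package provides knn.cv for this. A sketch (not part of the original notes):

> cl.2 <- c(rep(0, sz.3), rep(1, sz.4))
> err <- sapply(1:15, function(k)
+   mean(knn.cv(G.2.data[, 1:2], cl.2, k = k) != cl.2))
> which.min(err)   # the k with the smallest leave-one-out error rate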

> graphics.off()
> oldpar <- par(mfrow = c(2, 2))
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {}, c(100, 100), "5-NN",
+                 expression({}), expression(unclass(knn(train.data, data.frame(points),
+                                                        train.class, k = extra)) - 1), 5)
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {}, c(100, 100), "7-NN",
+                 expression({}), expression(unclass(knn(train.data, data.frame(points),
+                                                        train.class, k = extra)) - 1), 7)
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {}, c(100, 100), "9-NN",
+                 expression({}), expression(unclass(knn(train.data, data.frame(points),
+                                                        train.class, k = extra)) - 1), 9)
> example.display(G.2.data[, 1:2], c(rep(0, sz.3), rep(1, sz.4)), {}, {}, c(100, 100), "11-NN",
+                 expression({}), expression(unclass(knn(train.data, data.frame(points),
+                                                        train.class, k = extra)) - 1), 11)
> par(oldpar)

We can also look at the effect of the choice of neighbours on our synthetic data sets.
As before, we create a simple menu function to enable us to run several examples:

> f.menu <- function() {
+   while (TRUE) {
+     cat("Enter the number of the data set", "\n")
+     cat("0 to quit\n")
+     resN <- menu(c("1:2", "1/3:2/4", "1:2/3:4", "1:2 rotated", "1/3:2/4 rotated",
+                    "1:2/3:4 rotated", "Hole", "1 rotated"))
+     if (resN == 0) break
+     size <- readline("# of neighbours - 1 to 10: ")
+     KNN.ex(syn.train.data, syn.train.class[, resN], syn.test.data, syn.test.class[, resN] - 1,
+            c(100, 100), as.integer(size))
+   }
+ }

Now run the examples:

> f.menu()


Figure 25. Case 1, 1 nearest neighbour
Figure 26. Case 1, 7 nearest neighbours
Figure 27. Case 2, 1 nearest neighbour
Figure 28. Case 2, 5 nearest neighbours


Figure 29. Case 5, 1 nearest neighbour
Figure 30. Case 7, 1 nearest neighbour
Figure 31. Case 8, 1 nearest neighbour
Figure 32. Case 8, 5 nearest neighbours


Linear and Quadratic Discriminant Analysis

Another method to discriminate (and hence classify) two groups (populations) of data is due to Sir R. A. Fisher. Fisher's idea was to transform a multivariate observation $x$ into a univariate observation $y$ such that the $y$'s derived from populations $\pi_i$ ($i = 1$ or $2$) were separated (or different) as much as possible. Fisher suggested using linear combinations of the variables in $x$ to create the $y$'s. We define

$$\mu_i = E(X \mid \pi_i), \quad i = 1, 2$$

and assume the covariance matrix of $X$,

$$\Sigma = E\left[(X - \mu_i)(X - \mu_i)^T\right], \quad i = 1, 2,$$

is the same (i.e. common) for the two populations. Then

$$Y = l^T X$$

where $l$ is a vector of weights, and $\mu_{iY} = l^T \mu_i$ and $\sigma_Y^2 = l^T \Sigma l$, so

$$\frac{\text{squared distance between the averages of } Y \text{ for the two populations}}{\text{variance of } Y} = \frac{(\mu_{1Y} - \mu_{2Y})^2}{\sigma_Y^2} = \frac{(l^T \delta)^2}{l^T \Sigma l}, \quad \text{where } \delta = \mu_1 - \mu_2.$$

This is maximized by setting

$$l = c\,\Sigma^{-1}(\mu_1 - \mu_2)$$

for any $c \neq 0$. Choosing $c = 1$ gives us

$$Y = l^T X = (\mu_1 - \mu_2)^T \Sigma^{-1} X,$$

which is called Fisher's linear discriminant (LDA) function. We use this as a classification tool.

Note that this function transforms the midpoint $\frac{\mu_1 + \mu_2}{2}$ between the two multivariate population means $\mu_i$, $i = 1, 2$, into $m$ (which is the average of the two univariate means for $Y$), i.e.

$$m = \frac{(\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2)}{2} = 0.5\,(\mu_{1Y} + \mu_{2Y}),$$

and any other point $x_0$ also gets transformed by this function. We assign $x_0$ to $\pi_1$ (population 1) if

$$(\mu_1 - \mu_2)^T \Sigma^{-1} x_0 \geq m$$

and to $\pi_2$ (population 2) if

$$(\mu_1 - \mu_2)^T \Sigma^{-1} x_0 < m.$$

(This results in a linear separator for the 2 multivariate populations.)
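As a sketch of what the rule looks like when computed by hand on the two-cloud data (sample means and a pooled covariance estimate stand in for the unknown population quantities; variable names are ours, not from the notes):

> X1 <- G.1.data[1:sz.1, ]; X2 <- G.1.data[(sz.1 + 1):(sz.1 + sz.2), ]
> mu1 <- colMeans(X1); mu2 <- colMeans(X2)
> S <- ((sz.1 - 1)*cov(X1) + (sz.2 - 1)*cov(X2))/(sz.1 + sz.2 - 2)  # pooled covariance
> l <- solve(S, mu1 - mu2)            # discriminant direction Sigma^{-1}(mu1 - mu2)
> m <- 0.5*sum(l*(mu1 + mu2))         # midpoint on the discriminant scale
> x0 <- c(5, 3)
> if (sum(l*x0) >= m) "population 1" else "population 2"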

Relaxing the assumption of a common covariance matrix for the two populations results in a modification to LDA giving a quadratic discriminant rule (QDA), where we allocate $x_0$ to $\pi_1$ if

$$\left(\mu_1^T \Sigma_1^{-1} - \mu_2^T \Sigma_2^{-1}\right) x_0 \;-\; 0.5\, x_0^T \left(\Sigma_1^{-1} - \Sigma_2^{-1}\right) x_0 \;-\; 0.5\left(\mu_1^T \Sigma_1^{-1} \mu_1 - \mu_2^T \Sigma_2^{-1} \mu_2\right) \;\geq\; 0.5 \ln \frac{|\Sigma_1|}{|\Sigma_2|}$$

and otherwise allocate $x_0$ to $\pi_2$. Hence we have QDA, which gives quadratic rather than linear separation of the 2 populations.

We illustrate this in R using:

> library(MASS)

As before, we can apply these to the cloud data:

> (lda.c <- lda(G.1.data[, 1:2], c(rep(0, sz.1), rep(1, sz.2))))
Call:
lda(G.1.data[, 1:2], grouping = c(rep(0, sz.1), rep(1, sz.2)))

Prior probabilities of groups:
        0         1
0.5555556 0.4444444

Group means:
          X         Y
0  3.370369 14.904757
1 10.021932 -4.662915

Coefficients of linear discriminants:
         LD1
X  0.1895896
Y -0.2712415


To find the classification of some points:

> predict(lda.c, cbind(-6:20, rep(3, 27)))
$class
 [1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1

$posterior
                 0            1
 [1,] 9.999952e-01 4.759306e-06
 [2,] 9.999835e-01 1.653425e-05
 [3,] 9.999426e-01 5.743978e-05
 [4,] 9.998005e-01 1.995249e-04
 [5,] 9.993072e-01 6.928333e-04
 [6,] 9.975971e-01 2.402874e-03
 [7,] 9.917014e-01 8.298559e-03
 [8,] 9.717498e-01 2.825024e-02
 [9,] 9.082667e-01 9.173332e-02
[10,] 7.402577e-01 2.597423e-01
[11,] 4.506523e-01 5.493477e-01
[12,] 1.910226e-01 8.089774e-01
[13,] 6.364205e-02 9.363580e-01
[14,] 1.918853e-02 9.808115e-01
[15,] 5.599786e-03 9.944002e-01
[16,] 1.618307e-03 9.983817e-01
[17,] 4.663539e-04 9.995336e-01
[18,] 1.342808e-04 9.998657e-01
[19,] 3.865533e-05 9.999613e-01
[20,] 1.112693e-05 9.999889e-01
[21,] 3.202822e-06 9.999968e-01
[22,] 9.219085e-07 9.999991e-01
[23,] 2.653641e-07 9.999997e-01
[24,] 7.638291e-08 9.999999e-01
[25,] 2.198620e-08 1.000000e+00
[26,] 6.328550e-09 1.000000e+00
[27,] 1.821622e-09 1.000000e+00

$x
              LD1
 [1,] -1.46685188
 [2,] -1.27726232
 [3,] -1.08767276
 [4,] -0.89808320
 [5,] -0.70849364
 [6,] -0.51890408
 [7,] -0.32931451
 [8,] -0.13972495
 [9,]  0.04986461
[10,]  0.23945417
[11,]  0.42904373
[12,]  0.61863329
[13,]  0.80822286
[14,]  0.99781242
[15,]  1.18740198
[16,]  1.37699154
[17,]  1.56658110
[18,]  1.75617067
[19,]  1.94576023
[20,]  2.13534979
[21,]  2.32493935
[22,]  2.51452891
[23,]  2.70411847
[24,]  2.89370804
[25,]  3.08329760
[26,]  3.27288716
[27,]  3.46247672

and

> (qda.c <- qda(G.1.data[, 1:2], c(rep(0, sz.1), rep(1, sz.2))))
Call:
qda(G.1.data[, 1:2], grouping = c(rep(0, sz.1), rep(1, sz.2)))

Prior probabilities of groups:
        0         1
0.5555556 0.4444444

Group means:
          X         Y
0  3.370369 14.904757
1 10.021932 -4.662915

> predict(qda.c, cbind(-6:20, rep(3, 27)))
$class
 [1] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1

$posterior
                 0            1
 [1,] 9.999964e-01 3.589146e-06
 [2,] 9.999882e-01 1.179735e-05
 [3,] 9.999610e-01 3.895184e-05
 [4,] 9.998708e-01 1.291821e-04
 [5,] 9.995697e-01 4.302721e-04
 [6,] 9.985614e-01 1.438583e-03
 [7,] 9.951800e-01 4.820033e-03
 [8,] 9.839066e-01 1.609344e-02
 [9,] 9.474290e-01 5.257102e-02
[10,] 8.409810e-01 1.590190e-01
[11,] 6.070657e-01 3.929343e-01
[12,] 3.100116e-01 6.899884e-01
[13,] 1.151040e-01 8.848960e-01
[14,] 3.613422e-02 9.638658e-01
[15,] 1.064147e-02 9.893585e-01
[16,] 3.062689e-03 9.969373e-01
[17,] 8.727400e-04 9.991273e-01
[18,] 2.471881e-04 9.997528e-01
[19,] 6.966540e-05 9.999303e-01
[20,] 1.954309e-05 9.999805e-01
[21,] 5.457519e-06 9.999945e-01
[22,] 1.517171e-06 9.999985e-01
[23,] 4.198698e-07 9.999996e-01
[24,] 1.156740e-07 9.999999e-01
[25,] 3.172480e-08 1.000000e+00
[26,] 8.661707e-09 1.000000e+00
[27,] 2.354235e-09 1.000000e+00

The expression for prediction is:

> predict.exp <- expression(unclass(predict(model, points)$class) - 1)

Look at the synthetic classes:

> f.menu <- function() {
+   while (TRUE) {
+     cat("Enter the type of classification - 0 to stop", "\n")
+     res <- menu(c("LDA", "QDA"))
+     if (res == 0) break
+     cat("Enter the number of the data set", "\n")
+     resN <- menu(c("1:2", "1/3:2/4", "1:2/3:4", "1:2 rotated", "1/3:2/4 rotated",
+                    "1:2/3:4 rotated", "Hole", "1 rotated"))
+     switch(res, example.display(syn.train.data, syn.train.class[, resN],
+                                 syn.test.data, syn.test.class[, resN],
+                                 c(100, 100), "LDA", expression(lda(train.data, train.class)),
+                                 predict.exp),
+                 example.display(syn.train.data, syn.train.class[, resN],
+                                 syn.test.data, syn.test.class[, resN],
+                                 c(100, 100), "QDA", expression(qda(train.data, train.class)),
+                                 predict.exp))
+   }
+ }

> f.menu()

Figure 33. LDA on example 1
Figure 34. LDA on example 2


Figure 35. LDA on example 3
Figure 36. LDA on example 4
Figure 37. LDA on example 7
Figure 38. LDA on example 8


Figure 39. QDA on example 2
Figure 40. QDA on example 3
Figure 41. QDA on example 5
Figure 42. QDA on example 6


Figure 43. QDA on example 7
Figure 44. QDA on example 8


Flea Beetles revisited (the Beetles revival!)

Let us take another look at the FLEA BEETLES - this problem is an example of classification into more than 2 classes or groups.

If we apply LDA and QDA to the Flea Beetles we find that LDA misclassifies one case while QDA is able to classify all correctly.

> source(paste(code.dir, "ReadFleas.r", sep = "/"))

First we will do LDA on the data:

> flea.lda <- LorQDA.ex(data.frame(d.flea[, 6], d.flea[, 1]), species - 1, {}, {}, c(100, 100), "L")
***** The model is *****
Call:
lda(train.data, train.class)

Prior probabilities of groups:
        0         1         2
0.2837838 0.2972973 0.4189189

Group means:
  d.flea...6. d.flea...1.
0    104.8571    183.0952
1    106.5909    138.2273
2     81.0000    201.0000

Coefficients of linear discriminants:
                    LD1        LD2
d.flea...6. -0.09862900 0.09774785
d.flea...1.  0.06826833 0.04321963

Proportion of trace:
   LD1    LD2
0.9361 0.0639

***** Training *****
          true
object       0    1    2 | Row Sum
   0        21    0    1 |      22
   1         0   22    0 |      22
   2         0    0   30 |      30
  -------  ---- ---- ---- | -------
  Col Sum   21   22   31  |      74
attr(,"error")
[1] 0.01351351

***** Misclassified training cases *****
   d.flea...6. d.flea...1.
65          92         185
***********************************


Figure 45. LDA separation of flea beetles
Figure 46. QDA separation of flea beetles

Now we will do QDA on the same data:

> flea.qda <- LorQDA.ex(data.frame(d.flea[, 6], d.flea[, 1]), species - 1, {}, {}, c(100, 100), "Q")
***** The model is *****
Call:
qda(train.data, train.class)

Prior probabilities of groups:
        0         1         2
0.2837838 0.2972973 0.4189189

Group means:
  d.flea...6. d.flea...1.
0    104.8571    183.0952
1    106.5909    138.2273
2     81.0000    201.0000

***** Training *****
          true
object       0    1    2 | Row Sum
   0        21    0    0 |      21
   1         0   22    0 |      22
   2         0    0   31 |      31
  -------  ---- ---- ---- | -------
  Col Sum   21   22   31  |      74
attr(,"error")
[1] 0

***** Misclassified training cases *****
No misclassified cases
***********************************

So far our examples have been in two dimensions. We could look at the flea beetles in six dimensions but, in order for us to visualize the results, we will restrict ourselves to three dimensions (using variables 1, 4, and 6).

The following function will determine the points at which the classes change in three-dimensional space based on a classification model. We create a model (LDA or QDA) in the three-dimensional space. The idea is to then create a sequence of 30 grids on planes perpendicular to the d.flea[,4] direction and, on each of these planes, use the model to compute the class at each of the grid points (as was done in the classification examples). Then we use the diff function along each row and column of the grid to determine points at which the class changes.

For example, if the values at the grid points are:

> (d.diff <- matrix(c(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
+                     c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2), c(1, 1, 1, 2, 2, 3, 2, 2, 2, 2),
+                     c(1, 1, 2, 2, 2, 3, 3, 2, 2, 2), c(1, 1, 2, 2, 2, 3, 3, 3, 2, 2),
+                     c(1, 2, 2, 2, 2, 3, 3, 3, 3, 2), c(1, 2, 2, 2, 3, 3, 3, 3, 3, 2),
+                     c(2, 2, 2, 2, 3, 3, 3, 3, 3, 2), c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3)),
+                   10, 10, byrow = T))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    1    1    1    1    2    2    2    2     2
 [2,]    1    1    1    1    2    2    2    2    2     2
 [3,]    1    1    1    2    2    2    2    2    2     2
 [4,]    1    1    1    2    2    3    2    2    2     2
 [5,]    1    1    2    2    2    3    3    2    2     2
 [6,]    1    1    2    2    2    3    3    3    2     2
 [7,]    1    2    2    2    2    3    3    3    3     2
 [8,]    1    2    2    2    3    3    3    3    3     2
 [9,]    2    2    2    2    3    3    3    3    3     2
[10,]    2    2    2    2    3    3    3    3    3     3

We can now apply diff to the rows and columns and plot the class change positions:

> plot(expand.grid(1:10, 1:10), col = d.diff[, 10:1] + 1, pch = 16)
> (change.a <- apply(d.diff, 2, diff))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    1    0    0    0    0     0
 [2,]    0    0    0    1    0    0    0    0    0     0
 [3,]    0    0    0    0    0    1    0    0    0     0
 [4,]    0    0    1    0    0    0    1    0    0     0
 [5,]    0    0    0    0    0    0    0    1    0     0
 [6,]    0    1    0    0    0    0    0    0    1     0
 [7,]    0    0    0    0    1    0    0    0    0     0
 [8,]    1    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     1
> points(expand.grid(1:9 + 0.5, 1:10), col = abs(change.a[, 10:1]), pch = 3)
> (change.b <- t(apply(d.diff, 1, diff)))
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    0    0    0    0    1    0    0    0    0
 [2,]    0    0    0    1    0    0    0    0    0
 [3,]    0    0    1    0    0    0    0    0    0
 [4,]    0    0    1    0    1   -1    0    0    0
 [5,]    0    1    0    0    1    0   -1    0    0
 [6,]    0    1    0    0    1    0    0   -1    0
 [7,]    1    0    0    0    1    0    0    0   -1
 [8,]    1    0    0    1    0    0    0    0   -1
 [9,]    0    0    0    1    0    0    0    0   -1
[10,]    0    0    0    1    0    0    0    0    0
> points(expand.grid(1:10, 1:9 + 0.5), col = abs((change.b)[, 9:1]), pch = 3)


Figure 47.

> disp.3d <- function(model, data, model.type, steps = 30) {
+   # Get a bounding interval for data[, 3]
+   # This will be taken as our third dimension
+   z.min <- floor(min(data[, 3]))
+   z.max <- ceiling(max(data[, 3]))
+   # Break the interval into a number of subintervals
+   z.int <- seq(z.min, z.max, by = ((z.max - z.min)/steps))
+   # For the other two variables we create a grid
+   res <- f.create.grid(data.frame(data[, 1], data[, 2]), c(steps, steps))
+   xp <- res$xp
+   yp <- res$yp
+   gp <- res$zp                      # The grid points
+   # In order to display the separators we look for
+   # changes in predicted class
+   x.change <- {}
+   y.change <- {}
+   z.change <- {}
+   for (i in 1:steps) {
+     # We will look at a grid on a sequence of planes
+     # in the d.flea[, 4] direction.
+     data.3d <- cbind(gp, z.int[i])
+     # The predict method likes the names to be the same
+     colnames(data.3d) <- colnames(data)
+     # We use the model to predict the class of every point
+     # on the grid on the current data[, 3] plane


+     if (model.type == "lda") {
+       Z <- unclass(predict(model, data.3d)$class)
+     } else if (model.type == "qda") {
+       Z <- unclass(predict(model, data.3d)$class)
+     } else if (model.type == "net") {
+       Z <- max.col(predict(model, data.3d))
+     }
+     # Put the result into a matrix
+     Z.m <- matrix(Z, nrow = steps + 1)
+     # Find the positions in a row where the class changes
+     Z.m.diff.r <- t(apply(Z.m, 1, diff))
+     cc <- which(Z.m.diff.r != 0, arr.ind = T)
+     # Find the positions in a column where the class changes
+     Z.m.diff.c <- apply(Z.m, 2, diff)
+     rr <- which(Z.m.diff.c != 0, arr.ind = T)
+     # Convert the matrix positions to grid positions
+     x.change <- c(x.change, xp[rr[, 1]], xp[cc[, 1]])
+     y.change <- c(y.change, yp[rr[, 2]], yp[cc[, 2]])
+     z.change <- c(z.change, rep(z.int[i], length(rr[, 1]) + length(cc[, 1])))
+   }
+   # After we have found the change points on all the grids
+   data.frame(x.change, y.change, z.change)
+ }

Set up the data (LDA and QDA like to have the same column names in the data for prediction as in the data used to build the model):

> data.for.3d <- data.frame(d.flea[, 6], d.flea[, 1], d.flea[, 4])
> colnames(data.for.3d) <- c("aede3", "tars1", "aede1")

We do an LDA on the three dimensional data and display it:

> (flea.lda <- lda(data.for.3d, species - 1))
> # Get the change points
> data.3d <- disp.3d(flea.lda, data.for.3d, "lda", steps = 60)
> #
> library(rgl)
> plot3d(d.flea[, c(6, 1, 4)], type = "s", col = species + 1, size = 0.3)
> points3d(data.3d, size = 2)


Figure 48. LDA in 3-D for flea beetles
Figure 49. LDA in 3-D for flea beetles

and then do a QDA on the three dimensional data and display it:

> (flea.qda <- qda(data.for.3d, species - 1))
> data.3d <- disp.3d(flea.qda, data.for.3d, "qda", steps = 60)
> plot3d(d.flea[, c(6, 1, 4)], type = "s", col = species + 1, size = 0.3)
> points3d(data.3d, size = 3)

Figure 50. QDA in 3-D for flea beetles
Figure 51. QDA in 3-D for flea beetles
