CSCI 347, Data Mining Evaluation: Cross Validation, Holdout, Leave-One-Out Cross Validation and Bootstrapping, Sections 5.3 & 5.4, pages 152-156

Training & Testing Dilemma
Want a large training dataset
Want a large testing dataset

Often we don't have enough good data

Training & Testing
Resubstitution error rate: the error rate resulting from testing on the training data
This error rate will be highly optimistic
Not a good indicator of what the performance will be on an independent test dataset
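A minimal Python sketch of the resubstitution error (train_model and error_rate are hypothetical stand-ins for whatever learner and error measure are being evaluated; they are not from the book or from Weka):

def resubstitution_error(dataset, train_model, error_rate):
    """Train and test on the same data -- an optimistic estimate."""
    model = train_model(dataset)          # hypothetical learner
    return error_rate(model, dataset)     # tested on the training data itself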

Evaluation in Weka

Overfitting - Negotiations

Overfitting - Diabetes: 1R with default bucket value of 6

plas:
  < 114.5 -> tested_negative
  < 115.5 -> tested_positive
  < 127.5 -> tested_negative
  < 128.5 -> tested_positive
  < 133.5 -> tested_negative
  < 135.5 -> tested_positive
  < 143.5 -> tested_negative
  < 152.5 -> tested_positive
  < 154.5 -> tested_negative
  >= 154.5 -> tested_positive
(587/768 instances correct)

71.5% correct

Overfitting - Diabetes: 1R with bucket value of 20

plas:
  < 139.5 -> tested_negative
  >= 139.5 -> tested_positive
(573/768 instances correct)

72.9% correct

Overfitting - Diabetes: 1R with bucket value of 50

plas:
  < 143.5 -> tested_negative
  >= 143.5 -> tested_positive
(576/768 instances correct)

74.2% correct

Overfitting - Diabetes: 1R with bucket value of 200

preg:
  < 6.5 -> tested_negative
  >= 6.5 -> tested_positive
(521/768 instances correct)

66.7% correct

Holdout
Holdout procedure: hold out some data for testing

Recommendation: when you have enough data, hold out 1/3 of the data for testing (use the remaining 2/3 for training)
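A rough Python sketch of the holdout procedure (the dataset is assumed to be a simple list of instances; this is an illustration, not Weka's implementation):

import random

def holdout_split(dataset, test_fraction=1/3, seed=0):
    """Randomly hold out a fraction of the data for testing, train on the rest."""
    rng = random.Random(seed)
    shuffled = dataset[:]                      # copy so the original order is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    train_set = shuffled[n_test:]
    return train_set, test_set

# Usage: train_set, test_set = holdout_split(dataset, test_fraction=1/3)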

Stratified Holdout
Stratified holdout: check that each class is represented in approximately the same proportion in the testing dataset as it is in the overall dataset
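A minimal sketch of stratified holdout, assuming (purely for illustration) that each instance is a (features, class_label) pair; the split is done within each class so the test set keeps roughly the same class proportions as the full dataset:

import random
from collections import defaultdict

def stratified_holdout(dataset, test_fraction=1/3, seed=0):
    """Hold out test_fraction of each class, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in dataset:            # group instances by class label
        by_class[label].append((features, label))
    train_set, test_set = [], []
    for label, instances in by_class.items():
        rng.shuffle(instances)
        n_test = int(len(instances) * test_fraction)
        test_set.extend(instances[:n_test])
        train_set.extend(instances[n_test:])
    return train_set, test_set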

Evaluation Techniques When We Don't Have Enough Data
Techniques: cross validation, stratified cross validation, leave-one-out cross validation, and bootstrapping

Repeated Holdout Method
Repeated holdout method: use multiple iterations; in each iteration a certain proportion of the dataset is randomly selected for training (possibly with stratification). The error rates from the different iterations are averaged to yield an overall error rate.
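A sketch of repeated holdout, again with hypothetical train_model and error_rate helpers standing in for the learner being evaluated:

import random

def repeated_holdout(dataset, train_model, error_rate,
                     iterations=10, test_fraction=1/3, seed=0):
    """Average the test error over several random train/test splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(iterations):
        shuffled = dataset[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test_set, train_set = shuffled[:n_test], shuffled[n_test:]
        model = train_model(train_set)          # hypothetical learner
        errors.append(error_rate(model, test_set))
    return sum(errors) / len(errors)            # overall error estimate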

Possible Problem
This is still not optimal: because the instances held out for testing are randomly selected in each iteration, the testing sets may overlap.

Cross-Validation
Cross-validation: decide on a fixed number of folds, or partitions, of the dataset. For each of the n folds, train with (n-1)/n of the dataset and test with the remaining 1/n of the dataset to estimate the error.

Typical stages (sketched below):
Split the data into n subsets of equal size
Use each subset in turn for testing, the remaining subsets for training
Average the results
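A sketch of n-fold cross-validation under the same assumptions as above (hypothetical train_model and error_rate helpers, dataset as a list of instances):

import random

def cross_validation(dataset, train_model, error_rate, n_folds=10, seed=0):
    """n-fold cross-validation: each fold is the test set once, the rest train."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]   # n roughly equal subsets
    errors = []
    for i, test_set in enumerate(folds):
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_model(train_set)
        errors.append(error_rate(model, test_set))
    return sum(errors) / len(errors)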

Stratified Cross-Validation
Stratified n-fold cross validation: each fold is constructed so that the values of the class variable are represented in roughly the same proportions as in the full dataset.
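A sketch of the stratification step alone, again assuming (features, class_label) instances; the folds it returns could be fed to a cross-validation loop like the one above:

from collections import defaultdict

def stratified_folds(dataset, n_folds=10):
    """Assign instances to folds class by class, so every fold gets each class
    in roughly the same proportion as the full dataset."""
    by_class = defaultdict(list)
    for features, label in dataset:
        by_class[label].append((features, label))
    folds = [[] for _ in range(n_folds)]
    for label, instances in by_class.items():
        for i, instance in enumerate(instances):   # deal instances out round-robin
            folds[i % n_folds].append(instance)
    return folds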

Recommendation When Data Is Insufficient
10-fold cross validation with stratification has become the standard.
The book states: extensive experiments have shown that this is the best choice to get an accurate estimate, and there is some theoretical evidence that it is the best choice.

Controversy still rages in the machine learning community

Leave-One-Out Cross-Validation
Leave-one-out cross-validation: the number of folds is the same as the number of training instances.

Pros:
Makes the best use of the data, since the greatest possible amount of data is used for training
Involves no random sampling
Cons:
Computationally expensive (the cost grows directly with the number of instances)
The test sets cannot be stratified, since each contains only a single instance
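A sketch of leave-one-out cross-validation, with the same hypothetical train_model and error_rate helpers:

def leave_one_out(dataset, train_model, error_rate):
    """Leave-one-out: n folds for n instances; each instance is the test set once."""
    errors = []
    for i in range(len(dataset)):
        test_set = [dataset[i]]
        train_set = dataset[:i] + dataset[i + 1:]   # all other instances
        model = train_model(train_set)              # hypothetical learner
        errors.append(error_rate(model, test_set))
    return sum(errors) / len(errors)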

Bootstrap Methods
Bootstrap uses sampling with replacement to form the training set:
Sample a dataset of n instances n times with replacement to form a new dataset of n instances
Use this data as the training set
Use the instances from the original dataset that don't occur in the new training set for testing
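A sketch of forming one bootstrap training set and its corresponding test set (illustrative only):

import random

def bootstrap_sample(dataset, seed=0):
    """Sample n instances with replacement for training; the instances never
    picked (the out-of-bag instances) become the test set."""
    rng = random.Random(seed)
    n = len(dataset)
    chosen = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
    chosen_set = set(chosen)
    train_set = [dataset[i] for i in chosen]
    test_set = [dataset[i] for i in range(n) if i not in chosen_set]
    return train_set, test_set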

0.632 bootstrap
Likelihood of a particular instance not being chosen in a single draw for the training set: (1 - 1/n)
Repeat this process n times; the likelihood of never being chosen is (1 - 1/n)^n:
(1 - 1/2)^2 = 0.250
(1 - 1/3)^3 = 0.296
(1 - 1/4)^4 = 0.316
(1 - 1/5)^5 = 0.328
(1 - 1/6)^6 = 0.335
(1 - 1/7)^7 = 0.340
(1 - 1/8)^8 = 0.344
(1 - 1/9)^9 = 0.346
(1 - 1/10)^10 = 0.349
. . .
(1 - 1/500)^500 = 0.368
(1 - 1/n)^n converges to 1/e ≈ 0.368
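A few lines of Python to reproduce the table above and its limit:

import math

# (1 - 1/n)^n approaches 1/e ≈ 0.368 as n grows
for n in (2, 3, 4, 5, 10, 100, 500):
    print(n, round((1 - 1/n) ** n, 3))
print("1/e =", round(1 / math.e, 3))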

0.632 bootstrap
So for largish n (e.g. n = 500), an instance has a 0.368 likelihood of not being chosen.

The instance has a 1 - 0.368 = 0.632 chance of being selected.

0.632 bootstrap method

0.632 bootstrap
For bootstrapping, the error estimate on the test data will be very pessimistic, since training occurred on only ~63% of the instances. Therefore, combine it with the weighted resubstitution error:
Error = 0.632 × error_test_instances + 0.368 × error_training_instances
Repeat the process several times with different replacement samples and average the results.
Bootstrapping is probably the best way of estimating performance for small datasets.
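A sketch of the 0.632 bootstrap estimate under the same assumptions as the earlier sketches (hypothetical train_model and error_rate helpers):

import random

def bootstrap_632(dataset, train_model, error_rate, repeats=10, seed=0):
    """0.632 bootstrap: combine the pessimistic out-of-bag error with the
    optimistic resubstitution error, averaged over several resamples."""
    rng = random.Random(seed)
    n = len(dataset)
    estimates = []
    for _ in range(repeats):
        chosen = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        chosen_set = set(chosen)
        train_set = [dataset[i] for i in chosen]
        test_set = [dataset[i] for i in range(n) if i not in chosen_set]
        model = train_model(train_set)                  # hypothetical learner
        e_test = error_rate(model, test_set)            # error on unseen instances
        e_train = error_rate(model, train_set)          # resubstitution error
        estimates.append(0.632 * e_test + 0.368 * e_train)
    return sum(estimates) / len(estimates)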