Evaluation
Dec 30, 2015
Slide 1
Evaluation
Slide 2
Interactive decision tree construction
• Load segment-challenge.arff; look at the dataset
• Select UserClassifier (tree classifier)
• Use the test set segment-test.arff
• Examine data visualizer and tree visualizer
• Plot region-centroid-row vs intensity-mean
• Rectangle, Polygon and Polyline selection tools
… several selections …
• Right click in Tree visualizer and Accept the tree
Over to you: how well can you do?
Be a classifier!
Slide 3
Build a tree: what strategy did you use?
Given enough time, you could produce a “perfect”
tree for the dataset
• but would it perform well on the test set?
Be a classifier!
Slide 4
[Diagram: training data → ML algorithm → classifier → Deploy!; test data → classifier → evaluation results]
Training and Testing
Slide 5
[Diagram: training data → ML algorithm → classifier → Deploy!; test data → classifier → evaluation results]
Basic assumption: training and test sets produced by independent sampling from an infinite population
Training and Testing
Slide 6
Use J48 to analyze the segment dataset
• Open file segment‐challenge.arff
• Choose J48 decision tree learner (trees>J48)
• Supplied test set segment‐test.arff
• Run it: 96% accuracy
• Evaluate on training set: 99% accuracy
• Evaluate on percentage split: 95% accuracy
• Do it again: get exactly the same result!
Training and Testing
Slide 7
Basic assumption:
• training and test sets sampled independently
from an infinite population
Just one dataset? — hold some out for testing
Expect slight variation in results… but Weka
produces the same results each time… Why?
• E.g. J48 on segment‐challenge dataset
Training and Testing
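The reason the result repeats exactly is that the shuffling behind a percentage split is driven by a seeded random-number generator: the same seed always produces the same partition. A minimal Python sketch of the idea (not Weka's actual code; `holdout_split` and its parameters are made up for illustration):

```python
import random

def holdout_split(n_instances, test_fraction, seed):
    """Shuffle instance indices with a fixed seed, then split.

    Illustrative sketch: the same seed gives the same shuffle,
    hence the same train/test partition and the same accuracy
    on every run.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)   # seeded RNG -> deterministic
    n_test = int(n_instances * test_fraction)
    return indices[n_test:], indices[:n_test]   # train, test

# Same seed twice: identical splits, which is why results repeat exactly.
split_a = holdout_split(100, 0.34, seed=1)
split_b = holdout_split(100, 0.34, seed=1)
assert split_a == split_b

# A different seed gives a different split, hence a different result.
split_c = holdout_split(100, 0.34, seed=2)
print(split_a == split_c)
```

Changing the seed (as the next slide does) is what exposes the underlying variation.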
Slide 8
Evaluate J48 on segment‐challenge
• With segment‐challenge and J48 (trees>J48)
• Set percentage split to 90%
• Run it: 96.7% accuracy
• [More options] Repeat with a different random seed
• Use 2, 3, 4, 5, 6, 7, 8, 9, 10
Repeated Training and Testing
Results with seeds 1–10: 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947
Slide 9
Results with seeds 1–10: 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947
Sample mean: x̄ = Σ xᵢ / n
Variance: σ² = Σ (xᵢ − x̄)² / (n − 1)
Standard deviation: σ
x̄ = 0.949, σ = 0.0158
Repeated Training and Testing
Evaluate J48 on segment‐challenge
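The summary statistics can be checked with a few lines of Python, using the accuracy figures quoted on the slide (with these figures the mean works out to about 0.948, and the standard deviation matches the slide's 0.0158):

```python
import math

# The ten percentage-split accuracies from the slide (seeds 1-10).
accuracies = [0.967, 0.940, 0.940, 0.967, 0.953,
              0.967, 0.920, 0.947, 0.933, 0.947]

n = len(accuracies)
mean = sum(accuracies) / n                                     # sample mean
variance = sum((x - mean) ** 2 for x in accuracies) / (n - 1)  # n-1: sample variance
std_dev = math.sqrt(variance)

print(f"mean = {mean:.3f}, std dev = {std_dev:.4f}")
```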
Slide 10
Basic assumption:
• training and test sets sampled independently
from an infinite population
Expect slight variation in results … get it by
setting the random‐number seed
Can calculate mean and standard deviation
experimentally
Repeated Training and Testing
Slide 11
Use diabetes dataset and default holdout
• Open file diabetes.arff
• Test option: Percentage split
Try these classifiers:
• trees > J48: 76%
• bayes > NaiveBayes: 77%
• lazy > IBk: 73%
• rules > PART: 74%
768 instances (500 negative, 268 positive)
Always guess “negative”: 500/768 = 65%
• rules > ZeroR: most likely class!
Baseline Accuracy
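ZeroR's baseline follows directly from the class counts on the slide: always predict the most frequent class. An illustrative sketch (not Weka's implementation), using the diabetes counts:

```python
from collections import Counter

# Class counts quoted on the slide for the diabetes dataset.
labels = ["negative"] * 500 + ["positive"] * 268   # 768 instances

# ZeroR idea: predict the majority class for every instance.
counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(majority_class)               # negative
print(f"{baseline_accuracy:.1%}")   # 65.1%
```

Any classifier worth using should beat this number.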
Slide 12
Sometimes baseline is best!
• Open supermarket.arff and blindly apply:
• rules > ZeroR: 64%
• trees > J48: 63%
• bayes > NaiveBayes: 63%
• lazy > IBk: 38%
• rules > PART: 63%
• Attributes are not informative
• Caution: Don’t just apply Weka to a dataset:
you need to understand what’s going on
Baseline Accuracy
Slide 13
Consider whether differences are significant
Always try a simple baseline, e.g. rules > ZeroR
Caution: Don’t just apply Weka to a dataset: you
need to understand what’s going on
Baseline Accuracy
Slide 14
Can we improve upon repeated holdout (i.e.
reduce variance)?
Cross‐validation
Stratified cross‐validation
Cross-Validation
Slide 15
Repeated holdout: hold out 10% for testing, repeat 10 times
Cross-Validation
Slide 16
10‐fold cross‐validation
• Divide dataset into 10 parts (folds)
• Hold out each part in turn
• Average the results
Each data point is used once for testing, 9 times for training
Stratified cross‐validation
• Ensure that each fold has the right proportion of each class value
Cross-Validation
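One way to sketch the stratified fold assignment in Python (illustrative only; `stratified_folds` is a made-up helper, not Weka's algorithm): shuffle each class's indices separately, then deal them round-robin across the folds so every fold keeps the class proportions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Assign each instance index to one of k folds so that every
    fold has roughly the right proportion of each class value."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)   # deal round-robin -> balanced classes
    return folds

# Each fold is held out once for testing; the other k-1 folds train.
labels = ["neg"] * 50 + ["pos"] * 30
folds = stratified_folds(labels, k=10)
for fold in folds:
    neg = sum(labels[i] == "neg" for i in fold)
    print(neg, len(fold) - neg)   # every fold: 5 neg, 3 pos
```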
Slide 17
Cross‐validation better than repeated holdout
Stratified is even better
Practical rule of thumb:
• Lots of data? – use percentage split
• Else: stratified 10‐fold cross‐validation
Cross-Validation
Slide 18
Is cross‐validation really better than repeated holdout?
Diabetes dataset
Baseline accuracy (rules > ZeroR): 65.1%
trees > J48 with 10‐fold cross‐validation: 73.8%
… with different random-number seeds:
seed:     1     2     3     4     5     6     7     8     9     10
accuracy: 73.8  75.0  75.5  75.5  74.4  75.6  73.6  74.0  74.5  73.0
Cross-Validation Results
Slide 19
holdout (10%):              75.3, 77.9, 80.5, 74.0, 71.4, 70.1, 79.2, 71.4, 80.5, 67.5
cross‐validation (10‐fold): 73.8, 75.0, 75.5, 75.5, 74.4, 75.6, 73.6, 74.0, 74.5, 73.0
Sample mean: x̄ = Σ xᵢ / n
Variance: σ² = Σ (xᵢ − x̄)² / (n − 1)
Standard deviation: σ
Holdout: x̄ = 74.8, σ = 4.6
Cross‐validation: x̄ = 74.5, σ = 0.9
Cross-Validation Results
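These figures can be reproduced from the two accuracy lists on the slide with the sample-statistics formulas (`mean_and_std` is an illustrative helper):

```python
import math

def mean_and_std(xs):
    """Sample mean and standard deviation (n - 1 denominator)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, math.sqrt(var)

# Accuracy figures from the slide (J48 on diabetes, seeds 1-10).
holdout_10pct = [75.3, 77.9, 80.5, 74.0, 71.4, 70.1, 79.2, 71.4, 80.5, 67.5]
cv_10fold     = [73.8, 75.0, 75.5, 75.5, 74.4, 75.6, 73.6, 74.0, 74.5, 73.0]

for name, xs in [("holdout (10%)", holdout_10pct),
                 ("10-fold CV", cv_10fold)]:
    m, s = mean_and_std(xs)
    print(f"{name}: mean = {m:.1f}, std dev = {s:.1f}")
```

The means are nearly identical, but cross-validation's standard deviation (0.9) is far smaller than repeated holdout's (4.6): it is a much more stable estimate.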
Slide 20
Why 10‐fold? E.g. 20‐fold: 75.1%
Cross‐validation really is better than repeated holdout
It reduces the variance of the estimate
Cross-Validation Results
Slide 21
Evaluation Methods: Exercises
Slide 22
Plan
To evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.
Slide 23
Classification on Tic-Tac-Toe
Download Tic-Tac-Toe dataset tic-tac-toe.zip from Course Page.
Work as a team to evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.
Slide 24
Evaluation Methods
Using Training Set (use 100% of instances to train/learn and use 100% of instances to test performance)
10-fold Cross-Validation
Split 70% (use 70% of instances to train/learn and use the remaining 30% of instances to test performance)
Slide 25
Classifiers Being Used
Decision Tree
• trees > J48
Neural Network
• functions > MultilayerPerceptron (trainingTime = 50)
Bayes Network
• bayes > NaiveBayes
Nearest Neighbor
• lazy > IBk (k = 3)
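For intuition about the lazy > IBk (k = 3) entry, here is a toy 3-nearest-neighbour classifier for Tic-Tac-Toe boards. This is a sketch only, not Weka's code: the tiny training set is made up, boards are written as 9-character strings (x / o / b), and distance is simply the number of differing squares.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify by majority vote of the k nearest training instances
    (the idea behind IBk with k = 3)."""
    def distance(a, b):
        # Overlap distance for nominal attributes: count differing squares.
        return sum(ca != cb for ca, cb in zip(a, b))

    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical training boards: (board, "positive" if x wins).
train = [
    ("xxxoobbbb", "positive"),   # x wins top row
    ("xxxoboobb", "positive"),
    ("oxxxooxbb", "negative"),
    ("oobxxxbbo", "positive"),   # x wins middle row
    ("ooboxxxbb", "negative"),
]
print(knn_predict(train, "xxxobobbb"))   # prints: positive
```

Being "lazy", IBk does no work at training time; all the effort happens at prediction time, when distances to the stored instances are computed.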
Slide 26
Using Weka
• Extract Tic-Tac-Toe.zip to the Weka folder
• Load the Weka program
• Open Tic-Tac-Toe.arff
• Choose Explorer
Slide 27
Using Weka (cont.)
• Click the Classify tab
• Choose the J48 classifier under trees
• Set the Test options to Use training set
• Enable Output predictions in More options
• Click Start to run
Slide 28
Using Weka (cont.)
[Screenshot: accuracy rate in the Classifier output]
Slide 29
Reporting
• Download Tic-tac-toe-report.docx
• Complete the table evaluating the performance of different learning methods in Q1
• Find the best performer in Q2, Q3, and Q4