Decision Trees Machine Learning CSx824/ECEx242 Bert Huang Virginia Tech

Decision Trees - courses.cs.vt.educourses.cs.vt.edu/cs5824/Fall15/pdfs/3-Decision Trees.pdf · Decision Trees Machine Learning ... • Recursively train decision tree for each resulting

Jun 19, 2018


Decision Trees

Machine Learning CSx824/ECEx242

Bert Huang Virginia Tech


Outline

• Learning decision trees

• Extensions: random forests


[Figure: example decision tree that classifies animals as cat or dog using the tests round ears, sharp claws, stripes, and long tail]


[Figure: the same cat vs. dog decision tree, with the nodes stripes, long tail, and cat called out]


Decision Tree Learning

• Greedily choose best decision rule

• Recursively train decision tree for each resulting subset

function fitTree(D, depth)
    node = new tree node
    if D is all one class or depth >= maxDepth
        node.prediction = most common class in D
        return node
    rule = BestDecisionRule(D)
    node.rule = rule
    D_left = {(x, y) from D where rule(x) is true}
    D_right = {(x, y) from D where rule(x) is false}
    node.left = fitTree(D_left, depth + 1)
    node.right = fitTree(D_right, depth + 1)
    return node
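A minimal Python sketch of fitTree, assuming binary (0/1) feature vectors and rules of the form x[j] == 1. The slides leave BestDecisionRule open; here it is a hypothetical helper, best_decision_rule, that picks the test with the largest Gini cost reduction. The names Node, best_decision_rule, fit_tree, predict, and the toy data are illustrative assumptions, not from the slides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Example = Tuple[List[int], str]  # (binary feature vector x, class label y)


@dataclass
class Node:
    prediction: Optional[str] = None  # set on leaves only
    feature: Optional[int] = None     # internal nodes test x[feature] == 1
    left: Optional["Node"] = None     # subtree for examples where the rule is true
    right: Optional["Node"] = None    # subtree for examples where the rule is false


def gini(D: List[Example]) -> float:
    counts = Counter(y for _, y in D)
    return 1.0 - sum((k / len(D)) ** 2 for k in counts.values())


def best_decision_rule(D: List[Example]) -> Optional[int]:
    """Hypothetical BestDecisionRule: index of the test that most reduces Gini cost."""
    best_j, best_gain = None, float("-inf")
    n = len(D)
    for j in range(len(D[0][0])):
        left = [(x, y) for x, y in D if x[j] == 1]
        right = [(x, y) for x, y in D if x[j] != 1]
        if not left or not right:
            continue  # this test does not split D, so it cannot make progress
        gain = gini(D) - (len(left) / n * gini(left) + len(right) / n * gini(right))
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j


def fit_tree(D: List[Example], depth: int = 0, max_depth: int = 3) -> Node:
    labels = [y for _, y in D]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth >= max_depth:
        return Node(prediction=majority)
    j = best_decision_rule(D)
    if j is None:  # every feature is constant on D; fall back to a leaf
        return Node(prediction=majority)
    D_left = [(x, y) for x, y in D if x[j] == 1]
    D_right = [(x, y) for x, y in D if x[j] != 1]
    return Node(feature=j,
                left=fit_tree(D_left, depth + 1, max_depth),
                right=fit_tree(D_right, depth + 1, max_depth))


def predict(node: Node, x: List[int]) -> str:
    # Follow rule outcomes from the root until reaching a leaf.
    while node.prediction is None:
        node = node.left if x[node.feature] == 1 else node.right
    return node.prediction


# Hypothetical toy data; features are [round ears, sharp claws, stripes, long tail].
D = [([1, 1, 0, 0], "cat"), ([1, 0, 0, 1], "cat"), ([1, 0, 0, 0], "dog"),
     ([0, 0, 1, 0], "cat"), ([0, 0, 0, 1], "dog")]
tree = fit_tree(D, max_depth=10)
print([predict(tree, x) for x, _ in D])  # ['cat', 'cat', 'dog', 'cat', 'dog']
```

Because every toy example has a distinct feature vector, a deep enough tree fits the training set exactly; with max_depth=0 the tree is a single leaf predicting the majority class.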



Choosing Decision Rules

• Define a cost function cost(D)

• Misclassification rate

• Entropy or information gain

• Gini index


Misclassification Rate

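class proportion (estimated probability):

$$\pi_c := \frac{1}{|D|} \sum_{i \in D} I(y_i = c)$$

best prediction:

$$\hat{y} := \arg\max_c \pi_c$$

error rate:

$$\mathrm{cost}(D) := \frac{1}{|D|} \sum_{i \in D} I(y_i \neq \hat{y}) = 1 - \pi_{\hat{y}}$$

cost reduction:

$$\mathrm{cost}(D) - \left( \frac{|D_L|}{|D|} \mathrm{cost}(D_L) + \frac{|D_R|}{|D|} \mathrm{cost}(D_R) \right)$$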
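A small sketch of these definitions in Python, working on plain lists of labels. The function names and the 10-label split are hypothetical; cost_reduction takes the cost function as a parameter so the same formula can later be reused with entropy or Gini.

```python
from collections import Counter


def class_proportions(labels):
    """pi_c = (1/|D|) * sum_i I(y_i = c), for each class c present in D."""
    n = len(labels)
    return {c: k / n for c, k in Counter(labels).items()}


def best_prediction(labels):
    """y_hat = argmax_c pi_c (ties go to the class seen first)."""
    pi = class_proportions(labels)
    return max(pi, key=pi.get)


def misclassification_cost(labels):
    """cost(D) = 1 - pi_{y_hat}, the error rate of always predicting y_hat."""
    return 1.0 - max(class_proportions(labels).values())


def cost_reduction(cost, labels, left, right):
    """cost(D) - (|D_L|/|D| cost(D_L) + |D_R|/|D| cost(D_R))."""
    n = len(labels)
    return cost(labels) - (len(left) / n * cost(left) + len(right) / n * cost(right))


# Hypothetical split of 10 labels into D_L and D_R.
D_L = ["cat"] * 5 + ["dog"] * 1
D_R = ["cat"] * 1 + ["dog"] * 3
D = D_L + D_R
print(best_prediction(D), misclassification_cost(D))
print(cost_reduction(misclassification_cost, D, D_L, D_R))
```

Here D has 6 cats and 4 dogs, so the error rate drops from 0.4 before the split to 0.6 · (1/6) + 0.4 · (1/4) = 0.2 after it, a cost reduction of 0.2.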


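Entropy and Information Gain

$$\pi_c := \frac{1}{|D|} \sum_{i \in D} I(y_i = c)$$

$$H(\pi) := -\sum_{c=1}^{C} \pi_c \log \pi_c$$

$$\begin{aligned}
\mathrm{infoGain}(j) &= H(Y) - H(Y \mid X_j) \\
&= -\sum_y \Pr(Y = y) \log \Pr(Y = y) + \sum_{x_j} \Pr(X_j = x_j) \sum_y \Pr(Y = y \mid X_j = x_j) \log \Pr(Y = y \mid X_j = x_j).
\end{aligned}$$

Using cost(D) = H(π), the cost reduction of a split is the information gain of the rule that defines it:

$$\mathrm{cost}(D) - \left( \frac{|D_L|}{|D|} \mathrm{cost}(D_L) + \frac{|D_R|}{|D|} \mathrm{cost}(D_R) \right)$$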
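A sketch of entropy used as cost(D), assuming base-2 logarithms (the slides do not fix the base; base 2 measures entropy in bits). The function names are hypothetical. Classes that never appear in D are simply skipped, which matches the usual convention 0 log 0 = 0.

```python
import math
from collections import Counter


def entropy(labels):
    """H(pi) = -sum_c pi_c log2 pi_c over the classes present in D."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels, left, right):
    """Cost reduction of a split when cost(D) = H(pi)."""
    n = len(labels)
    return entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))


D = ["cat", "cat", "dog", "dog"]
print(entropy(D))                                           # 1.0 (one bit)
print(information_gain(D, ["cat", "cat"], ["dog", "dog"]))  # 1.0 (pure children)
print(information_gain(D, ["cat", "dog"], ["cat", "dog"]))  # 0.0 (uninformative split)
```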


Information Gain

$$\begin{aligned}
\mathrm{infoGain}(j) &= H(Y) - H(Y \mid X_j) \\
&= -\sum_y \Pr(Y = y) \log \Pr(Y = y) + \sum_{x_j} \Pr(X_j = x_j) \sum_y \Pr(Y = y \mid X_j = x_j) \log \Pr(Y = y \mid X_j = x_j).
\end{aligned}$$

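Extreme cases: if $X_j = Y$, then $H(Y \mid X_j) = 0$ and $\mathrm{infoGain}(j) = H(Y)$, the largest possible gain; if $X_j \perp Y$, then $H(Y \mid X_j) = H(Y)$ and $\mathrm{infoGain}(j) = 0$.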
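A sketch of infoGain(j) for a single discrete feature, with every probability estimated by counting. It groups examples by their value of X_j and uses H(Y | X_j) = Σ_{x_j} Pr(X_j = x_j) H(Y | X_j = x_j), which is the negated double sum in the formula above. The function names and the four-example data are hypothetical; the two calls reproduce the extreme cases.

```python
import math
from collections import Counter, defaultdict


def H(values):
    """Empirical entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def info_gain(xj, y):
    """infoGain(j) = H(Y) - H(Y | X_j), probabilities estimated by counting."""
    n = len(y)
    groups = defaultdict(list)  # value of X_j -> labels of examples with that value
    for xv, yv in zip(xj, y):
        groups[xv].append(yv)
    h_cond = sum(len(g) / n * H(g) for g in groups.values())  # H(Y | X_j)
    return H(y) - h_cond


y = ["cat", "dog", "cat", "dog"]
print(info_gain(y, y))             # X_j = Y: gain equals H(Y) = 1.0
print(info_gain([0, 0, 1, 1], y))  # X_j independent of Y: gain is 0.0
```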


Gini Index

$$\sum_{c=1}^{C} \pi_c (1 - \pi_c) = \sum_c \pi_c - \sum_c \pi_c^2 = 1 - \sum_c \pi_c^2$$

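like the misclassification rate, but accounts for uncertainty: it is the probability that a label drawn at random from π disagrees with the label of a randomly chosen example in D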
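A sketch comparing Gini with the misclassification rate on class counts. The node with counts (400, 400) and its two candidate splits are a hypothetical example: both splits misclassify 25% of the node, but Gini gives the split with a pure child a lower (better) cost, which is one concrete sense in which it "accounts for uncertainty". The function names are illustrative.

```python
def proportions(counts):
    total = sum(counts)
    return [k / total for k in counts]


def gini(counts):
    """sum_c pi_c (1 - pi_c)."""
    return sum(p * (1 - p) for p in proportions(counts))


def gini_short(counts):
    """Equivalent form 1 - sum_c pi_c^2."""
    return 1 - sum(p * p for p in proportions(counts))


def misclassification(counts):
    return 1 - max(proportions(counts))


def split_cost(cost, children):
    """Weighted cost after a split: sum over children of |D_child|/|D| * cost(D_child)."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * cost(child) for child in children)


# Hypothetical node with (class 1, class 2) counts (400, 400) and two candidate splits.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]  # the second child is pure
for name, split in [("A", split_a), ("B", split_b)]:
    print(name, split_cost(misclassification, split), split_cost(gini, split))
```

Split A leaves a Gini cost of 0.375 and split B about 0.333, while the misclassification rate is 0.25 for both, so only Gini distinguishes them.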


Comparing the Metrics

[Figure: error rate, Gini, and entropy plotted against the probability of class 1 (horizontal axis 0 to 1, vertical axis 0 to 0.5)]

% Fig 9.3 from Hastie book
p = 0:0.01:1;
gini = 2*p.*(1-p);
entropy = -p.*log(p) - (1-p).*log(1-p);
entropy(isnan(entropy)) = 0;   % 0*log(0) gives NaN at p = 0 and p = 1; use 0*log(0) = 0
err = 1 - max(p, 1-p);

% scale to pass through (0.5, 0.5)
entropy = entropy ./ max(entropy) * 0.5;

figure;
plot(p, err, 'g-', p, gini, 'b:', p, ...
     entropy, 'r--', 'linewidth', 3);
legend('Error rate', 'Gini', 'Entropy')
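For readers without MATLAB, a standard-library Python sketch that computes the same three curves on the same grid (no plotting). The function name impurity_curves is hypothetical. Instead of dividing by max(entropy) over the grid, it divides by log(2), the peak of two-class entropy at p = 0.5; since the grid contains 0.5, the two scalings agree.

```python
import math


def impurity_curves(p):
    """Two-class error rate, Gini, and scaled entropy at Pr(class 1) = p."""
    err = 1 - max(p, 1 - p)
    gini = 2 * p * (1 - p)
    # Skip zero probabilities, i.e. use the convention 0 * log(0) = 0 at p = 0 and p = 1.
    entropy = -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)
    # Binary entropy peaks at p = 0.5 with value log(2); rescale so the peak is 0.5.
    entropy_scaled = entropy / math.log(2) * 0.5
    return err, gini, entropy_scaled


grid = [i / 100 for i in range(101)]  # same grid as p = 0:0.01:1
curves = [impurity_curves(p) for p in grid]
print(impurity_curves(0.5))
```

All three curves vanish at p = 0 and p = 1 and meet at (0.5, 0.5); in between, error rate ≤ Gini ≤ scaled entropy, which is the ordering visible in the figure.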


Overfitting

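• A decision tree can achieve 100% training accuracy when every training example has a distinct feature vector, so training accuracy alone says little about generalization

• Remedy: limit the depth of the tree

• Alternative strategy: train a very deep tree, then adaptively prune it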


Pruning with Validation Set

[Figure: the cat vs. dog decision tree, with tests on round ears, sharp claws, stripes, and long tail]

Validation accuracy: 0.4


Pruning with Validation Set

[Figure: the same tree after the long tail test has been replaced by a leaf]

Validation accuracy: 0.4, new validation accuracy: 0.41
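The slides show one prune that raises validation accuracy. A minimal sketch of repeating that step over the whole tree, bottom-up (often called reduced-error pruning), assuming each node stores the majority training label that reaches it. Collapsing a node is kept unless validation accuracy drops, so ties favor the smaller tree; that tie rule is an assumption. The tree, validation set, and names below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    majority: str                   # most common training label at this node
    feature: Optional[int] = None   # None for a leaf; otherwise test x[feature] == 1
    left: Optional["Node"] = None   # subtree where the rule is true
    right: Optional["Node"] = None  # subtree where the rule is false


def predict(node: Node, x: List[int]) -> str:
    while node.feature is not None:
        node = node.left if x[node.feature] == 1 else node.right
    return node.majority


def accuracy(root: Node, data: List[Tuple[List[int], str]]) -> float:
    return sum(predict(root, x) == y for x, y in data) / len(data)


def prune(root: Node, node: Node, val) -> None:
    """Bottom-up pruning against a validation set; modifies the tree in place."""
    if node.feature is None:
        return
    prune(root, node.left, val)
    prune(root, node.right, val)
    before = accuracy(root, val)
    saved = (node.feature, node.left, node.right)
    node.feature, node.left, node.right = None, None, None  # try collapsing to a leaf
    if accuracy(root, val) < before:
        node.feature, node.left, node.right = saved         # pruning hurt: undo it


# Hypothetical tree over features [round ears, sharp claws, stripes, long tail].
tree = Node("cat", 0,
            left=Node("cat", 1,
                      left=Node("cat"),
                      right=Node("cat", 3, left=Node("cat"), right=Node("dog"))),
            right=Node("dog", 2, left=Node("cat"), right=Node("dog")))
val = [([1, 0, 0, 0], "cat"), ([1, 1, 0, 0], "cat"),
       ([0, 0, 1, 0], "cat"), ([0, 0, 0, 0], "dog")]
print(accuracy(tree, val))  # 0.75
prune(tree, tree, val)
print(accuracy(tree, val))  # 1.0
```

On this toy data the long tail and sharp claws tests are collapsed into "cat" leaves, while the stripes test and the root survive because removing them would lower validation accuracy.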


Random Forests

• Use bootstrap aggregation (bagging) to train many decision trees

• Randomly subsample n examples with replacement

• Train decision tree on subsample

• Use average or majority vote among learned trees as prediction

• Also randomly subsample features

• Reduces variance without changing bias
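A compact sketch of the recipe above. To stay short it assumes depth-1 trees (decision stumps) on binary features as the base learner; with a single split, drawing a random feature subset per tree is the same as drawing it per split, whereas Breiman-style random forests grow full trees and redraw the feature subset at every split. All names, defaults (25 trees, 2 features), and data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(data, features):
    """Depth-1 tree: the test x[j] == 1, j in `features`, with the fewest training errors."""
    fallback = majority([y for _, y in data])
    best = None
    for j in features:
        left = [y for x, y in data if x[j] == 1]
        right = [y for x, y in data if x[j] != 1]
        pred_l = majority(left) if left else fallback
        pred_r = majority(right) if right else fallback
        errors = sum(y != pred_l for y in left) + sum(y != pred_r for y in right)
        if best is None or errors < best[0]:
            best = (errors, j, pred_l, pred_r)
    _, j, pred_l, pred_r = best
    return lambda x: pred_l if x[j] == 1 else pred_r


def fit_forest(data, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    d = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]    # bootstrap: n draws with replacement
        features = rng.sample(range(d), n_features)  # random feature subset for this tree
        forest.append(fit_stump(sample, features))
    return forest


def predict_forest(forest, x):
    """Majority vote among the learned trees."""
    return majority([tree(x) for tree in forest])


# Hypothetical data: every feature is a noiseless copy of the label.
data = [([b, b, b], "cat" if b else "dog") for b in [0, 1] * 10]
forest = fit_forest(data)
print(predict_forest(forest, [1, 1, 1]), predict_forest(forest, [0, 0, 0]))
```

For regression, the majority vote would be replaced by the average of the trees' outputs.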


Summary

• Training decision trees

• Cost functions

• Misclassification

• Entropy and information gain

• Gini index (expected error)

• Pruning

• Random forests (bagging)