Machine Learning. Decision Trees: Representation. Some slides from Tom Mitchell, Dan Roth and others.

Jul 19, 2020


dariahiddleston
Transcript
Page 1: Decision Trees: Representation - svivek · Let’s build a decision tree for classifying shapes What are some attributes of the examples? Color, Shape Color? Shape? square circle

Machine Learning

Decision Trees: Representation

Some slides from Tom Mitchell, Dan Roth and others

Page 2

Key issues in machine learning

• Modeling: How to formulate your problem as a machine learning problem? How to represent data? Which algorithms to use? What learning protocols?

• Representation: Good hypothesis spaces and good features

• Algorithms
– What is a good learning algorithm?
– What is success?
– Generalization vs. overfitting
– The computational question: How long will learning take?

Page 3

Coming up… (the rest of the semester)

Different hypothesis spaces and learning algorithms
– Decision trees and the ID3 algorithm
– Linear classifiers
  • Perceptron
  • SVM
  • Logistic regression
– Combining multiple classifiers
  • Boosting, bagging
– Non-linear classifiers
– Nearest neighbors

Page 4

Important issues to consider

1. What do these hypotheses represent?

2. Implicit assumptions and tradeoffs

3. Generalization?

4. How do we learn?

Page 5

This lecture: Learning Decision Trees

1. Representation: What are decision trees?

2. Algorithm: Learning decision trees
– The ID3 algorithm: A greedy heuristic

3. Some extensions


Page 7

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name                   Label
Claire Cardie          -
Peter Bartlett         +
Eric Baum              +
Haym Hirsh             -
Leslie Pack Kaelbling  +
Yoav Freund            -

Page 8

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name                   Name has      Second character  Length of first  Same first letter  Label
                       punctuation?  of first name     name > 5?        in two names?
Claire Cardie          No            l                 Yes              Yes                -
Peter Bartlett         No            e                 No               No                 +
Eric Baum              No            r                 No               No                 +
Haym Hirsh             No            a                 No               Yes                -
Leslie Pack Kaelbling  No            e                 Yes              No                 +
Yoav Freund            No            o                 No               No                 -
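The four attribute columns can be computed directly from each name. The sketch below is one possible reading of the column headers, not the lecture's own code: the function name `name_features` and the dictionary keys are hypothetical, and "two names" is assumed to mean the first and last whitespace-separated words of the full name.

```python
import string

def name_features(full_name):
    """Compute the four attributes of the table for one name.

    Assumption: "two names" means the first and last words of the name.
    """
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    return {
        "has_punctuation": any(ch in string.punctuation for ch in full_name),
        "second_char": first[1].lower(),
        "first_name_longer_than_5": len(first) > 5,
        "same_first_letter": first[0].lower() == last[0].lower(),
    }

# Names and labels copied from the table.
labeled_names = {
    "Claire Cardie": "-",
    "Peter Bartlett": "+",
    "Eric Baum": "+",
    "Haym Hirsh": "-",
    "Leslie Pack Kaelbling": "+",
    "Yoav Freund": "-",
}

for name, label in labeled_names.items():
    print(name, name_features(name), label)
```

Under these assumptions the computed values agree with the rows of the table, which is a quick check that the table was read correctly.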


Page 13

Representing data

Data can be represented as a big table, with columns denoting different attributes

With these four attributes, how many unique rows are possible?
2 × 26 × 2 × 2 = 208

If there are 100 attributes, all binary, how many unique rows are possible?
2 × 2 × 2 × ⋯ × 2 (100 times) = 2^{100}

If we wanted to store all possible rows, this number is too large.

We need to figure out how to represent data in a better, more efficient way
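The two counts can be checked with a few lines of arithmetic. This is only a sketch: the dictionary `attribute_sizes` is a hypothetical name, and the value 26 assumes the second character is one of the 26 lowercase letters.

```python
from math import prod

# Number of distinct values each attribute can take.
# Assumption: the second character is one of 26 lowercase letters.
attribute_sizes = {
    "name has punctuation?": 2,
    "second character of first name": 26,
    "length of first name > 5?": 2,
    "same first letter in two names?": 2,
}

# Unique rows = product of the number of values per attribute.
unique_rows = prod(attribute_sizes.values())

# With 100 binary attributes the product is 2 multiplied by itself 100 times.
binary_rows = 2 ** 100

print(unique_rows)
print(binary_rows)
print(len(str(binary_rows)), "decimal digits")
```

The second number has 31 decimal digits, which is why storing one row per possible combination is not an option.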

Page 14

What are decision trees?

A hierarchical data structure that represents data using a divide-and-conquer strategy

Can be used as a hypothesis class for non-parametric classification or regression

General idea: Given a collection of examples, learn a decision tree that represents it

Page 15

What are decision trees?

• Decision trees are a family of classifiers for instances that are represented by collections of attributes (i.e. features)

• Nodes are tests for feature values

• There is one branch for every value that the feature can take

• Leaves of the tree specify the class labels
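The three ingredients above (nodes that test a feature, one branch per feature value, leaves that hold a class label) map naturally onto a small recursive data structure. The following is a minimal sketch under stated assumptions, not the lecture's implementation: the class names `Node` and `Leaf`, the helper functions, and the toy tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    """A leaf specifies a class label."""
    label: str

@dataclass
class Node:
    """An internal node tests one feature; there is one child per feature value."""
    feature: str
    children: Dict[str, "Tree"] = field(default_factory=dict)

Tree = Union[Leaf, Node]

def count_leaves(tree):
    """Number of leaves, i.e. number of root-to-leaf paths."""
    if isinstance(tree, Leaf):
        return 1
    return sum(count_leaves(child) for child in tree.children.values())

def depth(tree):
    """Largest number of feature tests on any root-to-leaf path."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(child) for child in tree.children.values())

# A hypothetical toy tree, used only to exercise the structure.
toy = Node("color", {
    "blue": Node("shape", {"square": Leaf("A"), "circle": Leaf("C")}),
    "red": Leaf("B"),
})

print(count_leaves(toy))
print(depth(toy))
```

Keeping the children in a dictionary keyed by feature value makes the "one branch per value" rule explicit and keeps lookups simple.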

Page 16

Let's build a decision tree for classifying shapes

[Figure: the example shapes, in three groups labeled A, C and B]

Page 17

Let's build a decision tree for classifying shapes

Before building a decision tree:

What is the label for a red triangle? And why?

[Figure: the example shapes, in three groups labeled A, C and B]


Page 25

Let's build a decision tree for classifying shapes

[Figure: the example shapes, in three groups labeled A, C and B]

What are some attributes of the examples? Color, Shape

[Figure: the decision tree. The root tests Color?, with branches Blue, Red and Green. Two of the branches lead to Shape? nodes: one has branches square, triangle and circle, the other has branches circle and square. The remaining branch ends in a leaf labeled B. The leaves under the Shape? nodes carry labels A, B and C.]

1. How do we learn a decision tree? Coming up soon…

2. How to use a decision tree for prediction?
• What is the label for a red triangle?
• Just follow a path from the root to a leaf
• What about a green triangle?
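Prediction by following a path from the root to a leaf can be sketched as a short loop. The tree below is an assumption, since the flattened figure does not fully preserve which subtree hangs under which color: the Blue subtree follows the rules listed on the expressivity slide, while "Red leads to leaf B" and "Green leads to a Shape? node with circle → A and square → B" are guesses made only for illustration. The names `SHAPES_TREE` and `predict` are hypothetical, and returning `None` when no branch matches is a design choice of this sketch, not the lecture's answer.

```python
# Internal nodes are (feature, {value: subtree}) tuples; leaves are label strings.
# Assumption: the Red and Green subtrees are guesses, see the text above.
SHAPES_TREE = ("color", {
    "blue": ("shape", {"triangle": "B", "square": "A", "circle": "C"}),
    "red": "B",
    "green": ("shape", {"circle": "A", "square": "B"}),
})

def predict(tree, example):
    """Follow one path from the root to a leaf and return the leaf's label.

    Returns None if the example has a feature value with no matching branch.
    """
    node = tree
    while not isinstance(node, str):
        feature, branches = node
        value = example[feature]
        if value not in branches:
            return None  # the tree has no branch for this value
        node = branches[value]
    return node

print(predict(SHAPES_TREE, {"color": "red", "shape": "triangle"}))
print(predict(SHAPES_TREE, {"color": "green", "shape": "triangle"}))
```

With this assumed tree the red triangle reaches a leaf after a single test, while the green triangle reaches a Shape? node that has no triangle branch, which is the situation the last question on the slide points at.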


Page 27

Expressivity of decision trees

What Boolean functions can decision trees represent?
– Any Boolean function

Every path from the root to a leaf is a rule

The full tree is equivalent to the conjunction of all the rules

((Color=blue AND Shape=triangle) ⇒ Label=B) AND
((Color=blue AND Shape=square) ⇒ Label=A) AND
((Color=blue AND Shape=circle) ⇒ Label=C) AND …

Any Boolean function can be represented as a decision tree.
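One way to see why any Boolean function has a decision tree is to test the variables one after another and store the function's value at each leaf. The sketch below does exactly that and then reads every root-to-leaf path back as a rule. It is an illustration under stated assumptions: the function names `build_tree`, `evaluate` and `rules` are hypothetical, and XOR is just an example function.

```python
from itertools import product

def build_tree(f, variables, assignment=None):
    """Build a complete decision tree for the Boolean function f.

    Internal nodes are (variable, {False: subtree, True: subtree}) tuples;
    leaves are the Boolean value of f for the assignment along the path.
    """
    assignment = assignment or {}
    if not variables:
        return f(**assignment)
    var, rest = variables[0], variables[1:]
    return (var, {
        value: build_tree(f, rest, {**assignment, var: value})
        for value in (False, True)
    })

def evaluate(tree, assignment):
    """Follow the path selected by the assignment down to a leaf."""
    while isinstance(tree, tuple):
        var, branches = tree
        tree = branches[assignment[var]]
    return tree

def rules(tree, path=()):
    """Yield (path, leaf value) for every root-to-leaf path."""
    if not isinstance(tree, tuple):
        yield path, tree
        return
    var, branches = tree
    for value, subtree in branches.items():
        yield from rules(subtree, path + ((var, value),))

def xor(a, b):
    return a != b

tree = build_tree(xor, ["a", "b"])

# Each path is one rule; the tree is the conjunction of all of them.
for path, value in rules(tree):
    condition = " AND ".join(f"{var}={val}" for var, val in path)
    print(f"({condition}) => {value}")
```

The construction always works, but the complete tree has 2^n leaves for n variables, so it shows that a representation exists rather than that it is compact.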


Page 30

Decision Trees

• Outputs are discrete categories

• But real-valued outputs are also possible (regression trees)

• Well-studied methods for handling noisy data (noise in the label or in the features) and for handling missing attributes
– Pruning trees helps with noise
– More on this later…


Page 35

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs (color=blue, second letter=e, etc.)
– Values have been categorical

• How do we deal with numeric feature values? (e.g. length = ?)
– Discretize them or use thresholds on the numeric values
– This example divides the feature space into axis-parallel rectangles

[Figure: points labeled + and - in the X-Y plane, with X axis ticks at 1 and 3 and Y axis ticks at 5 and 7, next to a decision tree whose internal nodes test X<3, Y<5, Y>7 and X<1, each with yes and no branches, and whose five leaves are labeled + or -.]

Decision boundaries can be non-linear
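The connection between threshold tests and axis-parallel rectangles can be made concrete by collecting, for every leaf, the range of each feature that leads to it. The tree below is hypothetical: it reuses the thresholds 1, 3, 5 and 7 from the figure, but the arrangement of the tests and the leaf labels are assumptions, because the flattened figure does not preserve them. Every test is also normalized to the form "feature < threshold", and the names `TREE`, `classify` and `leaf_regions` are made up for this sketch.

```python
import math

# Internal nodes: (feature, threshold, subtree_if_less, subtree_otherwise).
# Leaves are label strings. Assumption: structure and labels are invented.
TREE = ("x", 3.0,
        ("y", 7.0,
         ("x", 1.0, "+", "-"),
         "-"),
        ("y", 5.0, "-", "+"))

def classify(tree, point):
    """Apply threshold tests until a leaf is reached."""
    while not isinstance(tree, str):
        feature, threshold, if_less, otherwise = tree
        tree = if_less if point[feature] < threshold else otherwise
    return tree

def leaf_regions(tree, bounds=None):
    """Yield (bounds, label) for every leaf.

    bounds maps each feature to a half-open interval [low, high), so every
    leaf corresponds to one axis-parallel rectangle.
    """
    if bounds is None:
        bounds = {"x": (-math.inf, math.inf), "y": (-math.inf, math.inf)}
    if isinstance(tree, str):
        yield bounds, tree
        return
    feature, threshold, if_less, otherwise = tree
    low, high = bounds[feature]
    yield from leaf_regions(if_less, {**bounds, feature: (low, min(high, threshold))})
    yield from leaf_regions(otherwise, {**bounds, feature: (max(low, threshold), high)})

for region, label in leaf_regions(TREE):
    print(region, label)
```

Each individual test cuts the plane with a line parallel to an axis, yet the union of the rectangles that share a label can have a staircase-shaped border, which is one way to read the remark that decision boundaries can be non-linear.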

Page 36

Summary: Decision trees

• Decision trees can represent any Boolean function
• A way to represent a lot of data
• A natural representation (think 20 questions)
• Predicting with a decision tree is easy

• Clearly, given a dataset, there are many decision trees that can represent it. [Exercise: Why?]

• Learning a good representation from data is the next question


Page 38

Exercises

1. Write down the decision tree for the shapes data if the root node was Shape instead of Color.

2. Will the two trees make the same predictions for unseen shape/color combinations?

3. Show that multiple structurally different decision trees can represent the same Boolean function of two or more variables. (Think about what it means for two trees to be structurally different.)

[Figure: the example shapes, in three groups labeled A, C and B]