Decision Tree Learning CSE 6003 Machine Learning and Reasoning
Decision Tree Learning - WordPress.com...2016/10/03  · Decision Tree Learning Decision Tree Learning is a method for approximating discrete- valued target functions, in which the

May 22, 2020

dariahiddleston
Decision Tree Learning

CSE 6003 – Machine Learning and Reasoning

Outline

◘ What is Decision Tree Learning?

◘ What is a Decision Tree?

◘ Decision Tree Examples

◘ Decision Trees to Rules

◘ Decision Tree Construction

◘ Decision Tree Algorithms

◘ Decision Tree Overfitting

Paradigms of Machine Learning

Figure: paradigms of Machine Learning (Neural Networks, Genetic Algorithms, Decision Trees, Bayesian Learning)

Decision tree learning is one of these machine learning techniques.

Learning Types

Figure: learning types. Learning is divided into Supervised Learning and Unsupervised Learning; the figure places the following tasks and techniques under these two branches: Classification, Regression, Clustering, Association Analysis, Decision Tree Learning, Bayesian Learning, Nearest Neighbour, Neural Networks, Support Vector Machines, Sequence Analysis, Summarization, Descriptive Statistics, Outlier Analysis, Scoring.

Decision tree learning is a supervised learning method.

Decision Tree Learning

◘ Decision Tree Learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.

◘ Decision Tree Learning is robust to noisy data and capable of learning disjunctive expressions.

◘ It is one of the most widely used methods for inductive inference.

Figure: example decision tree (House Hiring) with tests Salary < 1 M, Job = teacher and Age < 30, and leaves Good and Bad

Decision Tree Representation

◘ Decision Trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.

◘ Each node in the tree specifies a test of some attribute of the instance.

◘ Each branch descending from that node corresponds to one of the possible values for this attribute.

Decision Trees

◘ A decision tree is a tree where

– internal nodes are simple decision rules on one or more attributes

– each branch corresponds to an attribute value

– leaf nodes are predicted class labels

◘ Decision trees are used for deciding between several courses of action

age income student credit_rating buys_computer

<=30 high no fair no

<=30 high no excellent no

31…40 high no fair yes

>40 medium no fair yes

>40 low yes fair yes

>40 low yes excellent no

31…40 low yes excellent yes

<=30 medium no fair no

<=30 low yes fair yes

>40 medium yes fair yes

<=30 medium yes excellent yes

31…40 medium no excellent yes

31…40 high yes fair yes

>40 medium no excellent no

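Figure: decision tree for buys_computer (internal nodes test an attribute, branches carry attribute values, leaves give the classification)

age?

– <=30: student? (no → no, yes → yes)

– 31..40: yes

– >40: credit rating? (fair → yes, excellent → no)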

Decision Tree Applications


◘ Decision trees have been used for

1. Classification

2. Data Reduction

◘ Initial attribute set: {A1, A2, A3, A4, A5, A6}

◘ Reduced attribute set: {A1, A4, A6} (only the attributes tested in the induced tree)

Figure: induced tree with root test A4?, second-level tests A1? and A6?, and leaves Class 1, Class 2, Class 1, Class 2
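The reduced attribute set is the set of attributes that appear as tests somewhere in the induced tree. A minimal sketch, assuming a nested-tuple encoding of the tree; the branch values `Y`/`N` are hypothetical, since the transcript does not preserve the figure's branch labels.

```python
# Hypothetical encoding of the slide's tree: root tests A4, its children test
# A1 and A6; the "Y"/"N" branch values are assumed, not taken from the figure.
TREE = ("A4", {
    "Y": ("A1", {"Y": "Class 1", "N": "Class 2"}),
    "N": ("A6", {"Y": "Class 1", "N": "Class 2"}),
})
INITIAL = {"A1", "A2", "A3", "A4", "A5", "A6"}


def attributes_used(tree):
    """Collect every attribute tested at an internal node of the tree."""
    if isinstance(tree, str):            # leaf: no test here
        return set()
    attribute, branches = tree
    used = {attribute}
    for subtree in branches.values():
        used |= attributes_used(subtree)
    return used


reduced = attributes_used(TREE)
print(sorted(reduced))            # ['A1', 'A4', 'A6']
print(sorted(INITIAL - reduced))  # ['A2', 'A3', 'A5']
```

The attributes left over (A2, A3, A5) are the ones the tree never needed, which is why tree induction can double as a feature-selection step.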

Decision Tree Example

◘ A credit card company receives thousands of applications for new cards. Each application contains information about an applicant:

– age

– marital status

– annual salary

– outstanding debts

– credit rating

– etc.

◘ Problem: decide whether an application should be approved, i.e., classify applications into two categories: approved and not approved.

Decision Tree Example (Cont)

Approved or not

Decision Tree Example (Cont)

Decision nodes and leaf nodes (classes)

Decision Tree Example (Cont)

◘ Construct a classification model from the data

◘ Use the model to classify future loan applications into

– Yes (approved) and

– No (not approved)

◘ What is the class of the following case/instance?

Use the Decision Tree (Cont)

No

Once the tree is trained, a new instance is classified by starting at the root and following the path dictated by the test results for this instance.

Decision Tree Example

◘ Problem: decide whether to wait for a table at a restaurant

◘ Attributes:

1. Alternate: is there an alternative restaurant nearby?

2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)

Decision Tree Example (Cont.)

◘ Classification of examples is positive (T) or negative (F)

Decision Tree Example (Cont.)

◘ Here is the “true” tree for deciding whether to wait

Decision Trees to Rules

Decision Trees to Rules

◘ It is easy to derive a rule set from a decision tree

◘ Write a rule for each path in the decision tree from the root to a leaf.

◘ The tree can therefore be represented as a set of if-then rules

Example:

IF (Outlook = Sunny) AND (Humidity = High)

THEN PlayTennis = No
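Rule extraction is a walk over every root-to-leaf path, collecting the attribute tests as the IF part and the leaf as the THEN part. A minimal sketch, applied to the buys_computer tree from the Decision Trees slide (encoded as nested tuples, an illustrative choice); `tree_to_rules` is a hypothetical helper name.

```python
# Buys_computer tree from the Decision Trees slide:
# internal node = (attribute, {value: subtree}); leaf = class label.
TREE = ("age", {
    "<=30": ("student", {"no": "no", "yes": "yes"}),
    "31..40": "yes",
    ">40": ("credit_rating", {"fair": "yes", "excellent": "no"}),
})


def tree_to_rules(tree, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if isinstance(tree, str):                 # leaf: the path so far is a rule
        return [(list(conditions), tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():  # extend the path by one test
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules


rules = tree_to_rules(TREE)
for conds, label in rules:
    test = " AND ".join(f"({a} = {v})" for a, v in conds)
    print(f"IF {test} THEN buys_computer = {label}")
```

It prints five rules, one per leaf, starting with IF (age = <=30) AND (student = no) THEN buys_computer = no.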

Decision Trees to Rules

Decision Tree Construction

Decision Tree

◘ Each node tests some attribute of the instance

◘ Instances are represented by attribute-value pairs

◘ Attributes with high information gain are placed close to the root

◘ Root: best attribute for classification

Which attribute is the best classifier?

The answer is based on information gain.

Entropy

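◘ Entropy specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of S

◘ In general, for m class labels, where p_i is the proportion of S belonging to class i:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

◘ Example for two class labels:

$$\mathrm{Entropy}(S) = -p_1 \log_2 p_1 - p_2 \log_2 p_2$$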
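The formula translates directly into code. A minimal sketch that takes the class counts of S, uses the usual convention 0 · log2 0 = 0, and evaluates the 14-example PlayTennis set [9+, 5-] used later in these slides.

```python
import math


def entropy(counts):
    """Entropy(S) = -sum_i p_i log2 p_i, with p_i = counts[i] / |S|."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c:                          # convention: 0 * log2(0) = 0
            p = c / total
            result -= p * math.log2(p)
    return result


print(round(entropy([9, 5]), 3))  # 0.94  (S = [9+, 5-])
print(entropy([7, 7]))            # 1.0   (evenly mixed two-class set)
print(entropy([14, 0]))           # 0.0   (pure set)
```

A two-class set therefore ranges from 0 bits (pure) to 1 bit (evenly split).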

Entropy

Information Gain

◘ Measures the expected reduction in entropy given the value of some attribute A

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathit{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

Values(A): set of all possible values for attribute A

S_v: subset of S for which attribute A has value v
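A sketch of the gain formula in the same style, where `parent` holds the class counts of S and `partitions` holds the class counts of each S_v. The two splits are hypothetical examples showing the extremes: a split that separates the classes perfectly recovers all of Entropy(S), and a split that leaves every subset as mixed as S gains nothing.

```python
import math


def entropy(counts):
    """Entropy in bits of a list of class counts (0 * log2 0 taken as 0)."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c:
            p = c / total
            result -= p * math.log2(p)
    return result


def information_gain(parent, partitions):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v).

    parent:     class counts of S
    partitions: class counts of each subset S_v (one list per value v of A)
    """
    total = sum(parent)
    remainder = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(parent) - remainder


# Hypothetical splits of a set with 2 positive and 2 negative examples.
print(information_gain([2, 2], [[2, 0], [0, 2]]))  # 1.0 (classes fully separated)
print(information_gain([2, 2], [[1, 1], [1, 1]]))  # 0.0 (subsets as mixed as S)
```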

Decision Tree Example

Which attribute first?

Decision Tree Example (Cont.)

Decision Tree Example (Cont.)

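$$\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940$$

$$\mathrm{Gain}(S, \mathit{Wind}) = \mathrm{Entropy}(S) - \frac{|S_{Weak}|}{|S|}\,\mathrm{Entropy}(S_{Weak}) - \frac{|S_{Strong}|}{|S|}\,\mathrm{Entropy}(S_{Strong}) = 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1.0) = 0.048$$

$$\mathrm{Gain}(S, \mathit{Humidity}) = \mathrm{Entropy}(S) - \frac{|S_{High}|}{|S|}\,\mathrm{Entropy}(S_{High}) - \frac{|S_{Normal}|}{|S|}\,\mathrm{Entropy}(S_{Normal}) = 0.940 - \tfrac{7}{14}(0.985) - \tfrac{7}{14}(0.592) = 0.151$$

Gain(S, Outlook) = 0.246

Gain(S, Temperature) = 0.029

Gain(S, Humidity) = 0.151

Gain(S, Wind) = 0.048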
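Choosing the root means computing the gain of every attribute and taking the maximum. The sketch below assumes the per-value [yes, no] counts of the standard 14-example PlayTennis table, which the transcript does not preserve; the Wind and Humidity counts are consistent with the fractions and entropies in the gain computations for those attributes, and the Outlook counts agree with the example lists on the final-tree slide. Printed gains may differ from the slide's values in the third decimal place, because the slide rounds the intermediate entropies.

```python
import math


def entropy(counts):
    """Entropy in bits of a list of class counts (0 * log2 0 taken as 0)."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c)


def information_gain(parent, partitions):
    """Entropy(S) minus the size-weighted entropies of the subsets S_v."""
    total = sum(parent)
    return entropy(parent) - sum(sum(p) / total * entropy(p) for p in partitions)


# Assumed [yes, no] counts per attribute value for the 14 PlayTennis examples.
S = [9, 5]
SPLITS = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],      # Sunny, Overcast, Rain
    "Temperature": [[2, 2], [4, 2], [3, 1]],  # Hot, Mild, Cool
    "Humidity": [[3, 4], [6, 1]],             # High, Normal
    "Wind": [[6, 2], [3, 3]],                 # Weak, Strong
}

gains = {name: information_gain(S, parts) for name, parts in SPLITS.items()}
for name, g in gains.items():
    print(f"Gain(S, {name}) = {g:.3f}")
print("root:", max(gains, key=gains.get))  # root: Outlook
```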

Decision Tree Example (Cont.)

Decision Tree Construction

◘ Which attribute is next?

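Outlook

– Sunny: ?

– Overcast: Yes

– Rain: ?

$$\mathrm{Gain}(S_{Sunny}, \mathit{Wind}) = 0.970 - \tfrac{2}{5}(1.0) - \tfrac{3}{5}(0.918) = 0.019$$

$$\mathrm{Gain}(S_{Sunny}, \mathit{Humidity}) = 0.970 - \tfrac{3}{5}(0.0) - \tfrac{2}{5}(0.0) = 0.970$$

$$\mathrm{Gain}(S_{Sunny}, \mathit{Temperature}) = 0.970 - \tfrac{2}{5}(0.0) - \tfrac{2}{5}(1.0) - \tfrac{1}{5}(0.0) = 0.570$$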

Decision Tree Example (Cont.)

Figure: final tree, with the training examples reaching each leaf: [D3, D7, D12, D13], [D9, D11], [D4, D5, D10], [D1, D2, D8], [D6, D14]
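Repeating the attribute choice on each impure branch until every leaf is pure is the top-down recursion that ID3 uses. The sketch below is a simplified version, not Quinlan's full algorithm: it assumes categorical attributes, creates branches only for values seen in the data, and falls back to the majority class when no attributes remain. Run on the buys_computer table from the Decision Trees slide (age range 31…40 written as `31..40`), it picks age at the root and reproduces the tree shown on that slide.

```python
import math
from collections import Counter

COLUMNS = ("age", "income", "student", "credit_rating", "buys_computer")
RAW = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ROWS = [dict(zip(COLUMNS, r)) for r in RAW]


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return sum(-n / total * math.log2(n / total) for n in Counter(labels).values())


def gain(rows, attribute, target):
    """Information gain of splitting rows on attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    """Grow a tree top-down, choosing the highest-gain attribute at each node."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                       # pure node: leaf
        return labels[0]
    if not attributes:                              # no tests left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):   # one branch per observed value
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, rest, target)
    return (best, branches)


tree = id3(ROWS, list(COLUMNS[:-1]), "buys_computer")
print(tree)
```

The result uses the same (attribute, {value: subtree}) encoding as the earlier sketches, so it can be passed straight to a traversal or rule-extraction helper.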

Another Example

At the weekend:

- go shopping,

- watch a movie,

- play tennis or

- just stay in.

What you do depends on three things:

- the weather (windy, rainy or sunny);

- how much money you have (rich or poor)

- whether your parents are visiting.

Another Example (Cont.)

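Another Example

height hair eyes class

short blond blue +

tall blond brown -

tall red blue +

short dark blue -

tall dark blue -

tall blond blue +

tall dark brown -

short blond brown -

$$I(3+, 5-) = -\tfrac{3}{8}\log_2\tfrac{3}{8} - \tfrac{5}{8}\log_2\tfrac{5}{8} = 0.954434003$$

Height: short (1+, 2-), tall (2+, 3-)

$$\mathrm{Gain}(height) = 0.954434003 - \tfrac{3}{8} I(1+, 2-) - \tfrac{5}{8} I(2+, 3-) = 0.954434003 - \tfrac{3}{8}\left(-\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3}\right) - \tfrac{5}{8}\left(-\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5}\right) = 0.003228944$$

Hair: blond (2+, 2-), red (1+, 0-), dark (0+, 3-)

$$\mathrm{Gain}(hair) = 0.954434003 - \tfrac{4}{8}\left(-\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4}\right) - \tfrac{1}{8}\left(-\tfrac{1}{1}\log_2\tfrac{1}{1} - 0\right) - \tfrac{3}{8}\left(0 - \tfrac{3}{3}\log_2\tfrac{3}{3}\right) = 0.954434003 - 0.5 = 0.454434003$$

Eyes: blue (3+, 2-), brown (0+, 3-)

$$\mathrm{Gain}(eyes) = 0.954434003 - \tfrac{5}{8}\left(-\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5}\right) - \tfrac{3}{8}\left(0 - \tfrac{3}{3}\log_2\tfrac{3}{3}\right) = 0.954434003 - 0.606844122 = 0.347589881$$

"Hair" is the best attribute.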
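The arithmetic above can be checked mechanically. A minimal sketch that computes the gain of each attribute directly from the eight rows of the table and picks the largest; the helper names are illustrative.

```python
import math
from collections import Counter

# (height, hair, eyes, class) rows from the table above.
DATA = [
    ("short", "blond", "blue", "+"),
    ("tall", "blond", "brown", "-"),
    ("tall", "red", "blue", "+"),
    ("short", "dark", "blue", "-"),
    ("tall", "dark", "blue", "-"),
    ("tall", "blond", "blue", "+"),
    ("tall", "dark", "brown", "-"),
    ("short", "blond", "brown", "-"),
]
ATTRIBUTES = ("height", "hair", "eyes")


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return sum(-n / total * math.log2(n / total) for n in Counter(labels).values())


def gain(column):
    """Information gain of splitting DATA on the attribute at index column."""
    groups = {}
    for row in DATA:                      # group class labels by attribute value
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(DATA) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in DATA]) - remainder


gains = {name: gain(i) for i, name in enumerate(ATTRIBUTES)}
for name, g in gains.items():
    print(f"Gain({name}) = {g:.9f}")
print("best:", max(gains, key=gains.get))  # best: hair
```

The printed gains match the hand computations to the precision shown (0.003228944, 0.454434003 and 0.347589881).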

Another Example (Cont.)

Splitting on hair partitions the training examples as follows:

– dark: short, dark, blue: -; tall, dark, blue: -; tall, dark, brown: -

– red: tall, red, blue: +

– blond: short, blond, blue: +; tall, blond, brown: -; tall, blond, blue: +; short, blond, brown: -

Decision Tree Algorithms

Decision Tree Algorithms

◘ ID3

– Quinlan (1981)

– Tries to reduce the expected number of comparisons

◘ C 4.5

– Quinlan (1993)

– It is an extension of ID3

– Just starting to be used in data mining applications

– Also used for rule induction

◘ CART

– Breiman, Friedman, Olshen, and Stone (1984)

– Classification and Regression Trees

◘ CHAID

– Kass (1980)

– Oldest decision tree algorithm

– Well established in database marketing industry

◘ QUEST

– Loh and Shih (1997)

Frequency Usage

Complexity of Tree Induction

◘ Assume

– m attributes

– n training instances

– tree depth O(log n)

◘ Building a tree: O(m n log n), since each of the O(log n) levels of the tree examines all m attributes over the n instances

◘ Total cost: O(m n log n)

Decision Tree Advantages and Disadvantages

Positives (+)

+ Reasonable training time

+ Fast application

+ Easy to interpret

+ Rule extraction from trees

(can be re-represented as if-then-else rules)

+ Easy to implement

+ Can handle a large number of features

+ Does not require any prior knowledge of the data distribution

Negatives (-)

- Cannot handle complicated relationships between features

- Problems with lots of missing data

- Output attribute must be categorical

- Limited to one output attribute

- Difficulty in designing an optimal decision tree

- Overlap, especially when the number of classes is large