Transcript
Page 1: Decoding the science of Decision trees


Decoding the Science of Decision Trees! Learn from Experts

Page 2: Decoding the science of Decision trees

Today we will take you through the following:

The Classic Banking Challenge! Have you already guessed it?

The Available Solution Options

Why Decision Tree?

How the Decision Tree Methodology Works

Agenda

Page 3: Decoding the science of Decision trees


The Classic Situation…

Page 4: Decoding the science of Decision trees

A bank wants to classify its future customers into two categories, "Risky" and "Good", based on the customer's available attributes.

Let's say a customer xyz has the following attributes. How will the bank know to which category this customer belongs?

Undergrad | Marital Status | Taxable Income | City Population | Work Experience (Yrs) | Urban | Category
No        | Divorced       | 98,727         | 1,01,894        | 14                    | NO    | ????

The Problem?

Page 5: Decoding the science of Decision trees

# A manager has to decide whether or not to hire more human resources in order to balance the workload

# An individual has to decide whether or not to undertake a capital project, or must choose between two competing ventures

Let's See a Few More Cases..

Page 6: Decoding the science of Decision trees


The Available Solution Options…

Page 7: Decoding the science of Decision trees

Algorithms that can help..

Such problems come under "classification"

Classification is the separation or ordering of objects into classes

There are a few common techniques for classification, such as:

Decision Tree

Naïve Bayes

k-Nearest Neighbor

Support Vector Machine, etc.
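For reference, a minimal, self-contained sketch of the R functions commonly used for each of these techniques; the built-in iris data set stands in for the bank data, and the rpart, e1071 and class packages are assumed to be installed:

```r
# Illustrative only: one commonly used R implementation per technique.
library(rpart)   # decision trees
library(e1071)   # naive Bayes and support vector machines
library(class)   # k-nearest neighbour

dt_model  <- rpart(Species ~ ., data = iris, method = "class")        # Decision Tree
nb_model  <- naiveBayes(Species ~ ., data = iris)                     # Naive Bayes
svm_model <- svm(Species ~ ., data = iris)                            # Support Vector Machine
knn_pred  <- knn(iris[, 1:4], iris[, 1:4], cl = iris$Species, k = 3)  # k-NN (numeric features only)
```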

Page 8: Decoding the science of Decision trees

Why is Decision Tree Favorable?

Advantage | DT | NB | KNN | SVM
Simple visual representation of a decision situation | YES | NO | NO | NO
Easy to interpret and explain to executives (non-programmers)! | YES | NO | NO | NO
Illustrates a variety of decisions and also the impact of each decision if different decisions were to be taken | YES | NO | NO | NO
Allows us to predict, explain, describe, or classify an outcome altogether | YES | NO | NO | NO
Helps determine worst, best and expected values for different scenarios | YES | NO | NO | NO
Able to handle both numerical and categorical data | YES | NO | NO | NO

Advantages of Decision Tree Methodology

DT = Decision Tree, NB = Naïve Bayes, KNN = k-Nearest Neighbor, SVM = Support Vector Machine

Page 9: Decoding the science of Decision trees

Decision Trees are "white boxes": the acquired knowledge can be expressed in a readable form, while KNN, SVM and NB are "black boxes": you cannot read the acquired knowledge in a comprehensible way.

e.g. To explain a suitable weather condition for playing in decision tree format:

If weather is nice (Cond. 1) and wind is normal (Cond. 2) and the day is sunny (Cond. 3), then play (*readable format)

Decision Tree Advantages.. Easy to interpret and explain to executives (non-programmers)!
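As a toy illustration of that readable format, the weather rule above can be written directly as nested conditions in R (a hypothetical example, separate from the bank case):

```r
# The "white box" property: the tree's knowledge reads as plain if/else conditions.
play_decision <- function(weather, wind, day) {
  if (weather == "nice" && wind == "normal" && day == "sunny") {
    "Play"            # Cond. 1, Cond. 2 and Cond. 3 all satisfied
  } else {
    "Don't play"
  }
}

play_decision("nice", "normal", "sunny")   # returns "Play"
```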

Page 10: Decoding the science of Decision trees

A manager has to choose between hiring a Permanent employee and an Outsourced one:

Permanent: if the employee stays for 6 months with the company, 50% chance of success, $100; if the employee doesn't stay for 6 months, 50% chance of failure, -$40. Expected value: 50%($100) - 50%($40) = $30

Outsourced: if the employee stays for 6 months with the company, 50% chance of success, $90; if the employee doesn't stay for 6 months, 50% chance of failure, -$20. Expected value: 50%($90) - 50%($20) = $35

Illustrates a variety of decisions and also the impact of each decision if different decisions were to be taken

Decision Tree Advantages..contd
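A minimal sketch of that expected-value arithmetic in R, using the probabilities and payoffs from the diagram above:

```r
# Expected value of a branch = sum of probability-weighted payoffs.
expected_value <- function(probs, payoffs) sum(probs * payoffs)

permanent  <- expected_value(c(0.5, 0.5), c(100, -40))  # 0.5*100 + 0.5*(-40) = 30
outsourced <- expected_value(c(0.5, 0.5), c(90, -20))   # 0.5*90  + 0.5*(-20) = 35

# The branch with the higher expected value is preferred here: Outsourced ($35 > $30).
```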

Page 11: Decoding the science of Decision trees

Let's Understand it More… What is a Decision Tree?

Page 12: Decoding the science of Decision trees

Decision Tree

Decision Tree is a supervised, rule-based classification technique

* During tree construction, attribute selection measures are used to select the attribute which best partitions the tuples into distinct classes

Classification rule

Flowchart-like tree structure

The topmost node in a tree is the root node

Each internal node denotes a test on an attribute, e.g. whether a coin flip comes up heads or tails

Each branch represents an outcome of the test

Each leaf node holds a class label (the decision taken after computing all attributes)

Paths from root to leaf represent classification rules
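The slides do not name a specific attribute selection measure; as one common example, here is a minimal sketch of the Gini impurity used by CART-style trees to score how well a candidate split separates the classes:

```r
# Gini impurity of a set of class labels: 1 - sum(p_k^2) over the class proportions.
# A pure subset (all one class) scores 0; an even two-class mix scores 0.5.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini(rep(c("Good", "Risky"), c(4, 3)))  # 4 Good / 3 Risky: about 0.49
gini(c("Risky", "Risky"))               # pure subset: 0
```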

Page 13: Decoding the science of Decision trees

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Yes       | Married        | 98,727         | 1,01,894        | 14              | NO    | Risky
No        | Single         | 44,000         | 10,18,945       | 12              | YES   | Good
No        | Divorced       | 50,000         | 10,15,845       | 14              | YES   | Good
No        | Single         | 32,100         | 12,58,945       | 12              | NO    | Risky
Yes       | Married        | 28,000         | 1,22,945        | 8               | YES   | Risky
No        | Single         | 35,100         | 12,56,845       | 10              | NO    | Good
No        | Divorced       | 38,100         | 18,95,945       | 7               | NO    | Good

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
No        | Divorced       | 98,727         | 1,01,894        | 14              | NO    | ????

DT Can Be Used With Machine Learning: when coupled with Machine Learning, a Decision Tree can be used for Prediction
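A minimal sketch of this toy data set as an R data frame; the column names are illustrative and the thousands separators have been removed from the numbers:

```r
# Training data from the table above (7 labelled customers).
train <- data.frame(
  Undergrad      = c("Yes", "No", "No", "No", "Yes", "No", "No"),
  MaritalStatus  = c("Married", "Single", "Divorced", "Single",
                     "Married", "Single", "Divorced"),
  TaxableIncome  = c(98727, 44000, 50000, 32100, 28000, 35100, 38100),
  CityPopulation = c(101894, 1018945, 1015845, 1258945, 122945, 1256845, 1895945),
  WorkExperience = c(14, 12, 14, 12, 8, 10, 7),
  Urban          = c("NO", "YES", "YES", "NO", "YES", "NO", "NO"),
  Category       = c("Risky", "Good", "Good", "Risky", "Risky", "Good", "Good"),
  stringsAsFactors = TRUE
)

# The new, unlabelled customer to classify.
new_customer <- data.frame(
  Undergrad      = factor("No",       levels = levels(train$Undergrad)),
  MaritalStatus  = factor("Divorced", levels = levels(train$MaritalStatus)),
  TaxableIncome  = 98727,
  CityPopulation = 101894,
  WorkExperience = 14,
  Urban          = factor("NO",       levels = levels(train$Urban))
)
```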

Page 14: Decoding the science of Decision trees


How it Works..

Let’s Build a Decision Tree Model !

Page 15: Decoding the science of Decision trees

Train Model (Build Tree)

Split on Undergrad (root node: 4 Good / 3 Risky), with branches No and Yes:

Undergrad = No:
Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Single         | 44,000         | 10,18,945       | 12              | YES   | Good
Divorced       | 50,000         | 10,15,845       | 14              | YES   | Good
Single         | 32,100         | 12,58,945       | 12              | NO    | Risky
Single         | 35,100         | 12,56,845       | 10              | NO    | Good
Divorced       | 38,100         | 18,95,945       | 7               | NO    | Good

Undergrad = Yes:
Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Married        | 98,727         | 1,01,894        | 14              | NO    | Risky
Married        | 28,000         | 1,22,945        | 8               | YES   | Risky
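The class counts behind this split can be checked with a one-line cross-tabulation in R, assuming the train data frame sketched earlier:

```r
# Cross-tabulate the split attribute against the class label:
# Undergrad = Yes is a pure Risky subset, Undergrad = No is 4 Good / 1 Risky.
table(train$Undergrad, train$Category)
#       Good Risky
#   No     4     1
#   Yes    0     2
```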

Page 16: Decoding the science of Decision trees

Train Model (Build Tree)

Same split on Undergrad as above (root node: 4 Good / 3 Risky), now annotated:

Undergrad = Yes: 2 Risky -> Pure Subset (stop splitting)

Undergrad = No: 4 Good / 1 Risky -> Split Further

Page 17: Decoding the science of Decision trees

Train Model (Build Tree)

Undergrad = Yes -> 2 Risky (Pure Subset):
Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Married        | 98,727         | 1,01,894        | 14              | NO    | Risky
Married        | 28,000         | 1,22,945        | 8               | YES   | Risky

Undergrad = No -> 4 Good / 1 Risky, split further on Marital Status:

Marital Status = Divorced:
Taxable Income | City Population | Work Experience | Urban | Category
50,000         | 10,15,845       | 14              | YES   | Good
38,100         | 18,95,945       | 7               | NO    | Good

Marital Status = Single:
Taxable Income | City Population | Work Experience | Urban | Category
44,000         | 10,18,945       | 12              | YES   | Good
32,100         | 12,58,945       | 12              | NO    | Risky
35,100         | 12,56,845       | 10              | NO    | Good

Page 18: Decoding the science of Decision trees

Train Model (Build Tree)

Same split on Marital Status as above, now annotated:

Marital Status = Divorced: 2 Good -> Pure Subset (stop splitting)

Marital Status = Single: 2 Good / 1 Risky -> Split Further

Page 19: Decoding the science of Decision trees

Train Model (Build Tree)

Marital Status = Divorced -> 2 Good (Pure Subset):
Taxable Income | City Population | Work Experience | Urban | Category
50,000         | 10,15,845       | 14              | YES   | Good
38,100         | 18,95,945       | 7               | NO    | Good

Marital Status = Single -> 2 Good / 1 Risky, split further on Taxable Income (< 33,000 vs > 33,000):

Taxable Income < 33,000:
Taxable Income | City Population | Work Experience | Urban | Category
32,100         | 12,58,945       | 12              | NO    | Risky

Taxable Income > 33,000:
Taxable Income | City Population | Work Experience | Urban | Category
44,000         | 10,18,945       | 12              | YES   | Good
35,100         | 12,56,845       | 10              | NO    | Good

Page 20: Decoding the science of Decision trees

Train Model (Build Tree)

Same split on Taxable Income as above, now annotated:

Taxable Income < 33,000: 1 Risky -> Pure Subset

Taxable Income > 33,000: 2 Good -> Pure Subset

Every leaf is now pure, so the tree is complete.

Page 21: Decoding the science of Decision trees

Undergrad?
|- Yes -> Risky
|- No  -> Marital Status?
          |- Divorced -> Good
          |- Single   -> Taxable Income?
                         |- < 33K -> Risky
                         |- > 33K -> Good

The Final Built Model

Here is a trained model that will help us with future "Classification"
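A minimal sketch of training this model in R with the rpart package, reusing the train data frame sketched earlier. The control settings are loosened because the toy set has only seven rows, and the splits rpart chooses may differ slightly from the hand-built tree above:

```r
library(rpart)

# Loosen the default minimum-split and complexity settings so rpart
# will grow a tree on a seven-row toy data set.
fit <- rpart(Category ~ Undergrad + MaritalStatus + TaxableIncome +
               CityPopulation + WorkExperience + Urban,
             data = train, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))

print(fit)              # the "white box": splits and leaf labels in readable form
# plot(fit); text(fit)  # optionally draw the tree
```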

Page 22: Decoding the science of Decision trees


Let’s Use our Model..

Page 23: Decoding the science of Decision trees

Test Data

Start from the root of the tree

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
No        | Divorced       | 98,727         | 1,01,894        | 14              | NO    | ????

Undergrad?
|- Yes -> Risky
|- No  -> Marital Status?
          |- Divorced -> Good
          |- Single   -> Taxable Income?
                         |- < 33K -> Risky
                         |- > 33K -> Good

Page 24: Decoding the science of Decision trees

Test Data

How It Works: start at the root node, Undergrad

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
No        | Divorced       | 98,727         | 1,01,894        | 14              | NO    | ????

Page 25: Decoding the science of Decision trees

Test Data

How It Works: Undergrad = No, so take the "No" branch

Page 26: Decoding the science of Decision trees

Test Data (Undergrad = No branch taken)

How It Works: the next node is Marital Status

Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Divorced       | 98,727         | 1,01,894        | 14              | NO    | ????

Page 27: Decoding the science of Decision trees

Test Data

How It Works: Marital Status = Divorced, so take the "Divorced" branch

Page 28: Decoding the science of Decision trees

Test Data

How It Works: the "Divorced" branch ends in a leaf labelled "Good"

Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Divorced       | 98,727         | 1,01,894        | 14              | NO    | Good

The customer falls in the "Good" group and can be considered for all the policies offered to a good customer, e.g. more loans can be provided to him, his credit card limit can be increased, etc.
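The same prediction step can be sketched in R with the fit model and new_customer record from the earlier sketches; with only seven training rows the fitted tree, and therefore the prediction, is illustrative rather than definitive:

```r
# Drop the unknown record down the trained tree and read off the leaf label.
predict(fit, newdata = new_customer, type = "class")
# Following the hand-built tree above (No -> Divorced), the expected label is "Good".
```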

Page 29: Decoding the science of Decision trees

Demo

Page 30: Decoding the science of Decision trees


Real Life Application

Page 31: Decoding the science of Decision trees


Predicting tumor cells as benign or malignant!

Application of Decision Trees

Page 32: Decoding the science of Decision trees

Banks use it for classifying credit card transactions!

Application of Decision Trees

Page 33: Decoding the science of Decision trees

Categorizing news stories as finance, weather, etc.

Application of Decision Trees

Page 34: Decoding the science of Decision trees

Questions


Page 35: Decoding the science of Decision trees

Conclusion!

When to Apply a Decision Tree?

# Whenever you are making a complex decision about the future

# When you are just experimenting with decisions and you want to evaluate and visualize each decision and its impact

# When you want to present your decision and its comparison with other decisions on the same problem

Page 36: Decoding the science of Decision trees


Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us to make your experience better!

Please spare a few minutes to take the survey after the webinar.

Survey

Page 37: Decoding the science of Decision trees