Predicting Customer Conversion with Random Forests
A Decision Trees Case Study
Daniel Gerlanc, Principal
Enplus Advisors, Inc.
www.enplusadvisors.com
[email protected]

May 24, 2015


Talk given for New England Artificial Intelligence on October 10, 2012.
Transcript
Page 1: Predicting Customer Conversion with Random Forests

Predicting Customer Conversion with Random Forests

Daniel Gerlanc, Principal
Enplus Advisors, Inc.
[email protected]

A Decision Trees Case Study

Page 2: Predicting Customer Conversion with Random Forests

Topics

•Objectives: Research Question

•Data: Bank Prospect Conversion

•Methods: Decision Trees, Random Forests

•Results

Page 3: Predicting Customer Conversion with Random Forests

Objective

•Which customers or prospects should you call today?

•To whom should you offer incentives?

Page 4: Predicting Customer Conversion with Random Forests

Dataset

•Direct Marketing campaign for bank loans

•http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

•45211 records, 17 features

Page 5: Predicting Customer Conversion with Random Forests

Dataset

Page 6: Predicting Customer Conversion with Random Forests

Decision Trees

Page 7: Predicting Customer Conversion with Random Forests

Decision Trees

[Figure: example decision tree with decision nodes "Sunny" and "Windy", yes/no branches, and leaves "Coat" and "No Coat"]
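A hand-drawn decision tree like this one is just a set of nested yes/no tests. The sketch below is in Python rather than the talk's R. The branch order (ask "Sunny?" first, then "Windy?") is an assumed reading of the figure, and `needs_coat` is a hypothetical name.

```python
def needs_coat(sunny: bool, windy: bool) -> str:
    """Toy decision tree; the branch order is an assumption about the figure."""
    if not sunny:
        return "Coat"      # not sunny: wear a coat
    if windy:
        return "Coat"      # sunny but windy: still wear a coat
    return "No Coat"       # sunny and calm

print(needs_coat(sunny=True, windy=False))
```

A statistical decision tree has the same shape, but the questions and their order are learned from data instead of written by hand.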

Page 8: Predicting Customer Conversion with Random Forests

Statistical Decision Trees

•Randomness

•May not know the relationships ahead of time

Page 9: Predicting Customer Conversion with Random Forests

Decision Trees

Page 10: Predicting Customer Conversion with Random Forests

Splitting

Deterministic process
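One way to picture the splitting step is an exhaustive search for the threshold that leaves the purest child nodes. This is a minimal Python sketch using Gini impurity, which is rpart's default criterion for classification. The data values and the function names `gini` and `best_split` are made up for illustration, and rpart's real search does considerably more (priors, surrogate splits, pruning).

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try each midpoint between sorted distinct values; return (threshold, weighted impurity)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, gini(labels))  # start from the unsplit node
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical call durations (seconds) and outcomes, invented for this sketch.
durations = [60, 90, 120, 400, 450, 600]
outcomes = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(durations, outcomes)
print(threshold, impurity)
```

Nothing here is random: the same data always produces the same split, which is the sense in which the process is deterministic.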

Page 11: Predicting Customer Conversion with Random Forests

Decision Tree Code

library(rpart)  # recursive partitioning trees
tree.1 <- rpart(takes.loan ~ ., data=bank)

• See the ‘rpart’ and ‘rpart.plot’ R packages.

• Many parameters available to control the fit.

Page 12: Predicting Customer Conversion with Random Forests

Make Predictions

predict(tree.1, type="vector")

Page 13: Predicting Customer Conversion with Random Forests

How’d it do?

               Predicted no    Predicted yes
Actual no      (1) 38,904      (2) 1,018
Actual yes     (3) 3,444       (4) 1,845

Naïve Accuracy: 11.7% (5,289 / 45,211)

Decision Tree
• Precision: 64.4% (1,845 / 2,863)
• Recall: 34.9% (1,845 / 5,289)
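The figures on this slide can be checked directly from the four cell counts. The short Python sketch below uses the usual definitions of precision and recall; the variable names are illustrative. Under these definitions 1,845 / 5,289 is the tree's recall, and its precision is 1,845 / 2,863.

```python
# Cell counts from the decision-tree table on this slide.
tn, fp = 38_904, 1_018   # actual "no":  predicted no, predicted yes
fn, tp = 3_444, 1_845    # actual "yes": predicted no, predicted yes

total = tn + fp + fn + tp
precision = tp / (tp + fp)     # share of predicted "yes" that really converted
recall = tp / (tp + fn)        # share of actual "yes" that the tree found
base_rate = (tp + fn) / total  # share of all prospects who converted

print(f"precision={precision:.3f} recall={recall:.3f} base_rate={base_rate:.3f}")
```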

Page 14: Predicting Customer Conversion with Random Forests

Decision Tree Problems

•Overfitting the data (high variance)

•May not use all relevant features

Page 15: Predicting Customer Conversion with Random Forests

Random Forests

One Decision Tree

Many Decision Trees (Ensemble)

Page 16: Predicting Customer Conversion with Random Forests

Building RF

•Sample from the data (with replacement)

•At each split, sample from the available variables

•Repeat for each tree
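The two sampling steps above can be sketched in a few lines of Python. This only shows the sampling, not the tree fitting. The column names, the sizes, and the helper names `bootstrap_indices` and `candidate_features` are illustrative assumptions.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_features(features, mtry, rng):
    """Pick mtry distinct features to consider at a single split."""
    return rng.sample(features, mtry)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
features = ["age", "job", "balance", "duration"]  # illustrative column names
n_rows, n_trees, mtry = 10, 3, 2

for tree_id in range(n_trees):
    rows = bootstrap_indices(n_rows, rng)                    # step 1: sample rows
    out_of_bag = sorted(set(range(n_rows)) - set(rows))      # rows this tree never sees
    split_vars = candidate_features(features, mtry, rng)     # step 2: sample variables
    print(tree_id, rows, out_of_bag, split_vars)
```

In a real forest the variable sampling is repeated at every split inside every tree; here it is drawn once per tree to keep the sketch short. The out-of-bag rows are what make an internal error estimate possible without a separate test set.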

Page 17: Predicting Customer Conversion with Random Forests

Motivations for RF

•Create uncorrelated trees

•Variance reduction

•Subspace exploration
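The link between uncorrelated trees and variance reduction can be made concrete with a standard result. It is stated here under a simplifying assumption: the B trees' predictions T_b(x) are identically distributed with variance σ² and pairwise correlation ρ. These symbols are not from the talk.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Adding more trees shrinks only the second term. Sampling variables at each split is aimed at lowering ρ, which shrinks the first.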

Page 18: Predicting Customer Conversion with Random Forests

Random Forests

library(randomForest)  # Liaw and Wiener's implementation
rffit.1 <- randomForest(takes.loan ~ ., data=bank)

Most important parameters are:

Variable   Description                                           Default
ntree      Number of trees                                       500
mtry       Number of variables to randomly select at each node   • square root of # predictors for classification
                                                                 • # predictors / 3 for regression
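As a worked example of the mtry defaults, here is a small Python sketch. It rounds down in both cases and never goes below 1 for regression, which is how R's randomForest applies these rules. The predictor count of 16 is an assumption (17 columns minus the outcome), and `default_mtry` is a hypothetical helper name.

```python
import math

def default_mtry(n_predictors, classification=True):
    """Default mtry: floor(sqrt(p)) for classification, max(floor(p / 3), 1) for regression."""
    if classification:
        return math.isqrt(n_predictors)
    return max(n_predictors // 3, 1)

p = 16  # assumption: 17 columns in the bank data minus the outcome column
print(default_mtry(p), default_mtry(p, classification=False))
```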

Page 19: Predicting Customer Conversion with Random Forests

How’d it do?

Naïve Accuracy: 11.7%

Random Forest
• Precision: 64.5% (2,541 / 3,937)
• Recall: 48% (2,541 / 5,289)

               Predicted yes   Predicted no
Actual yes     (1) 2,541       (3) 2,748
Actual no      (2) 1,396       (4) 38,526
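To tie these counts back to the question of whom to call, the Python sketch below compares following the forest's predictions with calling everyone. It reads the slide's 11.7% as the share of prospects who convert (5,289 / 45,211), which is an interpretation. The variable names are illustrative.

```python
# Cell counts from the random-forest table on this slide.
tp, fn = 2_541, 2_748    # actual "yes": predicted yes, predicted no
fp, tn = 1_396, 38_526   # actual "no":  predicted yes, predicted no

total = tp + fn + fp + tn
calls_made = tp + fp            # prospects the model says to call
precision = tp / calls_made     # conversions per call when following the model
base_rate = (tp + fn) / total   # conversions per call when calling everyone
lift = precision / base_rate    # improvement over calling everyone
accuracy = (tp + tn) / total    # overall share of correct predictions

print(f"calls={calls_made} precision={precision:.3f} lift={lift:.2f} accuracy={accuracy:.3f}")
```

The trade-off is recall: calling only the 3,937 flagged prospects reaches 2,541 of the 5,289 who would convert.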

Page 20: Predicting Customer Conversion with Random Forests

Tuning RF

# tuneRF returns a matrix of mtry values and OOB errors; pass doBest=TRUE to get a fitted forest
rffit.1 <- tuneRF(X, y, mtryStart=1, stepFactor=2, improve=0.05)
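The idea behind this call is a stepwise search: start at mtryStart, shrink or grow mtry by stepFactor, and keep going while the out-of-bag (OOB) error improves by at least the relative amount `improve`. The Python sketch below works in that spirit. It is not a port of tuneRF, the function `tune_mtry` is hypothetical, and the OOB errors are invented numbers.

```python
import math

def tune_mtry(oob_error, n_predictors, mtry_start=1, step_factor=2, improve=0.05):
    """Step left, then right, from mtry_start while the relative OOB gain is at least `improve`."""
    errors = {mtry_start: oob_error(mtry_start)}
    for direction in ("left", "right"):
        best_error = errors[mtry_start]
        mtry = mtry_start
        while True:
            if direction == "left":
                nxt = max(1, math.ceil(mtry / step_factor))
            else:
                nxt = min(n_predictors, math.floor(mtry * step_factor))
            if nxt == mtry:
                break  # reached the boundary of the search range
            mtry = nxt
            errors[mtry] = oob_error(mtry)
            gain = 1 - errors[mtry] / best_error
            if gain < improve:
                break  # not enough improvement to keep stepping
            best_error = errors[mtry]
    best = min(errors, key=errors.get)  # mtry with the lowest OOB error among those tried
    return best, errors

# Invented OOB error rates for illustration; a real run would fit a forest per mtry.
fake_oob = {1: 0.110, 2: 0.100, 4: 0.094, 8: 0.093, 16: 0.095}
best, tried = tune_mtry(fake_oob.__getitem__, n_predictors=16)
print(best, sorted(tried))
```

With these made-up numbers the search stops after mtry = 8 because the gain over mtry = 4 is under 5%, so mtry = 16 is never evaluated.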

Page 21: Predicting Customer Conversion with Random Forests

Benefits of RF

•Good accuracy with default settings

•Relatively easy to make parallel

•Many implementations: R, Weka, RapidMiner, Mahout

Page 22: Predicting Customer Conversion with Random Forests

References

• A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.

• Breiman, Leo. Classification and Regression Trees. Belmont, Calif: Wadsworth International Group, 1984. Print.

• Breiman, Leo and Adele Cutler. Random forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm

• S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.