FINAL CASE STUDY 12/08/2016
Pothireddy Marreddy
Mobicom is concerned that the market environment of rising churn rates and declining ARPU will hit them even harder, as the churn rate at Mobicom is relatively high. Currently they have been retaining their customers on a reactive basis, acting only when the subscriber calls in to close the account.
Objective: Identify the top five factors driving likelihood of churn at Mobicom, and roll out targeted proactive retention programs, which include usage-enhancing marketing programs to increase minutes of usage (MOU), rate plan migration, and a bundling strategy, among others.
Key Attributes
The Internet and recommendation of family and friends.
Falling ARPU
Usage based promotions to increase minutes of usage (MOU) for both voice and data
Bundling
Optimal rate plan
Artificial churn/spinners or serial churners
Top Line Questions of Interest to Senior Management:
1. What are the top five factors driving likelihood of churn at Mobicom?
2. Validation of survey findings. a) Whether “cost and billing” and “network and service quality”
are important factors influencing churn behavior. b) Are data usage connectivity issues turning
out to be costly? In other words, is it leading to churn?
3. Would you recommend rate plan migration as a proactive retention strategy?
4. What would be your recommendation on how to use this churn model for prioritization of
customers for proactive retention campaigns in the future?
/* create a permanent library name*/
libname final "Y:\Programes\SAS Graded Assignment\Final Case Study";
run;
/* Import the dataset. Note: the MIXED and SCANTEXT statements in the original
   apply only to DBMS=EXCEL and are not valid for DBMS=CSV, so they are dropped. */
proc import datafile = "Z:\Assignments\Graded Assignment\Topic 13 - Final Case Study Implementation/telecomfinal.csv"
    dbms = csv out = final.telecom replace;
    datarow = 2;
    guessingrows = 2000;
    getnames = yes;
run;
DATA EXPLORATION:
/* understand what the data contains */
proc contents data = final.telecom;
run;
The data set FINAL.TELECOM has 66,297 observations and 79 variables. Among them, 45 variables are character and 34 are numeric. The character variables need to be converted into numeric variables by creating dummy variables, and buckets/bins need to be created for the range variables.
/* check for missing values and run basic statistics */
proc means data = final.telecom N NMISS MEAN STD MIN MAX MODE MEDIAN;
run;

/* check how many are churners */
proc freq data = final.telecom;
    table churn;
run;

There are 76% (50,438) non-churners and 24% (15,859) churners.
DATA PREPARATION:
/* there are a few missing values; they can be dropped or given a missing value treatment */
data final.telecom1;
    set final.telecom;
    if income NE 'NA';
    inc = income * 1;   /* convert character income to numeric */
run;

/* class-wise mean income by credit class, used to impute missing income.
   MEAN = inc writes one row per class (the original OUTPUT statement without a
   statistic keyword would emit five _STAT_ rows per class and break the merge). */
proc means mean data = final.telecom1;
    class crclscod;
    var inc;
    output out = final.prep (drop = _type_ _freq_) mean = inc;
run;

proc sort data = final.telecom;
    by crclscod;
run;

data final.telecom2;
    merge final.telecom (in = a) final.prep (in = b);
    by crclscod;
    if a and b;
run;

data final.telecom2;
    set final.telecom2;
    if income = 'NA' then income = inc;
    new_income = income * 1;
run;
/* converting character variables into numeric variables */
data final.telecom2;
    set final.telecom2;

    /* create income bucket based on quartile values */
    if new_income LE 4.9 then income_bkt = 1; else if new_income LE 6 then income_bkt = 2;
    else if new_income LE 7 then income_bkt = 3; else income_bkt = 4;

    /* create bucket for callwait based on quartile values */
    if callwait_mean LE 0 then callwait_bkt = 1; else if callwait_mean LE 0.334 then callwait_bkt = 2;
    else if callwait_mean LE 1.67 then callwait_bkt = 3; else callwait_bkt = 4;

    /* create an indicator variable for roaming */
    if roam_mean > 0 then roam_ind = 1; else roam_ind = 0;

    /* create dropped or blocked calls buckets using quartile values */
    if drop_blk_mean LE 1.67 then drop_blk_bkt = 1; else if drop_blk_mean LE 5.34 then drop_blk_bkt = 2;
    else if drop_blk_mean LE 12.67 then drop_blk_bkt = 3; else drop_blk_bkt = 4;

    /* create buckets for placed voice calls using quartile values */
    if plcd_vce_mean LE 40.67 then plcd_vce_bkt = 1; else if plcd_vce_mean LE 103.67 then plcd_vce_bkt = 2;
    else if plcd_vce_mean LE 202.67 then plcd_vce_bkt = 3; else plcd_vce_bkt = 4;

    /* create indicator variables for ethnicity of customer;
       initialize to 0 so non-matching rows are not left missing */
    isasian = 0; ishisp = 0; isgerman = 0; isfrench = 0; isafro = 0;
    if ethnic = 'O' then isasian = 1; else if ethnic = 'H' then ishisp = 1;
    else if ethnic = 'G' then isgerman = 1; else if ethnic = 'F' then isfrench = 1;
    else if ethnic = 'Z' then isafro = 1;

    /* create buckets for age of equipment based on quartile values */
    if eqpdays LE 202 then eqp_age = 1; else if eqpdays LE 326 then eqp_age = 2;
    else if eqpdays LE 512 then eqp_age = 3; else eqp_age = 4;

    /* create buckets for length of relationship based on quartile values */
    if months LE 11 then rship_age = 1; else if months LE 16 then rship_age = 2;
    else if months LE 24 then rship_age = 3; else rship_age = 4;

    /* create buckets for average minutes of use based on quartile values */
    if avgmou LE 176.67 then avgmou_bkt = 1; else if avgmou LE 362.5 then avgmou_bkt = 2;
    else if avgmou LE 660.9 then avgmou_bkt = 3; else avgmou_bkt = 4;

    /* create buckets for total calls based on quartile values
       (the original code mistakenly tested avgmou in the middle conditions) */
    if totcalls LE 860 then totcalls_bkt = 1; else if totcalls LE 1796 then totcalls_bkt = 2;
    else if totcalls LE 3508 then totcalls_bkt = 3; else totcalls_bkt = 4;

    /* create buckets for total revenue based on quartile values */
    if totrev LE 860 then totrev_bkt = 1; else if totrev LE 1796 then totrev_bkt = 2;
    else if totrev LE 3508 then totrev_bkt = 3; else totrev_bkt = 4;
run;
proc means data = final.telecom2 N NMISS;
run;
SPLITTING THE DATA
/* split the data into a training dataset and a validation dataset */
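The transcript jumps straight to the PROC LOGISTIC output, so the splitting and model-fitting code itself is missing. Below is a minimal sketch of what it may have looked like, assuming a 70/30 simple random split with PROC SURVEYSELECT; the seed, split ratio, and predictor list are assumptions, not the original specification.

```sas
/* Hypothetical reconstruction: the original split/fit code is not in the
   transcript. Assumes a 70/30 simple random split. */
proc surveyselect data = final.telecom2 out = final.split
    method = srs samprate = 0.7 seed = 12345 outall;
run;

/* OUTALL keeps every row and adds a Selected flag (1 = sampled) */
data final.train final.valid;
    set final.split;
    if selected = 1 then output final.train;
    else output final.valid;
run;

/* Fit the binary logit on the training data; DESCENDING models churn = 1.
   The predictor list below is illustrative only. */
proc logistic data = final.train descending;
    model churn = income_bkt callwait_bkt roam_ind drop_blk_bkt
                  plcd_vce_bkt eqp_age rship_age avgmou_bkt
                  totcalls_bkt totrev_bkt / lackfit;
    score data = final.valid out = final.valid_scored;
run;
```

The LACKFIT option requests the Hosmer and Lemeshow goodness-of-fit test reported later in this document, and SCORE applies the fitted model to the validation data.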
Data Set: FINAL.TELECOM2
Response Variable: churn
Number of Response Levels: 2
Model: binary logit
Optimization Technique: Fisher's scoring
Model: a binary logit regression model was fit to the data.
Optimization Technique: Fisher's scoring is an iterative method for estimating the regression parameters.
Number of Observations Read 66285
Number of Observations Used 64123
The number of observations used is less than the number of observations read because some observations have missing values for the variables used in the model.
Response Profile
Ordered Value   churn   Total Frequency
1               1       15317
2               0       48806
Probability modeled is churn='1'.
Ordered value: the response levels are sorted in descending order (high to low), so the logit regression coefficients are interpreted with respect to churn = 1: a positive coefficient corresponds to a higher likelihood of churn and a negative coefficient to a lower likelihood. The total frequency of churners is 15,317 (24%) and of non-churners 48,806 (76%).
Model Convergence Status: convergence criterion (GCONV=1E-8) satisfied.
The default criterion used is the relative gradient convergence criterion (GCONV), with default precision 1E-8. In this model the convergence criterion is satisfied.
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         70508.162        68071.173
SC          70517.230        68325.093
-2 Log L    70506.162        68015.173
AIC: the Akaike Information Criterion is used for comparing non-nested models on the same sample. The model with the lowest AIC is preferred.
SC: a smaller Schwarz Criterion is likewise more desirable.
AIC and SC penalize the log-likelihood for the number of predictors in the model.
-2 Log L is negative two times the log-likelihood and is used in hypothesis tests for nested models.
Intercept and Covariates: the fitted model includes all independent variables and the intercept. The values in this column can be compared with the corresponding Intercept Only values to assess model fit/significance.
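As a consistency check on the table, assuming k = 28 estimated parameters (the 27 predictors implied by the 27 DF of the global test plus the intercept) and n = 64,123 observations used:

```latex
\mathrm{AIC} = -2\log L + 2k = 68015.173 + 2(28) = 68071.173
\mathrm{SC}  = -2\log L + k\ln n = 68015.173 + 28\ln(64123) \approx 68325.1
```

Both values agree with the reported AIC and SC for the intercept-and-covariates model.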
Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   2490.9885    27   <.0001
Score              2453.9078    27   <.0001
Wald               2316.9133    27   <.0001
The global test evaluates the null hypothesis that all predictor regression coefficients are equal to zero, against the alternative that at least one coefficient is not zero.
The Likelihood Ratio, Score, and Wald tests all indicate that at least one predictor's regression coefficient is not equal to zero.
Calculation: -2 Log L of the null model (intercept only) minus -2 Log L of the fitted model (intercept and covariates) = 70506.162 - 68015.173 = 2490.989, which matches the Likelihood Ratio chi-square.
Pr > Chi-Square: the p-value is compared to the specified alpha level (the accepted Type I error rate). The small p-value leads us to conclude that at least one of the regression coefficients in the model is not equal to zero.
Parameter – the predictor variables in the model and intercept
DF: degrees of freedom corresponding to the parameter. Each parameter estimated in the model requires one DF (as here), and the DF defines the chi-square distribution used to test whether the individual regression coefficient is zero, given the other variables in the model.
Estimate: these are the binary logit regression estimates for the parameters in the model. The logistic regression model expresses the log-odds of a positive response (churn = 1) as a linear combination of the predictor variables.
Interpretation: for a one-unit change in a predictor variable, the log-odds of a positive outcome is expected to change by the respective coefficient, given that the other variables in the model are held constant.
Intercept = -1.1379 (the logistic regression estimate when all predictors in the model are evaluated at zero).
Standard error – these are the standard errors of the individual regression coefficients.
Pr > ChiSq: tests the null hypothesis that an individual predictor's regression coefficient is zero, given that the other predictor variables are in the model. The chi-square test statistic is the squared ratio of the estimate to its standard error. In this model all variables are significant at the 0.05 level.
Point Estimate (odds ratio): the odds ratio is obtained by exponentiating the estimate. The difference between two log-odds equals the log of the ratio of the two odds, which is the log odds ratio.
Interpretation: for a one-unit change in the predictor variable, the odds of a positive outcome are multiplied by the odds ratio (the exponentiated coefficient), given that the other variables in the model are held constant.
95% Wald Confidence Limits: this is the Wald confidence interval (CI) of an individual odds ratio, given the other predictor variables in the model. Upon repeated sampling, 95% of such intervals would include the true population odds ratio. If the CI includes 1, we fail to reject the null hypothesis that the particular regression coefficient equals zero (equivalently, that the odds ratio equals one), given the other predictors in the model.
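In symbols, for a generic coefficient estimate with standard error SE (these are the standard formulas, not values taken from this model's output):

```latex
\widehat{\mathrm{OR}} = e^{\hat\beta}, \qquad
95\%\ \mathrm{CI} = \left( e^{\hat\beta - 1.96\,\mathrm{SE}},\; e^{\hat\beta + 1.96\,\mathrm{SE}} \right)
```

Because the interval is constructed on the log-odds scale and then exponentiated, a CI for the coefficient that contains 0 corresponds exactly to a CI for the odds ratio that contains 1.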
Association of Predicted Probabilities and Observed Responses
Percent Concordant   63.0       Somers' D   0.267
Percent Discordant   36.3       Gamma       0.268
Percent Tied          0.7       Tau-a       0.097
Pairs            7.4756E8       c           0.633
Percent Concordant: a pair of observations with different observed responses is concordant if the observation with the lower ordered response value ("0") has a lower predicted mean score than the observation with the higher ordered response value ("1"). The concordant percent is 63.0%. The higher the concordance, the better the model; this looks like a good model since the concordant percent is well above the discordant percent.
Percent Discordant: a pair is discordant if the observation with the lower ordered response value has a higher predicted mean score than the observation with the higher ordered response value. The discordant percent is 36.3%.
Percent Tied: a pair that is neither concordant nor discordant is tied.
Pairs: The total number of distinct pairs in which one case has an observed outcome different from the other member of the pair.
From the response profile table we have 15,317 observations with churn = 1 and 48,806 observations with churn = 0; thus the total number of pairs with different outcomes is 15,317 × 48,806 = 747,561,502 (7.4756E8).
Somers' D: used to determine the strength and direction of the relation between pairs of variables. Its value ranges from -1 (all pairs disagree) to 1 (all pairs agree). It is defined as (concordant - discordant) divided by the total number of pairs with different responses.
Somers' D = (63.0 - 36.3)/100 = 0.267
Gamma: the Goodman-Kruskal Gamma does not penalize for ties on either variable. Its value ranges from -1 (perfect negative association) to 1 (perfect positive association), with 0 indicating no association. It is generally larger than Somers' D.
Tau-a: Kendall's Tau-a is a modification of Somers' D that uses the total number of possible pairs rather than only the pairs with different responses. It is the ratio of the difference between the numbers of concordant and discordant pairs to the total number of possible pairs. Tau-a is usually much smaller than Somers' D since many pairs share the same response.
c: this is equivalent to the well-known area under the ROC curve (AUC). c ranges from 0.5 to 1, where 0.5 corresponds to the model randomly predicting the response and 1 corresponds to the model perfectly discriminating the response.
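The reported measures can be reproduced from the concordant/discordant/tied percentages, the pair count, and the n = 64,123 observations used:

```latex
c      = \frac{63.0 + 0.5(0.7)}{100} = 0.6335 \approx 0.633
D_{xy} = \frac{63.0 - 36.3}{100} = 0.267
\gamma = \frac{63.0 - 36.3}{63.0 + 36.3} \approx 0.269
\tau_a = \frac{0.267 \times 7.4756 \times 10^{8}}{\tfrac{1}{2}(64123)(64122)} \approx 0.097
```

All four agree with the table to rounding, which is a quick sanity check that the association statistics were read off a single consistent model run.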
Partition for the Hosmer and Lemeshow Test (Group, Total, Observed columns; table truncated in the transcript)

Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square   DF   Pr > Chi-Square
15.6395      8    0.0478
1. Hosmer and Lemeshow Goodness-of-Fit Test –
The Pr > Chi-Square value is 0.0478, higher than the 0.0430 and 0.0428 obtained for the training and validation datasets respectively. Note that for the Hosmer and Lemeshow test a larger p-value indicates better fit (a value below 0.05 suggests some lack of fit), so 0.0478 is borderline, though it is an improvement over the earlier runs.
2. Association of Predicted Probabilities and Observed Responses
Percent Concordant is 63.0 (62.8 for the training dataset) and Percent Discordant is 36.3 (36.5 for the training dataset). This is an acceptable score and an improvement over the training dataset.
/* rank the scored data into deciles of predicted probability */
proc rank data = DMP out = final.decile groups = 10 ties = mean;
    var p_1;
    ranks decile;
run;

proc sort data = final.decile;
    by descending p_1;
run;
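Once the scored data is ranked into deciles, the gain-chart numbers can be produced by summarizing actual churn within each decile before exporting. A sketch, assuming the scored dataset carries a numeric 0/1 churn flag alongside the decile variable created above:

```sas
/* Churn count and churn rate per predicted-probability decile.
   With PROC RANK ranking p_1 ascending, decile 9 holds the highest
   predicted probabilities. */
proc means data = final.decile n sum mean;
    class decile;
    var churn;
run;
```

Comparing each decile's churn rate to the overall 24% base rate gives the lift that the gain chart visualizes.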
EXPORT THE DATA
proc export data = final.decile
    outfile = 'Y:\Programes\SAS Graded Assignment\Final Case Study\Gain.xlsx'
    dbms = excel;
run;
Accuracy
/* check accuracy */
data final.testacc;
    set DMP;
    length out $ 14;   /* without this, "False Negative" is truncated to 13
                          characters ("False Negativ"), as seen in the output below */
    if f_churn = 0 and i_churn = 0 then out = "True Negative";
    else if f_churn = 1 and i_churn = 1 then out = "True Positive";
    else if f_churn = 0 and i_churn = 1 then out = "False Positive";
    else if f_churn = 1 and i_churn = 0 then out = "False Negative";
run;

proc freq data = final.testacc;
    tables out;
run;
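As an alternative to building the confusion matrix by hand, PROC LOGISTIC can produce a classification table directly via the CTABLE option. A sketch; the predictor list and the cutoff grid here are assumptions, not the original specification:

```sas
/* CTABLE prints a classification table (sensitivity, specificity, correct %);
   PPROB sets the cutoff probabilities at which to evaluate it. */
proc logistic data = final.train descending;
    model churn = income_bkt eqp_age avgmou_bkt totrev_bkt
          / ctable pprob = (0.1 to 0.9 by 0.1);
run;
```

Scanning the CTABLE output across cutoffs is a convenient way to choose the probability level discussed at the end of this section.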
Cut-off probability = 0.5

out              Frequency   Percent   Cumulative Frequency   Cumulative Percent
False Negative   15055       23.48     15055                  23.48
(remaining rows truncated in the transcript)

Observed churn = 0 and predicted churn = 0: True Negative (actual Nos captured correctly), about 76% of observations
Observed churn = 1 and predicted churn = 1: True Positive (actual Yes captured correctly), about 0.46% of observations
Observed churn = 0 and predicted churn = 1: False Positive (actual Nos misclassified as Yes), about 0.41% of observations
Observed churn = 1 and predicted churn = 0: False Negative (actual Yes misclassified as Nos), about 24% of observations
From the gain chart, the model appears to be a better predictor than a standard average probability calculation.
At the 50% probability level, the model has correctly predicted 260 of the 15,057 actual churners, which is only a 1.7% success rate. But it has correctly predicted 48,513 out of 48,775 non-churners, a 99.5% success rate.
At the 50% probability level on the scored data, the model has correctly predicted 295 of the 15,057 actual churners, a 1.9% success rate. It has correctly predicted 48,513 out of 48,808 non-churners, a 99.4% success rate. This probability level can be chosen as an appropriate level for prediction.