SOMDEEP SEN; Business Analyst: Trimax Analytics (e) [email protected] ; (p): 09748229123 LinkedIn: http://linkd.in/1ifqs3x

Multiple regression to find out drivers of online satisfaction

Jan 27, 2015


Data & Analytics

Somdeep Sen

In 2007, the State Services Commission (SSC) of New Zealand commissioned a survey.
The survey was known as the Driver's survey. Its purpose was to determine the key drivers that influence satisfaction with service delivery.
The SSC identified that more State Services were becoming available online, so the research was done to identify the factors driving satisfaction with services delivered online.
Transcript
Page 1: Multiple regression to find out drivers of online satisfaction

SOMDEEP SEN; Business Analyst: Trimax Analytics

(e) [email protected]; (p): 09748229123

LinkedIn: http://linkd.in/1ifqs3x

Page 2: Multiple regression to find out drivers of online satisfaction

Plot

In 2007, the State Services Commission (SSC) of New Zealand commissioned a survey

The survey was known as the Driver's survey

Purpose: To determine the key drivers that influence satisfaction with service delivery

SSC identified that more State Services are becoming available online

So, the research was done to spot factors driving satisfaction for services delivered online

Objective

To identify the key drivers of satisfaction of online services for obtaining information

Page 3: Multiple regression to find out drivers of online satisfaction

Sample size: 1243; the full sample was used for the analysis

Sampling error: 2.78%
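The slide does not say how the sampling error was derived. The figure is consistent with the usual worst-case margin of error for a proportion at 95% confidence, so the sketch below assumes that formula, simple random sampling, p = 0.5 and z = 1.96. The function name is illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion, assuming simple random sampling.

    p = 0.5 is the worst case; z = 1.96 corresponds to 95% confidence.
    Both are assumptions, not values stated on the slide.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1243)
print(f"{moe:.2%}")  # prints 2.78%
```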

Demographic info: Age, Gender, Income (Household income has been taken)

A 10-point rating scale was used to capture the satisfaction responses

The 10-point scale is treated here as an interval scale

It can therefore be handled as a continuous variable

This makes it suitable for the regression analysis

Respondent demographic data: Categorical in nature

Page 4: Multiple regression to find out drivers of online satisfaction

51.5% of the respondents were female

Mean Age: 47; Mean Household income : 285000.50

Only 0.3% of the respondents reported no income

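[Chart: Age distribution of respondents, % per age band. Bar values as extracted, in order: 1.3, 5.0, 9.3, 6.5, 12.1, 13.2, 8.5, 12.1, 10.9, 6.8, 3.7, 4.7, 5.8]

[Chart: Household income distribution of respondents, % per income band. Bar values as extracted, in order: 4.1, 8.6, 8.1, 8.4, 18.3, 21.1, 23.4, 7.7]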

Page 5: Multiple regression to find out drivers of online satisfaction

The respondents were asked about 22 public service websites

18% of the respondents had visited at least one website

Top 3 most visited websites:

Inland Revenue (61.3%), Local council (39.3%), Local library (38.1%)

Top 3 most recently visited websites:

Inland Revenue (22%), Local library (15.8%) & Local council (9.2%)

Interestingly, similar patterns emerged for the most visited and the most recently visited sites

Page 6: Multiple regression to find out drivers of online satisfaction

A very small share of the respondents (0.3%) were unsure about the purpose of the visit

[Chart: Purpose of visit. For work: 31.9%; for personal use: 67.8%]

Page 7: Multiple regression to find out drivers of online satisfaction

The respondents were asked to rate the overall experience on a ten-point rating scale

79.5% of the respondents reported above-average satisfaction (rating > 5)

50.2% of the respondents reported very high satisfaction (rating >= 8)

[Chart: Distribution of overall experience ratings, % of respondents]

Rating    %
0         0.6
1         0.8
2         2.0
3         3.2
4         4.2
5         9.7
6         11.1
7         18.2
8         22.3
9         12.5
10        15.4
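Reading the chart values in rating order (0 to 10), the two headline shares on this slide can be re-derived as a quick consistency check. The pairing of each percentage with a rating is an assumption based on the order in which the values were extracted; the helper name is illustrative.

```python
# Percentage of respondents per rating, assuming the extracted bar values
# run in order from rating 0 to rating 10.
ratings_pct = {
    0: 0.6, 1: 0.8, 2: 2.0, 3: 3.2, 4: 4.2, 5: 9.7,
    6: 11.1, 7: 18.2, 8: 22.3, 9: 12.5, 10: 15.4,
}

def share(condition):
    """Sum the percentages of all ratings that satisfy the condition."""
    return round(sum(pct for r, pct in ratings_pct.items() if condition(r)), 1)

above_midpoint = share(lambda r: r > 5)    # ratings 6 to 10
very_satisfied = share(lambda r: r >= 8)   # ratings 8 to 10

print(above_midpoint, very_satisfied)  # prints 79.5 50.2
```

Both figures match the percentages quoted on the slide, which supports the assumed pairing.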

Page 8: Multiple regression to find out drivers of online satisfaction

p-value (0.893) > α (0.05)

There is no evidence that overall experience depends on age group

Page 9: Multiple regression to find out drivers of online satisfaction

p-value (0.06) > α (0.05)

At the 5% level, there is no evidence that overall experience depends on household income

Page 10: Multiple regression to find out drivers of online satisfaction

Dependent Variable

Overall satisfaction of service delivery while obtaining information (q12a_1)

Justification:

10 point rating scale; can be treated as continuous

Measures ‘overall’ satisfaction

Correlates more strongly with the independent variables than 7a & 12_a_2 do

Independent variables:

The statement ratings in q8a_1 to q8a_4 (total no: 33)

Justification:

10 point rating scale; can be treated as continuous

Statements are randomized for statistical accuracy

Only statements specifically tested in the cognitive tests remain next to each other

Respondents can tell the difference between these statements

Page 11: Multiple regression to find out drivers of online satisfaction

Treatment

A 10-point rating scale was used to collect the responses for the predictors & the predicted variable

The questionnaire also offered response codes 11 and/or 12

Codes 11 and/or 12 stood for Unsure/Not applicable

Hence those responses were replaced by the mode of the responses

Justification:

Dropping the responses coded 11 or 12 would reduce the sample size

Keeping 11 and/or 12 as values may disrupt the continuous nature of the scale

Replacing them with zero may introduce bias, as zero would mean very dissatisfied

Replacing them with the mean is awkward, as the mean of the responses may not be a whole number

Hence the mode seems an appropriate choice, as it is always a whole number

Replacing with the mode may minimize bias, as it is the most common response
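A minimal sketch of the mode replacement described above. It assumes the mode is computed from the valid responses only (codes 11 and 12 excluded), which the slide does not state explicitly; the function name and the sample data are hypothetical.

```python
from collections import Counter

UNSURE_CODES = {11, 12}  # Unsure / Not applicable, per the questionnaire

def impute_with_mode(responses):
    """Replace codes 11/12 with the mode of the remaining valid responses."""
    valid = [r for r in responses if r not in UNSURE_CODES]
    mode_value = Counter(valid).most_common(1)[0][0]
    return [mode_value if r in UNSURE_CODES else r for r in responses]

sample = [8, 7, 11, 8, 10, 12, 6, 8]   # hypothetical responses to one statement
cleaned = impute_with_mode(sample)
print(cleaned)  # prints [8, 7, 8, 8, 10, 8, 6, 8]
```

The sample size is unchanged and every value stays on the rating scale, which is the property the justification relies on.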

Page 12: Multiple regression to find out drivers of online satisfaction

Before conducting the bi-variate analysis, certain variables were dropped

As per the questionnaire, independent variables 11 & 12 and 23, 24 & 25 were kept next to each other as groups of closely related statements

Among these variables, 11 & 25 were retained, as they had the stronger correlation with the dependent variable

Total number of independent variables left: 30

Note:
• I feel confident that my privacy was fully protected on this website (11)
• I feel confident that my information was not shared with anyone else (12)
• The tone of the language on the site was appropriate (23)
• The language on this site was easy to understand (24)
• The information on the website was easy to understand (25)

Page 13: Multiple regression to find out drivers of online satisfaction

The bi-variate analysis

Correlations (and p-values) between the predicted and each predictor

Correlations (and p-values) of predictors with each other

Findings of the bi-variate analysis:

Each predictor had a significant relationship with the predicted variable (p-value < α for each)

Note: Please refer to the excel sheet Cor_1 to view the detail results
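For illustration, the correlation between one predictor and the predicted variable can be computed as below. This is a sketch, not the author's SAS output: the data are hypothetical, and the p-value would come from comparing the t statistic with a t distribution on n - 2 degrees of freedom, which is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical ratings: one predictor statement and overall satisfaction.
predictor = [2, 4, 5, 7, 9]
satisfaction = [3, 4, 6, 7, 10]

r = pearson_r(predictor, satisfaction)
t = t_statistic(r, len(predictor))
print(round(r, 3), round(t, 2))
```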

Page 14: Multiple regression to find out drivers of online satisfaction
Page 15: Multiple regression to find out drivers of online satisfaction

Missing value Treatment

‘Proc means’ in SAS was used to check the missing values

Neither the predicted nor the predictors had any missing values

Outlier Treatment

‘Proc univariate’ in SAS was used to check the outliers

The variables didn’t have any outliers

None of the variables had any values less than zero or more than 10
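A rough Python analogue of these two checks is sketched below. It only counts missing values and flags values outside the 0 to 10 range; it does not reproduce the SAS procedures' output. The function name and sample column are hypothetical, and missing values are assumed to be represented as None.

```python
def audit_column(values, low=0, high=10):
    """Count missing values and list values outside the valid rating range."""
    missing = sum(1 for v in values if v is None)
    out_of_range = [v for v in values if v is not None and not (low <= v <= high)]
    return {"n": len(values), "missing": missing, "out_of_range": out_of_range}

column = [8, 7, 8, 10, 6, 0, 9]   # hypothetical cleaned ratings
report = audit_column(column)
print(report)  # prints {'n': 7, 'missing': 0, 'out_of_range': []}
```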

Page 16: Multiple regression to find out drivers of online satisfaction

Independent variables were removed one at a time, as per the following steps:

a. Checking for the variable with the highest VIF (Variance Inflation Factor), Xi

b. Finding the highest value for Xi in the Collinearity Diagnostics table

c. Finding the corresponding highest value in that row, which gives the variable most collinear with Xi, say Xj

d. Comparing the p-values of Xi and Xj and removing the variable with the higher p-value

e. Checking the R-sq value of the model after the removal of each variable
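Step a can be sketched in plain Python as follows. The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. Steps b to d depend on the SAS collinearity diagnostics and p-values and are not reproduced here; the data and function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(y, columns):
    """R-squared of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(predictors):
    """VIF for each predictor: 1 / (1 - R^2) from regressing it on the rest."""
    result = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(col, others))
    return result

# Hypothetical data: x1 and x2 move together, x3 is largely unrelated.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 3, 5, 4, 6, 7, 9, 8],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
vifs = vif(predictors)
worst = max(vifs, key=vifs.get)   # candidate Xi for step a
print(worst, {k: round(v, 2) for k, v in vifs.items()})
```

In this toy data the two correlated columns carry the largest VIFs, so one of them would be the Xi examined in the later steps.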

Page 17: Multiple regression to find out drivers of online satisfaction

An R-sq value was obtained when there were 10 independent variables in total

Some of the p-values at this stage were still quite high, so those variables were removed from the model

Page 18: Multiple regression to find out drivers of online satisfaction

Finally, we arrived at a model with 6 independent variables and an R-sq of 0.653, but this R-sq value was low

Page 19: Multiple regression to find out drivers of online satisfaction

The previous regression model was re-run, taking q12a_1 and removing q8a_4_27, which showed a better R-sq

Page 20: Multiple regression to find out drivers of online satisfaction

Regression Equation:

q12a_1 = 0.053 + 0.223*q8a_1_2 + 0.221*q8a_2_13 + 0.264*q8a_2_16 + 0.143*q8a_3_20 + 0.200*q8a_4_32

Where,

q12a_1: Overall satisfaction of service delivery while obtaining information

Intercept: 0.053

q8a_1_2: It was easy to find what I was looking for

q8a_2_13: I was able to do everything I needed to do online

q8a_2_16: Amount of time it took to get the overall service was acceptable

q8a_3_20: The information needed on the site was up-to-date

q8a_4_32: It’s an example of good value for tax dollars spent
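To show how the equation is read, the sketch below plugs a set of ratings into it. The coefficients and the intercept are taken from the slide; the function name and the example respondent are hypothetical.

```python
INTERCEPT = 0.053
COEFFICIENTS = {
    "q8a_1_2": 0.223,   # easy to find what I was looking for
    "q8a_2_13": 0.221,  # able to do everything I needed to do online
    "q8a_2_16": 0.264,  # time to get the overall service was acceptable
    "q8a_3_20": 0.143,  # information on the site was up-to-date
    "q8a_4_32": 0.200,  # good value for tax dollars spent
}

def predict_satisfaction(ratings):
    """Predicted overall satisfaction from the five driver ratings."""
    return INTERCEPT + sum(COEFFICIENTS[name] * ratings[name] for name in COEFFICIENTS)

# Hypothetical respondent who rates every driver 8 out of 10.
all_eights = {name: 8 for name in COEFFICIENTS}
predicted = predict_satisfaction(all_eights)
print(round(predicted, 3))  # prints 8.461
```

Each coefficient is the change in predicted overall satisfaction for a one-point increase in that rating with the others held fixed, so q8a_2_16 (0.264) has the largest effect per point in this model.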

Page 21: Multiple regression to find out drivers of online satisfaction

Refining the model

Preparation of Dashboard

Preparation of the story board

Drawing Conclusion

Making Recommendations

Page 22: Multiple regression to find out drivers of online satisfaction