Multiple regression to find out drivers of online satisfaction

Post on 27-Jan-2015

Category:

Data & Analytics

DESCRIPTION

In 2007, the State Services Commission (SSC) of New Zealand commissioned a survey known as the Driver's Survey. Its purpose was to determine the key drivers that influence satisfaction with service delivery. The SSC had identified that more State Services were becoming available online, so the research was done to identify the factors driving satisfaction with services delivered online.

Transcript

SOMDEEP SEN; Business Analyst: Trimax Analytics

(e) somdeepenggmba@gmail.com; (p): 09748229123

LinkedIn: http://linkd.in/1ifqs3x

Plot

In 2007, the State Services Commission (SSC) of New Zealand commissioned a survey

The survey was known as Driver's survey

Purpose: To determine the key drivers that influence satisfaction with service delivery

SSC identified that more State Services are becoming available online

So, the research was done to spot factors driving satisfaction for services delivered online

Objective

To identify the key drivers of satisfaction of online services for obtaining information

Sample Size: 1243; total sample size has been considered for the analysis

Sampling error: 2.78%

Demographic info: Age, Gender, Income (Household income has been taken)
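
The reported sampling error of 2.78% matches the usual 95% margin of error for a proportion in the worst case (p = 0.5) with n = 1243. The slides do not say how the figure was computed, so the formula, the z value of 1.96, the simple-random-sampling assumption and the function name below are all assumptions. A minimal sketch:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes simple random sampling; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1243)
print(f"{moe:.2%}")  # 2.78%
```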

A 10-point rating scale was used to record the satisfaction responses

The 10-point scale is an interval scale

It can nevertheless be treated as continuous

This is helpful in conducting the regression analysis

Respondent demographic data: Categorical in nature

51.5% of the respondents were female

Mean Age: 47; Mean Household income : 285000.50

Only 0.3% of the respondents reported no income

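[Bar chart: % of respondents by age group. Bar values as extracted, group labels not recoverable: 1.3, 5.0, 9.3, 6.5, 12.1, 13.2, 8.5, 12.1, 10.9, 6.8, 3.7, 4.7, 5.8]

[Bar chart: % of respondents by household income band. Bar values as extracted, band labels not recoverable: 4.1, 8.6, 8.1, 8.4, 18.3, 21.1, 23.4, 7.7]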

The respondents were asked about 22 public service websites

18% of the respondents had visited at least one website

Top 3 most visited websites:

Inland Revenue (61.3%), Local council (39.3%), Local library (38.1%)

Top 3 most recently visited websites:

Inland Revenue (22%), Local library (15.8%) & Local council (9.2%)

Interestingly, similar patterns emerged in the most visited & the most recently visited sites

A very small share of the respondents (0.3%) were unsure about the purpose of their visit

[Chart: purpose of visit. Labels: For Work, For Personal. Values as extracted: 31.9%, 67.8%]

The respondents were asked to rate the overall experience on a ten point rating scale

79.5% of the respondents reported an above-average satisfaction level (>5)

50.2% of the respondents reported a very high satisfaction level (>=8)

[Bar chart: % of respondents by overall rating]

Rating 0: 0.6; 1: 0.8; 2: 2.0; 3: 3.2; 4: 4.2; 5: 9.7; 6: 11.1; 7: 18.2; 8: 22.3; 9: 12.5; 10: 15.4
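
The two headline shares (79.5% above 5 and 50.2% at 8 or more) can be checked against the rating chart. The sketch below assumes the extracted bar values map to ratings 0 to 10 in order; the variable and function names are illustrative only.

```python
# Percentage of respondents per overall rating, read off the chart
# (assumption: values are listed in rating order 0..10).
rating_pct = {0: 0.6, 1: 0.8, 2: 2.0, 3: 3.2, 4: 4.2, 5: 9.7,
              6: 11.1, 7: 18.2, 8: 22.3, 9: 12.5, 10: 15.4}

def share_at_least(dist, threshold):
    """Sum the percentages for ratings >= threshold, rounded to 1 decimal."""
    return round(sum(pct for rating, pct in dist.items() if rating >= threshold), 1)

above_average = share_at_least(rating_pct, 6)   # ratings > 5
very_satisfied = share_at_least(rating_pct, 8)  # ratings >= 8
print(above_average, very_satisfied)  # 79.5 50.2
```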

p-value (0.893) > α (0.05)

Overall experience does not depend significantly on age group

p-value (0.06) > α (0.05)

Overall experience does not depend significantly on household income

Dependent Variable

Overall satisfaction of service delivery while obtaining information (q12a_1)

Justification:

10 point rating scale; can be treated as continuous

Measures ‘overall’ satisfaction

Correlates better with the independent variables than 7a & 12_a_2

Independent variables:

The q8a statements, blocks q8a_1 to q8a_4 (Total no: 33)

Justification:

10 point rating scale; can be treated as continuous

Statements are randomized for statistical accuracy

Only statements specifically tested in the cognitive tests remain next to each other

Respondents can tell the difference between these statements

Treatment

A 10-point rating scale was used to collect the responses for both the predictors & the predicted variable

Respondents could also choose 11 and/or 12

11 and/or 12 stood for Unsure/Not applicable

Hence those responses were replaced by the mode of the responses

Justification:

Dropping the responses coded 11 or 12 would reduce the sample size

Keeping 11 and/or 12 as values may disrupt the continuous nature of the scale

Replacing with zero may introduce bias, as zero would mean very dissatisfied

Replacing with the mean is difficult, as the mean of the responses may not be a whole number

Hence the mode seems an appropriate choice, as it is always a whole number

Replacing with the mode may minimize the bias, as it is the most common response
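
The treatment above can be sketched in a few lines. This is an illustration under assumptions, not the code used in the study (which was done in SAS): the function name and sample responses are hypothetical, and ties for the mode are broken by first occurrence.

```python
from collections import Counter

UNSURE_CODES = {11, 12}  # codes for Unsure / Not applicable

def impute_with_mode(responses):
    """Replace Unsure/Not applicable codes with the mode of the valid ratings."""
    valid = [r for r in responses if r not in UNSURE_CODES]
    if not valid:
        raise ValueError("no valid ratings to take a mode from")
    # most_common(1) returns the most frequent rating; on ties, the first seen.
    mode_value = Counter(valid).most_common(1)[0][0]
    return [mode_value if r in UNSURE_CODES else r for r in responses]

# Hypothetical responses to one statement
sample = [8, 7, 11, 8, 9, 12, 8, 5]
cleaned = impute_with_mode(sample)
print(cleaned)  # [8, 7, 8, 8, 9, 8, 8, 5]
```

The sample size is unchanged and every imputed value stays on the original scale.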

Before conducting the bi-variate analysis certain variables were dropped

As per the questionnaire, independent variables 11 & 12, and 23, 24 & 25, were to be kept together as findings

Among these variables, 11 & 25 were retained as they had a stronger correlation with the dependent variable

Total number of independent variables left: 30

Note:

• I feel confident that my privacy was fully protected on this website (11)

• I feel confident that my information was not shared with anyone else (12)

• The tone of the language on the site was appropriate (23)

• The language on this site was easy to understand (24)

• The information on the website was easy to understand (25)

The bi-variate analysis

Correlations (and p-values) between the predicted and each predictor

Correlations (and p-values) of predictors with each other

Findings of the bi-variate analysis:

Each predictor had a significant relationship with the predicted variable (p-value < α for each)

Note: Please refer to the Excel sheet Cor_1 to view the detailed results
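
For illustration, the correlation between one predictor and the predicted variable can be computed as below. The slides do not name the correlation type; Pearson's r is assumed here, and the data and function names are hypothetical. The t statistic is the usual one for testing r = 0 with n - 2 degrees of freedom; turning it into a p-value needs a t-distribution table or a statistics package.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: correlation = 0 (undefined when |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical ratings: overall satisfaction vs. one predictor statement
overall = [8, 6, 9, 4, 7, 10, 5, 8]
easy_to_find = [7, 6, 9, 3, 6, 9, 5, 8]

r = pearson(overall, easy_to_find)
print(round(r, 3), round(t_statistic(r, len(overall)), 2))
```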

Missing value Treatment

‘Proc means’ in SAS was used to check the missing values

Neither the predicted nor the predictors had any missing values

Outlier Treatment

‘Proc univariate’ in SAS was used to check the outliers

The variables didn’t have any outliers

None of the variables had any values less than zero or more than 10
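
The study ran these checks with SAS procedures. As a rough Python analogue only, the sketch below counts missing values and lists any ratings outside 0 to 10; the function name, the use of None for a missing value and the sample column are assumptions.

```python
def check_column(values, low=0, high=10):
    """Count missing entries (None) and collect ratings outside [low, high]."""
    missing = sum(1 for v in values if v is None)
    out_of_range = [v for v in values if v is not None and not (low <= v <= high)]
    return {"n": len(values), "missing": missing, "out_of_range": out_of_range}

# Hypothetical column after the Unsure/Not applicable codes were replaced
report = check_column([8, 7, 8, 8, 9, 8, 8, 5])
print(report)  # {'n': 8, 'missing': 0, 'out_of_range': []}
```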

Independent variables were removed one at a time as per the following steps:

a. Checking for the variable with the highest VIF (Variance Inflation Factor), Xi

b. Finding the highest value of Xi in the Collinearity Diagnostics table

c. Finding the corresponding highest value in that row, to give us the variable most collinear with Xi, say Xj

d. Comparing the p-values of Xi and Xj to remove the variable with the higher p-value

e. Checking the R-sq value of the model after the removal of each variable
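
Step a can be illustrated without a statistics package, using the fact that the VIF of each predictor is the matching diagonal entry of the inverse of the predictors' correlation matrix. This is a sketch with hypothetical data and function names, not the SAS output used in the study. It covers only the VIF ranking, not the collinearity diagnostics table or the p-value comparison in steps b to d.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {predictor name: VIF} for a dict of equal-length columns."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Hypothetical predictor ratings; x2 closely tracks x1, x3 does not
predictors = {
    "x1": [8, 6, 9, 4, 7, 10, 5, 8],
    "x2": [8, 5, 9, 4, 7, 9, 5, 7],
    "x3": [3, 9, 6, 7, 2, 8, 5, 4],
}
scores = vif(predictors)
worst = max(scores, key=scores.get)  # candidate Xi for step a
print(worst, {k: round(v, 2) for k, v in scores.items()})
```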

R-sq value was obtained when there were 10 independent variables in total

Some of the p-values at this stage were still quite high. Thus those variables were removed from our model

Finally, we arrived at a model with R-sq of 0.653 with 6 independent variables, but the R-sq value was low

The previous regression model was re-run by taking q12a_1 and removing q8a_4_27, which showed a better R-sq

Regression Equation:

q12_a_1 = 0.053 + 0.223*q8a_1_2 + 0.221*q8a_2_13 + 0.264*q8a_2_16 + 0.143*q8a_3_20 + 0.200*q8a_4_32

Where,

q12_a_1: Overall satisfaction of service delivery while obtaining information

Intercept: 0.053

q8a_1_2: It was easy to find what I was looking for

q8a_2_13: I was able to do everything I needed to do online

q8a_2_16: Amount of time it took to get the overall service was acceptable

q8a_3_20: The information needed on the site was up-to-date

q8a_4_32: It’s an example of good value for tax dollars spent
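
As a usage illustration, the fitted equation can be applied to one respondent's ratings. The intercept and coefficients are those of the equation above; the function name and the example respondent are hypothetical.

```python
INTERCEPT = 0.053
COEFFICIENTS = {
    "q8a_1_2": 0.223,   # easy to find what I was looking for
    "q8a_2_13": 0.221,  # able to do everything I needed online
    "q8a_2_16": 0.264,  # time taken was acceptable
    "q8a_3_20": 0.143,  # information was up-to-date
    "q8a_4_32": 0.200,  # good value for tax dollars spent
}

def predict_satisfaction(ratings):
    """Predicted overall satisfaction (q12_a_1) from the five driver ratings."""
    return INTERCEPT + sum(coef * ratings[name] for name, coef in COEFFICIENTS.items())

# Hypothetical respondent who rates every driver 8 out of 10
respondent = {name: 8 for name in COEFFICIENTS}
print(round(predict_satisfaction(respondent), 3))  # 8.461
```

With the predictors on a common scale, the larger coefficient on q8a_2_16 means a one-point gain there moves the prediction more than a one-point gain on any other driver.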

Refining the model

Preparation of Dashboard

Preparation of the story board

Drawing Conclusion

Making Recommendations
