Statistics and Data Analysis Part_Eight Analysis of Variance
Analysis of Variance
• Analysis of variance helps compare two or more populations of quantitative data.
• Specifically, we are interested in the relationships among the population means (are they equal or not).
• The procedure works by analyzing the sample variance.
• The analysis of variance is a procedure that tests whether differences exist among two or more population means.
• To do this, the technique analyzes the variance of the data.
Single - Factor (One - Way) Analysis of Variance : Independent Samples
• Example
– An apple juice manufacturer is planning to develop a new product: a liquid concentrate.
– The marketing manager has to decide how to market the new product.
– Three strategies are considered:
• Emphasize convenience of using the product.
• Emphasize the quality of the product.
• Emphasize the product’s low price.
• Example - continued
– An experiment was conducted as follows:
• In three cities an advertising campaign was launched.
• In each city only one of the three characteristics
(convenience, quality, and price) was emphasized.
• The weekly sales were recorded for twenty weeks
following the beginning of the campaigns.
• We assume the samples are independent of each other
CLUSTER SAMPLING
• Example - continued
– Data: weekly sales (20 weeks per city)

Convenience  Quality  Price
529          804      672
658          630      531
793          774      443
514          717      596
663          679      602
719          604      502
711          620      659
606          697      689
461          706      675
529          615      512
498          492      691
663          719      733
604          787      698
495          699      776
485          572      561
557          523      572
353          584      469
557          634      581
542          580      679
614          624      532
• Solution
– The data are quantitative.
– Our problem objective is to compare sales in three cities.
– We hypothesize on the relationships among the three mean weekly sales:
[Figure: boxplots of SALES by ADTYPE (1 = convenience, 2 = quality, 3 = price), N = 20 per group]
Exploratory analysis

Descriptives: SALES
                                                   95% Confidence Interval for Mean
ADTYPE  N   Mean      Std. Deviation  Std. Error  Lower Bound  Upper Bound  Minimum  Maximum
1.00    20  577.5500  103.8027        23.2110     528.9688     626.1312     353.00   793.00
2.00    20  653.0000   85.0771        19.0238     613.1827     692.8173     492.00   804.00
3.00    20  608.6500   93.1141        20.8210     565.0713     652.2287     443.00   776.00
Total   60  613.0667   97.8147        12.6278     587.7984     638.3349     353.00   804.00
• The test stems from the following rationale:
– If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result to the overall mean).
– If the alternative hypothesis is true, we would expect at least two of the sample means to differ noticeably from one another.
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least two means differ
Test statistic: F = 3.23, p-value = 0.047 < 0.05
There is sufficient evidence to reject H0 in favor of H1 and conclude that at least one of the mean sales differs from the others.
ANOVA: SALES
Source          Sum of Squares  df  Mean Square  F      Sig.
Between Groups   57512.233       2  28756.117    3.233  .047
Within Groups   506983.5        57   8894.447
Total           564495.7        59
SPSS runs the ANOVA test for us. The p-value is the entry in the Sig. column.
• The test statistic: where does it come from?
• The test stems from the following rationale:
– If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result to the grand mean).
– If all the means are equal, we would estimate the common variance by the sample variance of all n observations:
$$s^2 = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x})^2}{n-1} = \frac{SST}{n-1}$$
This measures the Total variation in the data
• The variability among the sample means is measured as the sum of squared distances between each group mean and the overall mean.
This sum is called the
Sum of Squares Between Groups
SSB
$$SSB = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$$
In our example, groups are represented by the different advertising strategies.
If all the means are equal then this number will be small. If differences exist among the means then this number will be large.
• The variation that remains within the groups, once each group is allowed its own mean, is called the
Sum of Squares Within Groups
$$SSW = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2$$
It is possible to show that:
SST = SSB + SSW
ANOVA compares SSB to SSW. If SSB is comparatively large, then there exist differences between the group means
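The decomposition SST = SSB + SSW can be checked directly on the apple juice data. The following is a minimal sketch in plain Python, assuming the 60 weekly sales figures are transcribed correctly from the data table; the function name `sums_of_squares` is just an illustrative choice.

```python
# Weekly sales transcribed from the data table (20 weeks per city).
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]


def sums_of_squares(groups):
    """Return (SST, SSB, SSW) for a list of groups of observations."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation: every point against the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssb = 0.0
    ssw = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between groups: each group mean against the grand mean.
        ssb += len(group) * (group_mean - grand_mean) ** 2
        # Within groups: each point against its own group mean.
        ssw += sum((x - group_mean) ** 2 for x in group)
    return sst, ssb, ssw


sst, ssb, ssw = sums_of_squares([convenience, quality, price])
print(f"SST = {sst:.1f}")
print(f"SSB = {ssb:.1f}")
print(f"SSW = {ssw:.1f}")
print(f"SSB + SSW = {ssb + ssw:.1f}")
```

The three values can be compared with the Sum of Squares column of the SPSS ANOVA table.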
SSB is the sum of squares between the constant (overall) mean and each group mean.
SST is the total variation, between the individual points and the constant mean.
SSW is the variation that remains when different group means are allowed.
If SSW is about the same as SST, then the group means are close to equal and SSB is small.
SSB and SSW are NOT on the same scale, so each is first divided by its degrees of freedom and the resulting mean squares are compared:
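$$MSB = \frac{SSB}{k-1}, \qquad MSW = \frac{SSW}{n-k}$$
The test statistic for ANOVA compares MSB to MSW using
$$F = \frac{MSB}{MSW}$$
Under the null hypothesis this statistic has an F distribution with (k-1) and (n-k) degrees of freedom.
If MSB is large relative to MSW, then the F statistic is large, which is evidence that differences exist in the group means.
The degrees of freedom decompose in the same way as the sums of squares:
$$(n-1) = (k-1) + (n-k), \qquad SST = SSB + SSW$$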
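As a quick check of the scaling step, the mean squares and the F statistic can be computed from the sums of squares reported by SPSS for the advertising example. This is only a sketch; `anova_f` is a hypothetical helper name.

```python
def anova_f(ssb, ssw, k, n):
    """Return (MSB, MSW, F) for k groups and n observations in total."""
    msb = ssb / (k - 1)   # between-groups mean square, k - 1 degrees of freedom
    msw = ssw / (n - k)   # within-groups mean square, n - k degrees of freedom
    return msb, msw, msb / msw


# Sums of squares taken from the SPSS ANOVA table (3 strategies, 60 weeks).
msb, msw, f_stat = anova_f(ssb=57512.233, ssw=506983.5, k=3, n=60)
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_stat:.2f}")
```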
We can summarise this information in an ANOVA table
Source          SS   df   MS   F        p-val
Between groups  SSB  k-1  MSB  MSB/MSW
Within groups   SSW  n-k  MSW
Total           SST  n-1
If the SSB is ‘large’ then the model with differing group means is a significant improvement over the constant mean model, as SSW must be ‘small’.
The p-value in this case is $\Pr(F_{k-1,\,n-k} > F)$, where F is the observed value of the test statistic.
We can use SPSS to calculate this for us
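For readers without SPSS, the tail probability can be sketched by hand in one special case. When there are exactly three groups the numerator degrees of freedom equal 2, and the upper tail of the F distribution then has the closed form $(1 + 2F/(n-k))^{-(n-k)/2}$. The helper below (a hypothetical name) relies on that special case and is NOT valid for other numbers of groups, where a statistics library would be needed.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom.

    Assumption: the numerator degrees of freedom are exactly 2 (k = 3 groups).
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Advertising example: F = 3.233 with (2, 57) degrees of freedom.
p_value = f_upper_tail_df1_2(3.233, 57)
print(f"p-value = {p_value:.3f}")
```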
Assumptions for ANOVA
The statistic will have an F-distribution only if the data in each group are normally distributed AND the variation in each group is roughly the same. When using this test, if the data are highly non-normal OR have vastly different variation in each group, then the test is not valid.
Usually it is enough if the histograms are roughly mound shaped.
This means the histogram is roughly symmetric with highest density in the middle and lowest density in the tails.
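And finally the hypothesis test:
H0: $\mu_1 = \mu_2 = \dots = \mu_k$
H1: At least two means differ
Test statistic: $F = MST / MSE$ (MST and MSE are the mean squares written above as MSB and MSW)
Rejection region: $F > F_{\alpha,\,k-1,\,n-k}$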
Specifically in our advertisement problem
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least two means differ
Test statistic: $F = MST / MSE = 28{,}756.12 / 8{,}894.17 = 3.23$
As p = 0.047 < 0.05, there is sufficient evidence to reject H0 in favor of H1 and conclude that at least one of the mean sales differs from the others.
Checking the required conditions
• The F test requires that the populations are normally distributed with equal variance.
• From SPSS we compare the sample variances: 10774, 7238, 8670. It seems the variances are roughly equal (we can test for this too if we like)
• To check the normality observe the histogram of each sample.
[Figure: histograms of weekly sales for the Convenience, Quality and Price samples (bins 450, 550, 650, 750, 850, More; frequency axis 0 to 10)]
All the distributions seem to be close to normal OR at least possibly normal.
A test for equal variances
The hypotheses to be tested here are
$$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$$
$$H_A: \text{At least 1 group variance is different}$$
Levene came up with a test for this that is less sensitive to departures from normality than classical alternatives such as Bartlett's test. SPSS does this test for us.
Test of Homogeneity of Variances: SALES
Levene Statistic  df1  df2  Sig.
.344              2    57   .710
The large p-value (0.71) means that the sample variances are sufficiently ‘close’ to each other. We cannot conclude that they are different.
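To see what sits behind the Levene statistic, here is a minimal sketch of the mean-based version: each observation is replaced by its absolute deviation from its group mean, and a one-way ANOVA F is computed on those deviations. It is an assumption here that SPSS uses this mean-based form, and `levene_statistic` is an illustrative name.

```python
def levene_statistic(groups):
    """Mean-based Levene statistic: one-way ANOVA F on |x - group mean|."""
    # Step 1: absolute deviations from each group's own mean.
    deviations = []
    for group in groups:
        group_mean = sum(group) / len(group)
        deviations.append([abs(x - group_mean) for x in group])
    # Step 2: ordinary one-way ANOVA F statistic on the deviations.
    k = len(deviations)
    n = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n
    ssb = 0.0
    ssw = 0.0
    for d in deviations:
        d_mean = sum(d) / len(d)
        ssb += len(d) * (d_mean - grand_mean) ** 2
        ssw += sum((z - d_mean) ** 2 for z in d)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Weekly sales transcribed from the data table (20 weeks per city).
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

w = levene_statistic([convenience, quality, price])
print(f"Levene statistic = {w:.3f} on (2, 57) degrees of freedom")
```

The printed value can be compared with the Levene Statistic entry in the SPSS table. A small statistic means the typical spread around the group mean is similar in every group.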
So we have found differences in the groups, but where do they lie?
[Figure: boxplots of SALES by ADTYPE (1 = convenience, 2 = quality, 3 = price), N = 20 per group]
It is hard to see where the differences lie here, but we want to know which is the ‘best’ or ‘worst’ advertising strategy.
Solution: We can calculate all three 95% CIs for the differences in mean sales.
BUT: We need to account for the number of CIs we calculate. Why?
A 95% CI contains the true value only 95% of the time; it is wrong 5% of the time. So for every 20 intervals we calculate, we would expect to make the WRONG decision once on average.
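A rough feel for how quickly the overall error rate grows comes from assuming the intervals are independent, which is a simplification (pairwise intervals that share a group mean are not independent). Under that assumption the chance that at least one of m intervals misses its true value is $1 - 0.95^m$. The function name below is illustrative.

```python
def chance_of_at_least_one_miss(num_intervals, confidence=0.95):
    """Probability that at least one interval misses, ASSUMING independence."""
    return 1.0 - confidence ** num_intervals


for m in (1, 3, 20):
    chance = chance_of_at_least_one_miss(m)
    print(f"{m:2d} intervals: chance of at least one miss = {chance:.3f}")
```

With three intervals the chance of at least one wrong conclusion is already about 14%, well above the 5% attached to each single interval.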
Comparison of multiple means
Question : Before looking at the data, the manager decides to test whether emphasizing convenience is different to emphasizing quality in terms of sales.
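The point estimate of the difference in sales $\mu_2 - \mu_1$ is $\bar{x}_2 - \bar{x}_1$.
As this is just a difference between two means, we can put a 95% confidence interval around it as usual:
$$\bar{x}_2 - \bar{x}_1 \pm 2\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$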
This confidence interval relies on two things:
1. The 2 groups of sales are independent of each other.
2. The decision to make this comparison was taken before we looked at the data. WHY?
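The planned interval for quality minus convenience can be sketched numerically. The assumptions here are that $s^2$ is the pooled within-groups variance (the Within Groups mean square from the ANOVA table) and that the multiplier 2 stands in for the exact t critical value; `planned_ci` is a hypothetical helper name.

```python
import math


def planned_ci(mean_a, mean_b, pooled_var, n_a, n_b, multiplier=2.0):
    """Approximate 95% CI for mean_a - mean_b using a pooled variance."""
    diff = mean_a - mean_b
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return diff - multiplier * std_error, diff + multiplier * std_error, std_error


# Group means from the Descriptives table, pooled variance from the ANOVA table.
low, high, std_error = planned_ci(653.0, 577.55, 8894.447, 20, 20)
print(f"standard error = {std_error:.4f}")
print(f"approximate 95% CI for quality - convenience: ({low:.1f}, {high:.1f})")
```

The interval lies entirely above 0, so a manager who planned this single comparison in advance would conclude that emphasizing quality gives higher mean sales than emphasizing convenience.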
Consider the case where we have 50 groups and wish to compare among the means. We might compare among 50 school classes in terms of HSC marks. We can do 3 things
1. A planned comparison
2. An unplanned comparison
3. Data snoop
Planned comparisons
A planned comparison is when we wish to compare two means because they answer a research question considered before the data were inspected.
For example, in the 50 group study, Fred Smith may wish to compare class 36 (his class) to his friend Bob Jones's class (class 48).
Here we do the usual confidence interval. For $\alpha = 0.05$ we have
$$\bar{Y}_{48} - \bar{Y}_{36} \pm 2\sqrt{s^2\left(\frac{1}{n_{48}} + \frac{1}{n_{36}}\right)}$$
This interval has a 95% chance of containing the true mean difference $\mu_{48} - \mu_{36}$.
Unplanned comparisons (data mining)
The researcher for the 50 class study finds 95% confidence intervals for every possible difference between two means (that’s 50 × 49 = 2450 intervals if A - B and B - A are counted separately, or 1225 distinct pairs) and reports only the ones found to be significantly different to each other. The claim is made that significant differences have been found between the classes and the intervals are used to illustrate where the differences are.
Any problem with this?
At the 5% level, we expect he will find 0.05 × 2450 = 122.5 intervals NOT containing 0, even if all the means really are equal!
We must take account of this in some way.
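The counting behind these figures is easy to reproduce. The sketch below counts the comparisons among 50 groups both ways (ordered differences, as in the slide, and distinct pairs) and the number of intervals expected to exclude 0 by chance alone when every interval has a 5% error rate.

```python
import math

groups = 50
alpha = 0.05

distinct_pairs = math.comb(groups, 2)         # each pair of classes counted once
ordered_differences = groups * (groups - 1)   # A - B and B - A counted separately

# Expected number of intervals excluding 0 when all means are truly equal.
expected_false_ordered = alpha * ordered_differences
expected_false_distinct = alpha * distinct_pairs

print(f"distinct pairs: {distinct_pairs}, ordered differences: {ordered_differences}")
print(f"expected false findings: {expected_false_ordered:.1f} ordered, "
      f"{expected_false_distinct:.2f} distinct")
```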
Data Snooping
The researcher for the 50 class study picks the highest and lowest group means, finds a 95% confidence interval for the difference between these two means, and reports only this interval.
Any problem with this?
We must account for the fact that there are 2450 possible comparisons and he has picked the maximum and minimum group means, which is the pair most likely to give an interval excluding 0. Note that again we would expect 5% (that’s 122.5) of the 95% intervals not to contain 0, even if ALL the 50 class means were exactly equal to each other.
Again we must take account of this.
Comparing multiple means
Unless we do a planned comparison, we must change the level of each confidence interval we calculate for differences between 2 means, to allow for data mining or data snooping.
We want to calculate multiple confidence intervals so that there is an overall 5% chance of finding a difference (when that difference is really 0) for each sample of DATA we analyse, NOT each CI we calculate.
There are many methods to do this. All involve widening confidence intervals in some way. One of the best and most accurate for pairwise comparisons was developed by John Tukey.
Tukey’s simultaneous mean confidence intervals give an overall 95% chance of containing all 3 true mean difference values.

Multiple Comparisons
Dependent Variable: SALES
Tukey HSD
                        Mean Difference                       95% Confidence Interval
(I) ADTYPE  (J) ADTYPE  (I-J)            Std. Error  Sig.  Lower Bound  Upper Bound
1.00        2.00        -75.4500*        29.8236     .037  -147.2181     -3.6819
1.00        3.00        -31.1000         29.8236     .553  -102.8681     40.6681
2.00        1.00         75.4500*        29.8236     .037     3.6819    147.2181
2.00        3.00         44.3500         29.8236     .305   -27.4181    116.1181
3.00        1.00         31.1000         29.8236     .553   -40.6681    102.8681
3.00        2.00        -44.3500         29.8236     .305  -116.1181     27.4181
*. The mean difference is significant at the .05 level.
We are 95% confident that strategy 2 (quality) is better than strategy 1 (convenience) in terms of mean sales of apple juice.
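The Tukey output can be read mechanically: a pair differs significantly when its interval excludes 0. The sketch below uses only numbers copied from the SPSS table (one row per distinct pair) and also shows how much wider the Tukey intervals are than an unadjusted interval of plus or minus 2 standard errors. The variable names are illustrative.

```python
# One row per distinct pair: (I, J, mean difference I - J, lower, upper).
tukey_rows = [
    (1, 2, -75.45, -147.2181, -3.6819),
    (1, 3, -31.10, -102.8681, 40.6681),
    (2, 3, 44.35, -27.4181, 116.1181),
]
labels = {1: "convenience", 2: "quality", 3: "price"}
std_error = 29.8236  # standard error of each difference, from the table

significant_pairs = []
for i, j, diff, low, high in tukey_rows:
    excludes_zero = low > 0 or high < 0
    if excludes_zero:
        significant_pairs.append((i, j))
    verdict = "differ" if excludes_zero else "no significant difference"
    print(f"{labels[i]} vs {labels[j]}: ({low:.1f}, {high:.1f}) -> {verdict}")

# Half-width of a Tukey interval in units of the standard error.
half_width = (tukey_rows[0][4] - tukey_rows[0][3]) / 2
tukey_multiplier = half_width / std_error
print(f"Tukey half-width = {tukey_multiplier:.2f} standard errors (unadjusted: about 2)")
```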
What if we have more than one factor OR groups that are not independent?
[Figure: one-way ANOVA layout (a response measured under Treatments 1, 2, 3) beside a two-way ANOVA layout (a response measured for each combination of Factor A levels and Factor B levels)]
[Figure: randomized block layout, Treatments 1 to 4 observed within Blocks 1, 2, 3]
Block all the observations with some commonality across treatments
This is similar to matched pairs for 2 samples.
• Example
– A radio station manager wants to know if the amount of time his listeners spend listening to the radio is about the same for every day of the week.
– 200 teenagers were asked to record how long they spend listening to a radio each day of the week.
• Solution
– The problem objective is to compare seven populations (1 for each day of the week).
– The data are quantitative.
– Each day of the week can be considered a group.
– The 7 data points for each person are related (together they are called a block), because they belong to the same person.
– This procedure eliminates the variability in the “radio-times” among teenagers, and helps detect differences in the mean times teenagers listen to the radio among the days of the week only.
H0: $\mu_1 = \mu_2 = \dots = \mu_7$
H1: At least two means differ
ANOVA
Source of Variation  SS        df          MS        F         P-value   F crit
Rows                 209834.6   199 (b-1)  1054.445   2.627722  1.04E-23  1.187531
Columns               28673.73    6 (k-1)  4778.955  11.90936   5.14E-13  2.106162
Error                479125.1  1194         401.2773
Total                717633.5  1399
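Conclusion: At the 5% significance level there is sufficient evidence to reject the null hypothesis and infer that mean “radio time” differs for at least one day of the week.
In the table, Rows are the Blocks (teenagers) and Columns are the Groups (days), and the test statistic for the groups is
$$F_{Groups} = \frac{MSG}{MSE}$$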
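The entries of the randomized block table follow from the sums of squares alone. The sketch below recomputes the degrees of freedom, mean squares and F ratios from the SS column, assuming k = 7 days and b = 200 teenagers as in the example; `randomized_block_f` is a hypothetical helper name.

```python
def randomized_block_f(ss_groups, ss_blocks, ss_error, k, b):
    """Return (F for groups, F for blocks, error df) in a randomized block design."""
    df_groups = k - 1
    df_blocks = b - 1
    df_error = (k - 1) * (b - 1)
    mse = ss_error / df_error            # error mean square
    f_groups = (ss_groups / df_groups) / mse
    f_blocks = (ss_blocks / df_blocks) / mse
    return f_groups, f_blocks, df_error


# Sums of squares copied from the ANOVA table (Columns = days, Rows = teenagers).
f_groups, f_blocks, df_error = randomized_block_f(
    ss_groups=28673.73, ss_blocks=209834.6, ss_error=479125.1, k=7, b=200
)
print(f"error df = {df_error}")
print(f"F for groups (days) = {f_groups:.3f}")
print(f"F for blocks (teenagers) = {f_blocks:.3f}")
```

Because the block variation (Rows) is removed from the error term, MSE is smaller than it would be in a one-way analysis, which makes differences among the days easier to detect.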
Two Factor Analysis of Variance
Suppose that two factors are to be examined:
• The effects of the marketing approach on sales.
– Emphasis on convenience
– Emphasis on quality
– Emphasis on price
• The effects of the selected media on sales.
– Advertise on TV
– Advertise in newspapers
• We can design the experiment as follows:

City 1       City 2   City 3  City 4       City 5   City 6
Convenience  Quality  Price   Convenience  Quality  Price
& TV         & TV     & TV    & paper      & paper  & paper
• This is a one-way ANOVA experimental design with six treatments.
H0: $\mu_1 = \mu_2 = \dots = \mu_6$
H1: At least two means differ
The p-value = .045. We conclude that there is evidence at the 5% level that differences exist in the mean weekly sales.
• Are these differences caused by differences in the marketing approach?
• Are these differences caused by differences in the medium used for advertising?
• Are there combinations of these two factors that interact to affect the weekly sales?
• A new experimental design is needed to answer these questions.
[Figure: 2 × 3 layout of the six cities' sales, with Factor A: Marketing strategy (Convenience, Quality, Price) across the columns and Factor B: Advertising media (TV, Newspapers) down the rows]
Are there differences in the mean sales caused by different marketing strategies?
Test whether mean sales of “Convenience”, “Quality”, and “Price” significantly differ from one another.
Are there differences in the mean sales caused by different advertising media?
Test whether mean sales of the “TV”, and “Newspapers” significantly differ from one another.
Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium?
Test whether the mean sales of certain cells differ from the level expected if the two factors acted independently (additively).
[Figure: boxplots of SALES1 by ADTYPE (1 = convenience, 2 = quality, 3 = price), split by MEDIUM (Newspaper, TV), N = 10 per cell]
Graphically, quality again seems the best strategy and newspaper ads tend to get slightly more sales than TV ads.
Always plot the data first
The interaction measures whether the effect of each advertising strategy is the same for each advertising medium (papers and TV).
[Figure: Estimated Marginal Means of SALES1 plotted against ADTYPE, one line per MEDIUM (Newspaper, TV)]
The (close to) parallel lines indicate that no interaction is occurring between medium and advertising strategy.
If the differences between the lines were changing or the lines crossed over, this would be some evidence of interaction.
Medium     Convenience  Quality  Price
TV         491          677      575
TV         712          627      614
TV         558          590      706
TV         447          632      484
TV         479          683      478
TV         624          760      650
TV         546          690      583
TV         444          548      536
TV         582          579      579
TV         672          644      795
Newspaper  464          689      803
Newspaper  559          650      584
Newspaper  759          704      525
Newspaper  557          652      498
Newspaper  528          576      812
Newspaper  670          836      565
Newspaper  534          628      708
Newspaper  657          798      546
Newspaper  557          497      616
Newspaper  474          841      587
The two - way ANOVA in SPSS
Tests of Between-Subjects Effects
Dependent Variable: SALES1
Source           Type III Sum of Squares  df  Mean Square   F         Sig.
Corrected Model    113620.283 (a)          5     22724.057     2.449  .045
Intercept        22643098.0                1  22643098.02   2439.908  .000
MEDIUM              13172.017              1     13172.017     1.419  .239
ADTYPE              98838.633              2     49419.317     5.325  .008
MEDIUM * ADTYPE      1609.633              2       804.817      .087  .917
Error              501136.700             54      9280.309
Total            23257855.0               60
Corrected Total    614756.983             59
a. R Squared = .185 (Adjusted R Squared = .109)
Clearly, at the 5% level, the advertising strategy is affecting mean sales (p = 0.008), but the advertising medium is NOT significantly affecting mean sales (p = 0.239). Also there is no significant interaction effect on mean sales (p = 0.917).
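For a balanced design like this one (10 weeks in every cell), the two-way sums of squares can be rebuilt from the marginal and cell means. The following is a sketch in plain Python, assuming the sales figures are transcribed correctly from the data table; the dictionary layout and variable names are illustrative choices, not SPSS's method.

```python
# Weekly sales by (medium, strategy), 10 weeks per cell, from the data table.
sales = {
    ("TV", "convenience"): [491, 712, 558, 447, 479, 624, 546, 444, 582, 672],
    ("TV", "quality"): [677, 627, 590, 632, 683, 760, 690, 548, 579, 644],
    ("TV", "price"): [575, 614, 706, 484, 478, 650, 583, 536, 579, 795],
    ("Newspaper", "convenience"): [464, 559, 759, 557, 528, 670, 534, 657, 557, 474],
    ("Newspaper", "quality"): [689, 650, 704, 652, 576, 836, 628, 798, 497, 841],
    ("Newspaper", "price"): [803, 584, 525, 498, 812, 565, 708, 546, 616, 587],
}
media = ["TV", "Newspaper"]
strategies = ["convenience", "quality", "price"]


def mean(values):
    return sum(values) / len(values)


all_values = [x for cell in sales.values() for x in cell]
grand = mean(all_values)
n_cell = 10

# Marginal means for each level of each factor.
medium_mean = {m: mean([x for s in strategies for x in sales[(m, s)]]) for m in media}
strategy_mean = {s: mean([x for m in media for x in sales[(m, s)]]) for s in strategies}

# Main effects: marginal means against the grand mean.
ss_medium = sum(len(strategies) * n_cell * (medium_mean[m] - grand) ** 2 for m in media)
ss_strategy = sum(len(media) * n_cell * (strategy_mean[s] - grand) ** 2 for s in strategies)
# Interaction: what the cell means explain beyond the two main effects.
ss_cells = sum(n_cell * (mean(cell) - grand) ** 2 for cell in sales.values())
ss_interaction = ss_cells - ss_medium - ss_strategy
# Error: variation of each observation around its own cell mean.
ss_error = sum((x - mean(cell)) ** 2 for cell in sales.values() for x in cell)
mse = ss_error / (len(all_values) - len(sales))

f_medium = (ss_medium / (len(media) - 1)) / mse
f_strategy = (ss_strategy / (len(strategies) - 1)) / mse
f_interaction = (ss_interaction / ((len(media) - 1) * (len(strategies) - 1))) / mse

print(f"MEDIUM: SS = {ss_medium:.3f}, F = {f_medium:.3f}")
print(f"ADTYPE: SS = {ss_strategy:.3f}, F = {f_strategy:.3f}")
print(f"MEDIUM * ADTYPE: SS = {ss_interaction:.3f}, F = {f_interaction:.3f}")
```

The output can be set beside the MEDIUM, ADTYPE and MEDIUM * ADTYPE rows of the SPSS table. The shortcut of subtracting main effects from the cell sum of squares works only because every cell has the same number of observations.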
Where do the differences lie?

Multiple Comparisons
Dependent Variable: SALES1
Tukey HSD
                        Mean Difference                       95% Confidence Interval
(I) ADTYPE  (J) ADTYPE  (I-J)            Std. Error  Sig.  Lower Bound  Upper Bound
1.00        2.00        -99.3500*        30.4636     .005  -172.7671    -25.9329
1.00        3.00        -46.5000         30.4636     .287  -119.9171     26.9171
2.00        1.00         99.3500*        30.4636     .005    25.9329    172.7671
2.00        3.00         52.8500         30.4636     .202   -20.5671    126.2671
3.00        1.00         46.5000         30.4636     .287   -26.9171    119.9171
3.00        2.00        -52.8500         30.4636     .202  -126.2671     20.5671
Based on observed means.
*. The mean difference is significant at the .05 level.
Again, we are 95% confident that strategy 2 (quality) is better than strategy 1 (convenience) in terms of mean sales of apple juice.
There appears to be no significant difference in mean sales between emphasizing quality and price, nor between price and convenience.
Are the assumptions satisfied?

Levene's Test of Equality of Error Variances (a)
Dependent Variable: SALES1
F     df1  df2  Sig.
.731  5    54   .603
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+MEDIUM+ADTYPE+MEDIUM * ADTYPE

The sample variances are NOT significantly different to each other, as we have a high p-value (0.603 > 0.05) here.
[Figure: boxplots of SALES1 by ADTYPE (N = 20 per group) and by MEDIUM (Newspaper, TV; N = 30 per group)]
The observations in each group appear to be fairly symmetric and close to normal