One Factor ANOVA - University of Washington (courses.washington.edu/...factor_ANOVA_tutorial.pdf)
One Factor ANOVA
March 5, 2019
Contents
• One-factor ANOVA
• Example 1: Exam scores
• The Numerator of the ANOVA - variability between the means
• The Denominator of the ANOVA - variability within each group
• The F-statistic - the ratio of variability between and within
• Finding the critical value of F with table E
• The Summary Table
• Example 2: Preferred temperature for weather sensitivity
• Effect Size
• Using R to conduct a one factor ANOVA
• Questions
• Answers
Happy birthday to Merideth Kirry!
One-factor ANOVA
The ANOVA ("ANalysis Of VAriance") is a hypothesis test for the difference between two or more means. This tutorial will describe how to compute the simplest ANOVA - the '1-way' or '1-factor' ANOVA for independent measures.
Here’s how to get to the 1-factor ANOVA with the flow chart:
[Flow chart for choosing a statistical test (z-test Ch 13.1, t-tests Ch 13.14/15.6/16.4, correlation tests Ch 17.2/17.4, chi-squared tests Ch 19.5/19.9, ANOVAs Ch 20/21). Following the path for means, more than 2 means, and 1 factor leads to the 1-factor ANOVA, Ch 20.]
The null hypothesis for the ANOVA is that all samples are drawn from populations with equal means and equal variances. The alternative hypothesis is that at least one population mean differs from the others; the variances are still assumed to be the same.
The logic behind the ANOVA is to compare two estimates of the population variance. One is related to the variability between the means and the other is related to the variability of scores within each group. If the null hypothesis is true, these two estimates should be on average the same, so their ratio, called the F-ratio, should be around one. If the population means differ, then the F-statistic will become larger than one on average. If the measured value of F exceeds a critical value determined by α, then we reject the null hypothesis that the population means are the same.
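This logic can be illustrated with a short simulation (a sketch added here, not part of the original tutorial): draw three groups of 9 scores from the same population many times and look at the resulting F-statistics.

```r
# Simulate the null hypothesis: 3 groups of 9 scores, all drawn from the
# SAME normal population (the group sizes match Example 1 below).
set.seed(1)
F_vals <- replicate(2000, {
  group  <- gl(3, 9)                  # factor with 3 levels, 9 scores each
  scores <- rnorm(27)                 # all 27 scores from one population
  summary(aov(scores ~ group))[[1]]$`F value`[1]
})
mean(F_vals)   # close to 1, as expected when the null hypothesis is true
```

Occasionally a simulated F comes out large just by chance, which is exactly why we need a critical value to decide when F is large enough to reject the null hypothesis.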
We’ll show how it works with an example.
Example 1: Exam scores
Suppose a professor teaches the same small graduate course over three years (2017, 2018 and 2019). You want to know if the course grades have varied significantly over these 3 years. We'll run an ANOVA using an alpha value of α = 0.05. Here are the scores:
Also, the mean of all scores, called the ’grand mean’, is 80.0704.
We'll use the numbers from this table to visualize the data using a bar graph with error bars representing standard errors of the mean, just like we did with the independent-measures t-test:
[Bar graph: mean exam score (with SEM error bars) for 2017, 2018 and 2019.]
We can use the same rule of thumb as we did for the independent measures t-test to estimate the significance of the differences between pairs of means. Remember, if the error bars overlap, then we will almost always fail to reject the null hypothesis for a one-tailed test with α = .05. Since this is the most liberal of all tests, we will also fail to reject for any other choice of tails or alphas. But remember, we can't make separate t-tests on each pair of means - the probability of at least one type I error will be greater than α.
Looking at the graph, you might guess that there is a statistically significant difference between the means.
The Numerator of the ANOVA - variability between the means
The numerator of the ANOVA's F-statistic reflects how different the means are from each other. Under the null hypothesis, this number, called the 'variance between', or MSbet, is an estimate of the population variance.
We first calculate the sum of squared deviations of each group mean from the grand mean, scaled by the sample size. This is called SSbet:

SSbet = Σj nj(X̄j − X̄G)² = 344.2518

where X̄G is the grand mean (80.0704).
The degrees of freedom for SSbet is the number of groups minus one: 3 - 1 = 2.
We can now calculate MSbet which is SSbet divided by its degrees of freedom:
MSbet = SSbet/dfbet = 344.2518/2 = 172.1259
The Denominator of the ANOVA - variability within each group
Another estimate of the population variance is the variance of the scores within each group. This number, called the 'variance within' or MSw, is calculated by first adding up the sum of the squared deviations of each score from its own group mean:
SSw = Σj Σi (Xi − X̄j)²
The table above already gives us the sum of squared deviations for each group, so the total sum of squares within is:
SSw = 108.8156 + 417.0001 + 236.9624 = 762.7781
The degrees of freedom for SS within is the number of scores minus the number of groups: 9 + 9 + 9 - 3 = 24. The variance within is SS within divided by its degrees of freedom:
MSw = SSw/dfw = 762.7781/24 = 31.7824
The F-statistic - the ratio of variability between and within.
Finally, we’ll calculate the observed value of F which is the ratio of MSbet and MSw:
F = MSbet/MSw = 172.1259/31.7824 = 5.4158
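These calculations are easy to check in R; this snippet just redoes the arithmetic above using the SS values from the tutorial's tables:

```r
# Recompute Example 1's F-statistic from the sums of squares given above
ss_bet <- 344.2518                        # between-groups SS
ss_w   <- 108.8156 + 417.0001 + 236.9624  # within-groups SS (sum over years)
df_bet <- 3 - 1                           # number of groups minus 1
df_w   <- (9 + 9 + 9) - 3                 # total n minus number of groups
ms_bet <- ss_bet / df_bet                 # 172.1259
ms_w   <- ss_w / df_w                     # 31.7824
F_obs  <- ms_bet / ms_w                   # 5.4158
```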
Finding the critical value of F with table E
The distribution of the F-statistic is a family of curves that varies both with dfbet and dfw. Table E provides critical values of F for α = .05 and α = .01 (bold). The rows of the table correspond to dfw and the columns are for dfbet. For our example, dfbet is 2 and dfw is 24. Here's the relevant part of the table:
dfw|dfb    1     2     3
 23      4.28  3.42  3.03
         7.88  5.66  4.76
 24      4.26  3.4   3.01
         7.82  5.61  4.72
 25      4.24  3.39  2.99
         7.77  5.57  4.68
Looking at the row for dfw = 24 and the column for dfbet = 2, we see that the critical value for F with α = 0.05 is 3.4 (the bold value beneath it, 5.61, is for α = 0.01).
Remember, both the numerator and the denominator for the ANOVA are measures of the population variance under the null hypothesis. If the null hypothesis is false, then the means will have greater variability, so the numerator will grow large compared to the denominator. Therefore, a large value of F is evidence against the null hypothesis. Specifically, if our observed value of F is greater than the critical value, then we reject the null hypothesis and conclude that there is a significant difference between the means.
For our example, since our observed value of F (5.4158) is greater than the critical value of F (3.4), we reject the null hypothesis.
Using APA format, we state: "There is a significant difference between the means of the 3 exams, F(2,24) = 5.4158, p < 0.05."
We can also use the F-calculator in the Excel spreadsheet to find the actual p-value:
Convert α to F:
dfb  dfw  α     F
2    24   0.05  3.4

Convert F to α:
dfb  dfw  F       α
2    24   5.4158  0.0115
Using this p-value we can conclude using APA format: "There is a significant difference between the means of the 3 exams, F(2,24) = 5.4158, p = 0.0115."
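If you're using R rather than the Excel spreadsheet, the built-in F-distribution functions qf and pf give the same critical value and p-value:

```r
# Critical value of F for alpha = 0.05 with df_bet = 2 and df_w = 24
qf(0.05, df1 = 2, df2 = 24, lower.tail = FALSE)    # about 3.40

# p-value for the observed F of 5.4158
pf(5.4158, df1 = 2, df2 = 24, lower.tail = FALSE)  # about 0.0115
```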
The Summary Table
The statistics used to generate the F statistic for an ANOVA are traditionally reported in a summary table like this:
         SS         df  MS        F       Fcrit  p-value
Between  344.2518   2   172.1259  5.4158  3.4    0.0115
Within   762.7781   24  31.7824
Total    1107.1563  26
In this next example we’ll use this table to keep track of our values.
Example 2: Preferred temperature for weather sensitivity
At the beginning of the quarter I surveyed you for your preferred outdoor temperature. I also asked you how much weather affected your mood, with the options Not at all, Just a little, A fair amount and Very much. Let's see if there is a significant difference between the preferred temperatures across these 4 options. We'll use α = 0.05 again. Here's a table of statistics:
       Not at all  Just a little  A fair amount  Very much
n      8           26             20             36
mean   68.38       70.96          71.5           73.82
SS     673.8752    1510.9616      1027           1813.1504
s      9.8116      7.7742         7.352          7.1975
sem    3.4689      1.5246         1.644          1.1996

Totals:
n           90
grand mean  71.9933
SStotal     5281.956
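The s and sem rows follow directly from the n and SS rows; here's a quick R check using the values in the table:

```r
# Standard deviations and SEMs from the group sizes and sums of squares
n  <- c(8, 26, 20, 36)
SS <- c(673.8752, 1510.9616, 1027, 1813.1504)
s   <- sqrt(SS / (n - 1))  # sample standard deviation for each group
sem <- s / sqrt(n)         # standard error of the mean for each group
round(sem, 4)              # 3.4689 1.5246 1.6440 1.1996, matching the table
```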
Here’s a graph of the means with error bars representing the standard error of the means:
[Bar graph: mean preferred temperature (F) with SEM error bars for each answer to "How much does weather affect your mood?"]
We can easily fill in some of the entries in the summary table. I've told you that SStotal = 5281.956. The df for SStotal is the total sample size minus 1: 90 - 1 = 89.
SSw is just the sum of SSw for each cell: 673.8752 + 1510.9616 + 1027 + 1813.1504 = 5024.9872. The df for SSw is the total sample size minus the number of groups: 90 - 4 = 86.
         SS         df  MS  F  Fcrit  p-value
Between
Within   5024.9872  86
Total    5281.956   89
SSbet can be hard to calculate, but since we know that SStotal = SSw + SSbet, we know that SSbet = 5281.956 - 5024.9872 = 256.9688, which matches the value of 257.2008 used below up to rounding error in SSw and SStotal. The degrees of freedom is 4 - 1 = 3.
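You can also compute SSbet directly from the group sizes and means in the table above; a quick R check (using the rounded means from the table) reproduces the 257.2008 used in the rest of this example:

```r
# SS-between computed directly: sum of n * (group mean - grand mean)^2
n      <- c(8, 26, 20, 36)
means  <- c(68.38, 70.96, 71.5, 73.82)
grand  <- 71.9933
ss_bet <- sum(n * (means - grand)^2)   # 257.2008
```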
Just like for the last example, the variance within and between are just the SS divided by their df's:
MSbet = SSbet/dfbet = 257.2008/3 = 85.7336
MSw = SSw/dfw = 5024.9872/86 = 58.4301
Finally, the F statistic is their ratio:
F = MSbet/MSw = 85.7336/58.4301 = 1.4673
The table now looks like this:
         SS         df  MS       F       Fcrit  p-value
Between  257.2008   3   85.7336  1.4673
Within   5024.9872  86  58.4301
Total    5281.956   89
The critical value of F for α = 0.05 and dfbet = 3 and dfw = 86 can be found in table E.
Since dfw = 86 is not in the table, we'll use 80:

dfw|dfb    3
 80      2.72
         4.04
If you use the F calculator on the Excel spreadsheet you'll see that the p-value is 0.2291:
Convert α to F:
dfb  dfw  α     F
3    86   0.05  2.71

Convert F to α:
dfb  dfw  F       α
3    86   1.4673  0.2291
The final summary table is:
         SS         df  MS       F       Fcrit  p-value
Between  257.2008   3   85.7336  1.4673  2.72   0.2291
Within   5024.9872  86  58.4301
Total    5281.956   89
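As in Example 1, the summary-table values are easy to verify in R, including the exact p-value via pf:

```r
# Recompute Example 2's F-statistic and p-value from the summary table
ms_bet <- 257.2008 / 3          # 85.7336
ms_w   <- 5024.9872 / 86        # 58.4301
F_obs  <- ms_bet / ms_w         # 1.4673
pf(F_obs, df1 = 3, df2 = 86, lower.tail = FALSE)  # about 0.2291
```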
Since the observed value of F (1.4673) is not greater than the critical value of F (2.72), we fail to reject the null hypothesis.
Using APA format, we state: "There is not a significant difference in preferred outdoor temperature across the 4 levels of the survey about how weather affects mood, F(3,86) = 1.4673, p = 0.2291."
Note that the ANOVA doesn't tell us anything about exactly how preferred temperature varies with how weather affects mood. Looking at the bar graph, it appears that there is an upward trend so that the more weather affects mood, the higher the preferred temperature. But to make statistical conclusions like this requires a post-hoc analysis that allows you to compare specific means to other means.
9
Effect Size
There are several measures of effect size for an ANOVA, and most software packages will spit out more than one. Probably the most common is η2, or 'eta squared', which is the proportion of total variance in your data that is attributed to the effect driving the difference between the means. It's simply the ratio of SSbet to SStotal:
η2 = SSbet/SStotal
From our last example:
η2 = 257.2008/5281.956 = 0.0487
If SSbet = 0, then η2 = 0. This is as small as η2 can get. It happens when there are no differences between the means, so all of the variability in your data is attributed to variability within each group.
If SSw = 0, then SSbet = SStotal, which makes η2 = 1. This is as big as η2 can get. It happens when the means are different, but there is no variability within each group.
η2 is simple and commonly used, but tends to overestimate effect size for larger numbers of treatments. Just so you know, η2 is sometimes called 'partial eta-squared'; for a one-factor design the two measures are identical.
A rule of thumb for interpreting the size of η2 is:
0.01  small
0.06  medium
0.14  large
Note that these values of small, medium and large are very different from those used to interpret effect size for t-tests (Cohen's d).
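For the record, here's the η2 computation for Example 2 in R:

```r
# Eta-squared: proportion of the total variability attributed to the group effect
ss_bet   <- 257.2008
ss_total <- 5281.956
eta_sq   <- ss_bet / ss_total   # 0.0487: between 'small' (0.01) and 'medium' (0.06)
```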
Using R to conduct a one factor ANOVA
Computing a one factor ANOVA is easy using the 'aov' function. The tricky part is pulling out the values from the output, and using ggplot to plot bar graphs and error bars. The following R script conducts the test in Example 2 on preferred temperature and how much weather affects mood.
The R commands shown below can be found here: OneFactorANOVA.R
# 1-factor ANOVA
#
# Conducting a 1-factor ANOVA from our survey data is easy using the 'aov' function.
# Interpreting and plotting the results will require two libraries, 'ggplot2' for
# plotting and 'broom' for cleaning up the output of 'aov'.
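Since the survey data file itself isn't reproduced in this tutorial, here is a self-contained sketch using only base R (the temperatures are made up, so the F and p values will differ from the ones above; only the group sizes match Example 2):

```r
# Sketch: 1-factor ANOVA with 'aov' on simulated data (group sizes from Example 2)
set.seed(1)
mood <- factor(rep(c("Not at all", "Just a little", "A fair amount", "Very much"),
                   times = c(8, 26, 20, 36)))
temp <- rnorm(90, mean = 72, sd = 7.6)   # made-up preferred temperatures
out  <- aov(temp ~ mood)                 # fit the 1-factor ANOVA
summary(out)                             # SS, df, MS, F and p, as in the summary table
```

The summary table produced by summary(out) has the same rows as the hand-computed table above: a 'mood' row (between groups, df = 3) and a 'Residuals' row (within groups, df = 86).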
[Bar graph produced by the script: mean preferred temperature (F) with SEM error bars for each answer to "How much does weather affect your mood?"]
Questions
Your turn again. Here are 7 practice questions based on the class survey, followed by their answers.
1) From our survey we can calculate Caffeine consumption as a function of how much you said you liked math. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 8 31 50 6
mean 2.06 0.77 1.45 1.17
SS 45.2188 58.9626 137.125 16.8334
ntotal 95
grand mean 1.2624
SStotal 272.5327
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in Caffeine consumption across the 4 groups of students who vary in their preference for math?
2) From our survey we can calculate the number of siblings as a function of how strongly students agreed that they are introverted. Here is a table of statistics based on our survey:
agree disagree neutral strongly agree
n 25 23 32 15
mean 1.36 1.39 1.63 1.87
SS 35.76 29.4783 51.5008 15.7335
ntotal 95
grand mean 1.5368
SStotal 135.6211
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in number of siblings across the 4 levels of introversion?
3) From our survey we can calculate how much you drink across voting preferences. Here is a table of statistics based on our survey:
Democrat   I never (or can't) vote   Republican
n 62 21 7
mean 2.76 1.67 4.57
SS 1303.3712 192.6669 157.7143
ntotal 90
grand mean 2.6444
SStotal 1700.6222
Calculate the standard errors of the mean for each of the 3 groups.
Make a bar graph of the means for each of the 3 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.01, is there a difference in drinks per week across the 3 groups of voting preference for students?
4) From our survey we can calculate how well you think you'll do on Exam 1 as a function of how much you think you'll like this class. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 8 31 50 6
mean 77 84.58 87.98 88.33
SS 1054 3811.5484 2222.98 433.3334
ntotal 95
grand mean 85.9684
SStotal 8460.9053
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.01, is there a difference in predicted Exam 1 score across the 4 groups of how much you think you'll like Psych 315?
5) Suppose you want to know if how much students drink varies with how much they exercise. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 6 26 46 17
mean 1.83 1.42 2.2 5.94
SS 80.8334 136.3464 339.24 928.9412
ntotal 95
grand mean 2.6316
SStotal 1722.1053
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.01, is there a difference in alcoholic drinks across the 4 groups of students who exercise?
6) From our survey we can compute how many hours per week students play video games as a function of how much they exercise. Here is a table of statistics based on our survey:
Not at all Just a little A fair amount Very much
n 6 26 46 17
mean 0.17 0.62 0.86 0.88
SS 0.8334 32.1544 47.3316 17.7648
ntotal 95
grand mean 0.7526
SStotal 101.4368
Calculate the standard errors of the mean for each of the 4 groups.
Make a bar graph of the means for each of the 4 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in hours of video game playing per week across the 4 groups of students who exercise?
7) From our survey we can see if there is a difference in the heights of students across different levels of exercise. Here is a table of statistics based on our survey:
Just a little A fair amount Very much
n 26 46 17
mean 64.54 66.35 66
SS 280.4616 592.435 186
ntotal 89
grand mean 65.7528
SStotal 1114.5618
Calculate the standard errors of the mean for each of the 3 groups.
Make a bar graph of the means for each of the 3 groups with error bars as the standard error of the means.
Using an alpha value of α = 0.05, is there a difference in height across the 3 groups of students who exercise?
Answers

There is not a significant difference in mean Caffeine consumption across the 4 groups of students who vary their preference for math, F(3,91) = 1.69, p = 0.1739.
There is not a significant difference in mean number of siblings across the 4 levels of introversion, F(3,91) = 0.74, p = 0.5324.
There is not a significant difference in mean predicted Exam 1 score across the 4 groups of how much you think you'll like Psych 315, F(3,91) = 3.79, p = 0.0131.
There is not a significant difference in mean hours of video game playing per week across the 4 groups of students who exercise, F(3,91) = 1.02, p = 0.3873.