
Chapter 10: ANOVA and Factorial Experiments

Shiwen Shen

University of South Carolina

2016 Fall Section 003


Introduction

- Recall: in Chapter 7, we discussed confidence intervals and hypothesis tests for the difference of two population means µ1 − µ2.

- More generally, the purpose of an experiment is to investigate differences between or among two or more treatments. In a statistical framework, we do this by comparing the population means of the responses to each treatment.

- For example, in the case with three populations,

H0 : µ1 = µ2 = µ3

Ha : the population means are not all equal.


Example: Mortar Mix

Mortar mixes are usually classified on the basis of compressive strength and their bonding properties and flexibility. In a building project, engineers wanted to compare the population mean strengths of four types of mortars:

1. ordinary cement mortar (OCM)

2. polymer impregnated mortar (PIM)

3. resin mortar (RM)

4. polymer cement mortar (PCM)

Random samples of specimens of each mortar type were taken. Each specimen was subjected to a compression test to measure strength (MPa).


Example: Mortar Mix

Here are the strength measurements taken on different mortar specimens (36 in all).


Example: Mortar Mix

Side-by-side boxplots of strength (MPa) for four mortar types:


Example: Mortar Mix

- An initial question that engineers may have is the following: “Are the population mean mortar strengths equal among the four types of mortars? Or, are the population means different?”

- This question can be framed statistically as the following hypothesis test:

H0 : µ1 = µ2 = µ3 = µ4

Ha : the population means are not all equal.

- Goal: We now develop a statistical inference procedure that allows us to test this type of hypothesis in a one-way classification.


F test

- Let $t$ denote the number of treatments (populations) to be compared. Define

$Y_{ij}$ = response on the $j$th individual in the $i$th treatment group

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, n_i$.

- $n_i$ is the number of observations for the $i$th treatment (population).

- When $n_1 = n_2 = \cdots = n_t = n$, we say the design is balanced; otherwise, the design is unbalanced.

- Let $N = n_1 + n_2 + \cdots + n_t$ denote the total number of individuals measured. $N = tn$ if the design is balanced.

- For simplicity, let's assume a balanced design here.


Treatment Statistics

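Define the following statistics:

- Sample mean of the $i$th sample:

\[ \bar{Y}_{i+} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij} \]

- Sample variance of the $i$th sample:

\[ S_i^2 = \frac{1}{n-1} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i+})^2 \]

- Overall sample mean:

\[ \bar{Y}_{++} = \frac{1}{N} \sum_{i=1}^{t} \sum_{j=1}^{n} Y_{ij} \]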


F Test Assumptions

The F test relies on three assumptions:

1. the t random samples are independent.

2. the t population distributions are normal.

3. the t population distributions have the same variance $\sigma^2$.
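The last two assumptions can be examined informally in R. The sketch below is not part of the original notes: it copies the mortar strength data from the R code given later in this chapter and uses `shapiro.test()` and `bartlett.test()` as one possible pair of checks. Comparing the group standard deviations or looking at residual plots are reasonable alternatives.

```r
# Mortar strength data (MPa), copied from the R code later in these notes
OCM <- c(51.45, 42.96, 41.11, 48.06, 38.27, 38.88, 42.74, 49.62)
PIM <- c(64.97, 64.21, 57.39, 52.79, 64.87, 53.27, 51.24, 55.87, 61.76, 67.15)
RM  <- c(48.95, 62.41, 52.11, 60.45, 58.07, 52.16, 61.71, 61.06, 57.63, 56.80)
PCM <- c(35.28, 38.59, 48.64, 50.99, 51.52, 52.85, 46.75, 48.31)

strength <- c(OCM, PIM, RM, PCM)
mortar.type <- factor(rep(c("OCM", "PIM", "RM", "PCM"),
                          times = c(length(OCM), length(PIM),
                                    length(RM), length(PCM))))
fit <- lm(strength ~ mortar.type)

# Assumption 3 (informal): sample standard deviations should be of similar size
group.sd <- tapply(strength, mortar.type, sd)
print(group.sd)

# Assumption 2: Shapiro-Wilk test on the residuals (H0: normality)
normality <- shapiro.test(residuals(fit))

# Assumption 3 (formal): Bartlett's test (H0: equal variances)
equal.var <- bartlett.test(strength ~ mortar.type)

print(c(shapiro = normality$p.value, bartlett = equal.var$p.value))
```

Large p-values here mean only that the data give no evidence against the assumptions. Independence (assumption 1) depends on how the specimens were sampled and cannot be checked from the numbers alone.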


Analysis of Variance

- Question: If we are trying to compare the population means, why is the statistical inference procedure called “Analysis of Variance” (ANOVA)?

- Answer: We learn about the population means by estimating the common variance $\sigma^2$ in two different ways. These two estimators are formed by

  - measuring variability of the observations within each sample.
  - measuring variability of the sample means across the samples.

Idea: These two estimates tend to be similar when H0 is true. The second estimate tends to be larger than the first estimate when Ha is true.


Within Sample Variability Statistics

Let’s define

- Residual Sum of Squares:

\[ SS_{res} = (n-1)S_1^2 + (n-1)S_2^2 + \cdots + (n-1)S_t^2 = \sum_{i=1}^{t} \sum_{j=1}^{n} (Y_{ij} - \bar{Y}_{i+})^2 \]

- Residual Mean Squares:

\[ MS_{res} = \frac{SS_{res}}{N - t} \]


Across Sample Variability Statistics

Let’s define

- Treatment Sum of Squares:

\[ SS_{trt} = \sum_{i=1}^{t} n (\bar{Y}_{i+} - \bar{Y}_{++})^2 \]

- Treatment Mean Squares:

\[ MS_{trt} = \frac{SS_{trt}}{t - 1} \]


Important Facts

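- When H0 is true (the population means are all equal), then

\[ E(MS_{trt}) = \sigma^2, \qquad E(MS_{res}) = \sigma^2 \]

- When Ha is true (the population means are not all equal), then

\[ E(MS_{trt}) > \sigma^2, \qquad E(MS_{res}) = \sigma^2 \]

- Define $F = MS_{trt} / MS_{res}$. Under H0 we have

\[ F \sim F(t-1, N-t), \]

so F tends to be close to 1 when H0 is true and tends to be larger than 1 when Ha is true. We therefore reject H0 for large values of F.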
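These facts can be illustrated with a small simulation. The following is only a sketch under made-up settings that do not come from the notes: t = 4 treatments, n = 5 observations each, σ = 2, and an arbitrary seed. It computes MStrt and MSres from the balanced-design formulas above for many simulated experiments and averages them.

```r
set.seed(509)                       # arbitrary seed, for reproducibility only
t.groups <- 4; n <- 5; sigma <- 2   # hypothetical balanced design

one.experiment <- function(mu) {
  # n x t matrix: column i holds the n responses for treatment i
  y <- matrix(rnorm(t.groups * n, mean = rep(mu, each = n), sd = sigma),
              nrow = n)
  group.means <- colMeans(y)
  SStrt <- n * sum((group.means - mean(y))^2)
  SSres <- sum(sweep(y, 2, group.means)^2)
  c(MStrt = SStrt / (t.groups - 1),
    MSres = SSres / (t.groups * n - t.groups))
}

under.H0 <- replicate(2000, one.experiment(mu = c(10, 10, 10, 10)))
under.Ha <- replicate(2000, one.experiment(mu = c(10, 10, 10, 14)))

# Average mean squares; sigma^2 = 4 in this setup
print(rowMeans(under.H0))
print(rowMeans(under.Ha))
```

Under H0 both averages should land near σ² = 4. Under this particular Ha only the MStrt average is inflated, which is why a large ratio F = MStrt / MSres counts as evidence against H0.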


Example: Mortar Mix

We use R to calculate the F test statistic for the mortar mix data, and R gives

> anova(lm(strength ~ mortar.type))

Analysis of Variance Table

Response: strength

Df Sum Sq Mean Sq F value Pr(>F)

mortar.type 3 1520.88 506.96 16.848 9.576e-07 ***

Residuals 32 962.86 30.09

From the R output, we have $SS_{trt} = 1520.88$, $t - 1 = 4 - 1 = 3$, $MS_{trt} = SS_{trt}/(t-1) = 506.96$, $SS_{res} = 962.86$, $N - t = 36 - 4 = 32$, $MS_{res} = SS_{res}/(N-t) = 30.09$, $F = MS_{trt}/MS_{res} = 16.848$, and p-value $= 9.576 \times 10^{-7}$.


Example: Mortar Mix

- The p-value is $9.576 \times 10^{-7}$, far smaller than α = 0.05. Therefore, we reject H0 at the 5% significance level and conclude that the population mean strengths are not all equal.

- R code is attached:

OCM = c(51.45,42.96,41.11,48.06,38.27,38.88,

42.74,49.62)

PIM = c(64.97,64.21,57.39,52.79,64.87,53.27,

51.24,55.87,61.76,67.15)

RM = c(48.95,62.41,52.11,60.45,58.07,52.16,

61.71,61.06,57.63,56.80)

PCM = c(35.28,38.59,48.64,50.99,51.52,

52.85,46.75,48.31)


Mortar Mix (R code, continued)

boxplot(OCM,PIM,RM,PCM,

xlab="",names=c("OCM","PIM","RM","PCM"),

ylab="Strength (MPa)",col="grey")

strength = c(OCM,PIM,RM,PCM)

mortar.type = c(

rep("OCM",length(OCM)),

rep("PIM",length(PIM)),

rep("RM",length(RM)),

rep("PCM",length(PCM))

)

mortar.type = factor(mortar.type)

anova(lm(strength ~ mortar.type))
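The entries of the ANOVA table can also be reproduced directly from the sum-of-squares definitions. The sketch below is an illustration rather than part of the original code. Because the mortar data are unbalanced (8, 10, 10 and 8 specimens), it uses the group sizes n_i in place of the common n from the balanced-design formulas. The variable names are ours.

```r
OCM <- c(51.45, 42.96, 41.11, 48.06, 38.27, 38.88, 42.74, 49.62)
PIM <- c(64.97, 64.21, 57.39, 52.79, 64.87, 53.27, 51.24, 55.87, 61.76, 67.15)
RM  <- c(48.95, 62.41, 52.11, 60.45, 58.07, 52.16, 61.71, 61.06, 57.63, 56.80)
PCM <- c(35.28, 38.59, 48.64, 50.99, 51.52, 52.85, 46.75, 48.31)

groups   <- list(OCM = OCM, PIM = PIM, RM = RM, PCM = PCM)
n.i      <- sapply(groups, length)     # group sizes n_i
ybar.i   <- sapply(groups, mean)       # group sample means
ybar     <- mean(unlist(groups))       # overall sample mean
N        <- sum(n.i)                   # 36 observations in total
t.groups <- length(groups)             # 4 treatments

# Across-sample variability: group means around the overall mean
SStrt <- sum(n.i * (ybar.i - ybar)^2)
# Within-sample variability: observations around their own group mean
SSres <- sum(sapply(groups, function(y) sum((y - mean(y))^2)))

MStrt  <- SStrt / (t.groups - 1)
MSres  <- SSres / (N - t.groups)
F.stat <- MStrt / MSres

print(round(c(SStrt = SStrt, SSres = SSres, MStrt = MStrt,
              MSres = MSres, F = F.stat), 3))
```

The printed values should agree with the `anova()` output up to rounding.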


ANOVA Table

We can use the following ANOVA table to summarize the result:

Source      df    SS       MS     F
Treatments  t-1   SStrt    MStrt  F = MStrt/MSres
Residuals   N-t   SSres    MSres
Total       N-1   SStotal

In the mortar mix data, we have

Source      df    SS       MS      F
Treatments  4-1   1520.88  506.96  16.848
Residuals   36-4  962.86   30.09
Total       36-1  2483.74

Note that SStotal = SStrt + SSres. The p-value is the upper-tail probability of the F(3, 32) distribution beyond 16.848, which can be calculated by 1 - pf(16.848, 4-1, 36-4) in R.
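As a quick check, the p-value can be recovered from the table entries alone. This sketch uses the F value and degrees of freedom reported above. `lower.tail = FALSE` requests the upper-tail area directly, and the 5% critical value from `qf()` is added only for comparison.

```r
F.stat <- 16.848     # F value from the ANOVA table
df.trt <- 4 - 1      # treatment degrees of freedom, t - 1
df.res <- 36 - 4     # residual degrees of freedom, N - t

# Upper-tail area under the F(3, 32) density beyond the observed F
p.value <- pf(F.stat, df.trt, df.res, lower.tail = FALSE)

# Same quantity written as a complement
p.value.alt <- 1 - pf(F.stat, df.trt, df.res)

# 5% critical value: reject H0 when F.stat exceeds it
crit <- qf(0.95, df.trt, df.res)

print(c(p.value = p.value, critical.value = crit))
```

Note that `pf()` returns the lower-tail probability by default, so the complement (or `lower.tail = FALSE`) is needed to get the p-value of the F test.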


ANOVA in Multiple Linear Regression

- In multiple linear regression, multiple predictors contribute to the model used to estimate or predict the value of the response variable.

- One interesting hypothesis test is

H0 : β1 = β2 = · · · = βp = 0

Ha : at least one of the βj is nonzero, j = 1, 2, · · · , p

(The intercept β0 is not part of this test.)

- We can read the F test statistic and the p-value directly in R using summary().


Example: Cheese Data

Recall that in Chapter 9, we initially used acetic, h2s, and lactic together to predict taste.

> fit <- lm(taste ~ acetic + h2s + lactic, data=cheese)

> summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -28.877 19.735 -1.463 0.15540

acetic 0.328 4.460 0.074 0.94193

h2s 3.912 1.248 3.133 0.00425 **

lactic 19.670 8.629 2.279 0.03109 *

Residual standard error: 10.13 on 26 degrees of freedom

Multiple R-squared: 0.6518, Adjusted R-squared: 0.6116

F-statistic: 16.22 on 3 and 26 DF, p-value: 3.81e-06


Example: Cheese Data

- We want to test

H0 : β1 = β2 = β3 = 0

Ha : at least one of β1, β2, and β3 is nonzero

- From the R output, we have F = 16.22 (on 3 and 26 degrees of freedom) with p-value $= 3.81 \times 10^{-6}$, which is strong evidence against H0.

- Rejecting the H0 of the F test indicates that at least one of the predictors is important in describing the response Y in the population. (However, we are not informed which predictor(s) is important.) If the H0 of the F test is not rejected, we have no evidence that any of the predictors is useful.
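The overall F statistic is tied to the other numbers printed by `summary()`. The sketch below is not part of the original notes and uses only the reported output, since the cheese data are not reproduced in this chapter. It relies on the standard identity F = (R²/p) / ((1 − R²)/(n − p − 1)) for a model with an intercept and p predictors. The sample size n = 30 is inferred from the 26 residual degrees of freedom.

```r
R2 <- 0.6518         # Multiple R-squared from summary(fit)
p  <- 3              # number of predictors: acetic, h2s, lactic
n  <- 26 + p + 1     # residual df + predictors + intercept = 30

# Overall F statistic recovered from R-squared
F.from.R2 <- (R2 / p) / ((1 - R2) / (n - p - 1))

# p-value of the overall F test from the reported F = 16.22 on 3 and 26 df
p.value <- pf(16.22, p, n - p - 1, lower.tail = FALSE)

print(c(F = F.from.R2, p.value = p.value))
```

With a fitted model object available, `summary(fit)$fstatistic` returns the F value together with its two degrees of freedom.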


Analysis of Factorial Experiments

- In engineering experiments, there are often several variables of interest and the goal is to understand the effects of these variables on a continuous response Y. A factorial treatment structure is an efficient way of defining treatments in these types of experiments.

- One example of a factorial treatment structure uses k factors, where each factor has two levels. This is called a 2^k factorial experiment.


Example: 2^4 Factorial Experiment

A nickel-titanium alloy is used to make components for jet turbine aircraft engines. Cracking is a potentially serious problem in the final part, as it can lead to nonrecoverable failure. A test is run at the parts producer to determine the effect of k = 4 factors on cracks: pouring temperature (A), titanium content (B), heat treatment method (C), and amount of grain refiner used (D).

- Factor A has 2 levels: "low" and "high" temperature

- Factor B has 2 levels: "low" and "high" content

- Factor C has 2 levels: Method 1 and Method 2

- Factor D has 2 levels: "low" and "high" amount

The response variable is

Y = length of largest crack (mm) induced in a piece of sample material


Example: 2^4 Factorial Experiment

- There are 2 × 2 × 2 × 2 = 16 different treatment combinations in total. For example, a1b1c1d1, a1b1c1d2, etc.

- One replicate of the experiment uses 16 observations, one at each of the 16 treatment combinations. Two replicates would require 32 observations, and so on.

- Our goal is to understand how these variables affect the response Y.

- Let's do the analysis together using a simple 2^2 experiment.
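The 16 treatment combinations of the 2^4 design can be listed with `expand.grid()`. This is only an illustrative sketch: the level labels follow the factor descriptions above, while the object names `design` and `two.reps` are ours.

```r
# One row per treatment combination of the 2^4 factorial structure
design <- expand.grid(
  A = c("low", "high"),            # pouring temperature
  B = c("low", "high"),            # titanium content
  C = c("Method 1", "Method 2"),   # heat treatment method
  D = c("low", "high")             # amount of grain refiner
)
print(nrow(design))                # number of treatment combinations

# Two replicates: every combination appears twice
two.reps <- design[rep(seq_len(nrow(design)), times = 2), ]
two.reps$replicate <- rep(1:2, each = nrow(design))
print(nrow(two.reps))              # observations needed for two replicates
```

In an actual experiment the run order of these rows would normally be randomized before collecting data.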


Example: A 2^2 Experiment

- Predicting corn yield prior to harvest is useful for making feed supply and marketing decisions. Corn must have an adequate amount of nitrogen (Factor A) and phosphorus (Factor B) for profitable production and also for environmental concerns.

- Two levels of nitrogen (a1 = 10 and a2 = 15) and two levels of phosphorus (b1 = 2 and b2 = 4) were used.

- We collect the data with 5 replications, and the response is

Y = yield per plot (measured in bushels)

a1b1 35, 26, 25, 33, 31

a1b2 39, 33, 41, 31, 36

a2b1 37, 27, 35, 27, 34

a2b2 49, 39, 39, 47, 46
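One way to enter these data in R is sketched below. The variable names `yield`, `nitrogen` and `phosphorus` match the model formulas used later in the notes, but the level labels and the data frame `corn` are our own choices. Both factors are coded as two-level factors, so each main effect has 1 degree of freedom.

```r
# Yields in the order a1b1, a1b2, a2b1, a2b2 (5 replicates each)
yield <- c(35, 26, 25, 33, 31,
           39, 33, 41, 31, 36,
           37, 27, 35, 27, 34,
           49, 39, 39, 47, 46)
nitrogen   <- factor(rep(c("a1", "a2"), each = 10))
phosphorus <- factor(rep(rep(c("b1", "b2"), each = 5), times = 2))
corn <- data.frame(yield, nitrogen, phosphorus)

# Sample mean yield in each of the four treatment combinations
cell.means <- tapply(corn$yield, list(corn$nitrogen, corn$phosphorus), mean)
print(cell.means)
```

The cell means give a first look at the data before any formal test: yield is higher at b2 than at b1 for both nitrogen levels.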


Naive Analysis

- One silly way to analyze these data would be to simply regard each of the combinations a1b1, a1b2, a2b1, and a2b2 as a "treatment" and perform a one-way ANOVA with t = 4 treatment groups, as we did in the Mortar Mix example.

- This would produce the following ANOVA table:

> anova(lm(yield ~ treatment))

Analysis of Variance Table

Response: yield

Df Sum Sq Mean Sq F value Pr(>F)

treatment 3 575 191.67 9.5833 0.0007362 ***

Residuals 16 320 20.00


Uninteresting Conclusion

The p-value for the F test is 0.0007362, so we can reject

H0 : µ11 = µ12 = µ21 = µ22

Therefore, we conclude that at least one of the factorial treatment population means is different.


Partition of the Variability

- Knowing only that the population mean yields differ is not very interesting. Instead, we want to know which variable (nitrogen, phosphorus, or both) contributes to the yield.

- In the previous one-way ANOVA, SStrt measures the variability across the four treatment groups. Here, by "partition", I mean that we can write

SStrt = SSA + SSB + SSAB

- SSA is the sum of squares due to the main effect of A (nitrogen)

- SSB is the sum of squares due to the main effect of B (phosphorus)

- SSAB is the sum of squares due to the interaction effect of A and B.
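For the corn data this partition can be verified by hand. The sketch below is an illustration, not the authors' code, and it assumes the balanced layout of the example: 5 observations per cell and therefore 10 per factor level. It computes each sum of squares from the level means and cell means and checks that the pieces add up to SStrt.

```r
yield <- c(35, 26, 25, 33, 31,   39, 33, 41, 31, 36,    # a1b1, a1b2
           37, 27, 35, 27, 34,   49, 39, 39, 47, 46)    # a2b1, a2b2
nitrogen   <- factor(rep(c("a1", "a2"), each = 10))
phosphorus <- factor(rep(rep(c("b1", "b2"), each = 5), times = 2))

grand   <- mean(yield)
a.means <- as.vector(tapply(yield, nitrogen, mean))     # 10 obs per level
b.means <- as.vector(tapply(yield, phosphorus, mean))   # 10 obs per level
cell    <- tapply(yield, list(nitrogen, phosphorus), mean)  # 5 obs per cell

SSA   <- 10 * sum((a.means - grand)^2)
SSB   <- 10 * sum((b.means - grand)^2)
SStrt <- 5 * sum((cell - grand)^2)

# Interaction: what is left of each cell mean after removing both main effects
interaction.effect <- cell - outer(a.means, b.means, "+") + grand
SSAB <- 5 * sum(interaction.effect^2)

print(c(SSA = SSA, SSB = SSB, SSAB = SSAB, SStrt = SStrt))
```

The three components sum to the treatment sum of squares from the one-way analysis, so the factorial analysis redistributes the same variability rather than finding new variability.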


Analysis Using R

> fit <- lm(yield ~ nitrogen + phosphorus +

nitrogen*phosphorus)

> anova(fit)

Analysis of Variance Table

Response: yield

Df Sum Sq Mean Sq F value Pr(>F)

nitrogen 1 125 125 6.25 0.0236742 *

phosphorus 1 405 405 20.25 0.0003635 ***

nitrogen:phosphorus 1 45 45 2.25 0.1530877

Residuals 16 320 20


Analysis Using R

- Our first task is to determine if the interaction effect is significant in the population. Here, with p-value = 0.153, there is no significant evidence that nitrogen and phosphorus interact in their effect on yield.

- Now, we remove the interaction effect and re-fit the model:

> fit2 <- lm(yield ~ nitrogen + phosphorus)

> anova(fit2)

Analysis of Variance Table

Response: yield

Df Sum Sq Mean Sq F value Pr(>F)

nitrogen 1 125 125.00 5.8219 0.027403 *

phosphorus 1 405 405.00 18.8630 0.000442 ***

Residuals 17 365 21.47


Confidence Interval

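- The main effects of nitrogen and phosphorus are both significant (both p-values are small).

- Confidence intervals: A 95% confidence interval for $\mu_{A1} - \mu_{A2}$, the difference in the population means for the two levels of nitrogen, is

\[ (\bar{Y}_{A1} - \bar{Y}_{A2}) \pm t_{20-2-1,\,0.025} \sqrt{MS_{res}\left(\frac{1}{10} + \frac{1}{10}\right)} \]

and a 95% confidence interval for $\mu_{B1} - \mu_{B2}$, the difference in the population means for the two levels of phosphorus, is

\[ (\bar{Y}_{B1} - \bar{Y}_{B2}) \pm t_{20-2-1,\,0.025} \sqrt{MS_{res}\left(\frac{1}{10} + \frac{1}{10}\right)} \]

Here $MS_{res} = 21.47$ comes from the model without interaction, which has 20 − 2 − 1 = 17 residual degrees of freedom, and each level mean is based on 10 observations.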


Confidence Interval

- After calculation we have

95% CI for $\mu_{A1} - \mu_{A2}$ : (−9.37, −0.63)

95% CI for $\mu_{B1} - \mu_{B2}$ : (−13.37, −4.63)

- We are 95% confident that the difference in the population mean yields (low nitrogen minus high nitrogen) is between −9.37 and −0.63. This interval does not include 0 and includes only negative values. This suggests that the population mean yield at the high level of nitrogen is larger than the population mean yield at the low level of nitrogen.

- The same interpretation applies to (−13.37, −4.63): the population mean yield at the high level of phosphorus is larger than at the low level.
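The interval calculation can be sketched in R as follows. This is an illustration under the setup of the example, not the authors' code: it re-enters the corn data, takes MSres and its degrees of freedom from the model without interaction, and uses 10 observations per factor level. The helper `ci.diff()` is hypothetical.

```r
yield <- c(35, 26, 25, 33, 31,   39, 33, 41, 31, 36,    # a1b1, a1b2
           37, 27, 35, 27, 34,   49, 39, 39, 47, 46)    # a2b1, a2b2
nitrogen   <- factor(rep(c("a1", "a2"), each = 10))
phosphorus <- factor(rep(rep(c("b1", "b2"), each = 5), times = 2))

fit2   <- lm(yield ~ nitrogen + phosphorus)        # model without interaction
df.res <- df.residual(fit2)                        # 20 - 2 - 1 = 17
MSres  <- sum(residuals(fit2)^2) / df.res          # residual mean square
t.crit <- qt(0.975, df.res)                        # upper 0.025 t quantile
half.width <- t.crit * sqrt(MSres * (1/10 + 1/10))

# Hypothetical helper: CI for (mean at level 1) - (mean at level 2)
ci.diff <- function(y, f) {
  m <- tapply(y, f, mean)
  unname(m[1] - m[2]) + c(-1, 1) * half.width
}

ci.A <- ci.diff(yield, nitrogen)      # interval for mu_A1 - mu_A2
ci.B <- ci.diff(yield, phosphorus)    # interval for mu_B1 - mu_B2
print(round(rbind(nitrogen = ci.A, phosphorus = ci.B), 2))
```

Both intervals share the same half-width because the design is balanced and both comparisons use the same MSres.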


General Strategy For Analyzing 2^2 Factorial Experiments

- Start by looking at whether the interaction effect is significant.

- If the interaction is significant, then formal analysis of main effects is not all that meaningful because their interpretations depend on the interaction. In this situation, use the "naive analysis": ignore the factorial treatment structure and redo the entire analysis as a one-way ANOVA with four treatments.

- If the interaction is not significant, then re-estimate the model without the interaction term and then examine the main effects using the confidence intervals.
