Factorial Experiments Analysis of Variance (ANOVA) Experimental Design
Jan 12, 2016
• Dependent variable Y
• k Categorical independent variables A, B, C, … (the Factors)
• Let
– a = the number of categories (levels) of A
– b = the number of categories (levels) of B
– c = the number of categories (levels) of C
– etc.
Random Effects and Fixed Effects Factors
• A factor is called a fixed effects factor if the levels of the factor are a fixed set of levels and the conclusions of any analysis are in relation to these levels.
• If the levels have been selected at random from a population of levels, the factor is called a random effects factor.
• The conclusions of the analysis will be directed at the population of levels and not only the levels selected for the experiment.
Example - Random Effects
In this example a taxi company is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by driver. The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, and mileage is measured each time.
Dependent
– Mileage
Independent
– Tire brand (A, B, C): Fixed Effects Factor
– Driver (1, 2, 3, 4): Random Effects Factor
Comments
• The ANOVA table (Source, SS, df and MS) is the same whether the factors are fixed or random.
• The differences occur in the denominators of the F ratios.
• The denominators of the F ratios are determined by evaluating Expected Mean Squares for each effect.
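Example: 3 factors A, B and C fixed
Source | EMS | F
A | $\sigma^2 + nbc\,\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ | MSA/MSError
B | $\sigma^2 + nac\,\frac{\sum_{j=1}^{b}\beta_j^2}{b-1}$ | MSB/MSError
C | $\sigma^2 + nab\,\frac{\sum_{k=1}^{c}\gamma_k^2}{c-1}$ | MSC/MSError
AB | $\sigma^2 + nc\,\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$ | MSAB/MSError
AC | $\sigma^2 + nb\,\frac{\sum_{i=1}^{a}\sum_{k=1}^{c}(\alpha\gamma)_{ik}^2}{(a-1)(c-1)}$ | MSAC/MSError
BC | $\sigma^2 + na\,\frac{\sum_{j=1}^{b}\sum_{k=1}^{c}(\beta\gamma)_{jk}^2}{(b-1)(c-1)}$ | MSBC/MSError
ABC | $\sigma^2 + n\,\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(\alpha\beta\gamma)_{ijk}^2}{(a-1)(b-1)(c-1)}$ | MSABC/MSError
Error | $\sigma^2$ |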
Example: 3 factors A, B, C – all are random effects
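Source | EMS | F
A | $\sigma^2 + n\sigma^2_{ABC} + nc\,\sigma^2_{AB} + nb\,\sigma^2_{AC} + nbc\,\sigma^2_{A}$ | no exact F test
B | $\sigma^2 + n\sigma^2_{ABC} + nc\,\sigma^2_{AB} + na\,\sigma^2_{BC} + nac\,\sigma^2_{B}$ | no exact F test
C | $\sigma^2 + n\sigma^2_{ABC} + na\,\sigma^2_{BC} + nb\,\sigma^2_{AC} + nab\,\sigma^2_{C}$ | no exact F test
AB | $\sigma^2 + n\sigma^2_{ABC} + nc\,\sigma^2_{AB}$ | MSAB/MSABC
AC | $\sigma^2 + n\sigma^2_{ABC} + nb\,\sigma^2_{AC}$ | MSAC/MSABC
BC | $\sigma^2 + n\sigma^2_{ABC} + na\,\sigma^2_{BC}$ | MSBC/MSABC
ABC | $\sigma^2 + n\sigma^2_{ABC}$ | MSABC/MSError
Error | $\sigma^2$ |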
Example: 3 factors A fixed, B, C random
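Source | EMS | F
A | $\sigma^2 + n\sigma^2_{ABC} + nc\,\sigma^2_{AB} + nb\,\sigma^2_{AC} + nbc\,\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ | no exact F test
B | $\sigma^2 + na\,\sigma^2_{BC} + nac\,\sigma^2_{B}$ | MSB/MSBC
C | $\sigma^2 + na\,\sigma^2_{BC} + nab\,\sigma^2_{C}$ | MSC/MSBC
AB | $\sigma^2 + n\sigma^2_{ABC} + nc\,\sigma^2_{AB}$ | MSAB/MSABC
AC | $\sigma^2 + n\sigma^2_{ABC} + nb\,\sigma^2_{AC}$ | MSAC/MSABC
BC | $\sigma^2 + na\,\sigma^2_{BC}$ | MSBC/MSError
ABC | $\sigma^2 + n\sigma^2_{ABC}$ | MSABC/MSError
Error | $\sigma^2$ |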
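Example: 3 factors A, B fixed, C random
Source | EMS | F
A | $\sigma^2 + nb\,\sigma^2_{AC} + nbc\,\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ | MSA/MSAC
B | $\sigma^2 + na\,\sigma^2_{BC} + nac\,\frac{\sum_{j=1}^{b}\beta_j^2}{b-1}$ | MSB/MSBC
C | $\sigma^2 + nab\,\sigma^2_{C}$ | MSC/MSError
AB | $\sigma^2 + n\sigma^2_{ABC} + nc\,\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$ | MSAB/MSABC
AC | $\sigma^2 + nb\,\sigma^2_{AC}$ | MSAC/MSError
BC | $\sigma^2 + na\,\sigma^2_{BC}$ | MSBC/MSError
ABC | $\sigma^2 + n\sigma^2_{ABC}$ | MSABC/MSError
Error | $\sigma^2$ |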
Rules for determining Expected Mean Squares (EMS) in an ANOVA Table
Applicable to both fixed and random effects
Formulated by Schultz [1]
1. The EMS for Error is $\sigma^2$.
2. The EMS for each ANOVA term contains two or more terms, the first of which is $\sigma^2$.
3. All other terms in each EMS contain both coefficients and subscripts (the total number of letters being one more than the number of factors; if the number of factors is k = 3, then the number of letters is 4).
4. The subscript of $\sigma^2$ in the last term of each EMS is the same as the treatment designation.
5. The subscripts of all $\sigma^2$ other than the first contain the treatment designation. These are written with the combination involving the most letters written first and ending with the treatment designation.
6. When a capital letter is omitted from a subscript, the corresponding small letter appears in the coefficient.
7. For each EMS in the table, ignore the letter or letters that designate the effect. If any of the remaining letters designate a fixed effect, delete that term from the EMS.
8. Replace $\sigma^2$ whose subscripts are composed entirely of fixed effects by the appropriate sum, e.g.
replace $\sigma^2_A$ by $\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ and $\sigma^2_{AB}$ by $\frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$.

[1] Schultz, E. F., Jr. "Rules of Thumb for Determining Expectations of Mean Squares in Analysis of Variance," Biometrics, Vol. 11, 1955, pp. 123-148.
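Because the rules are purely mechanical, they can be sketched in a few lines of Python. The helper names below (`effects`, `ems_table`, `f_denominator`, `fmt`) are hypothetical, chosen only for this illustration; factors are assumed to be single capital letters, and `phi_X` stands for the rule 8 sum (for example $\sum_i \alpha_i^2/(a-1)$ for A). As an extra step that is not one of Schultz's rules, `f_denominator` looks for a mean square whose EMS equals the effect's EMS with its last term removed; when none exists there is no exact F test.

```python
from itertools import combinations


def effects(factors):
    """All main effects and interactions, e.g. "AB" -> ["A", "B", "AB"]."""
    out = []
    for r in range(1, len(factors) + 1):
        out += ["".join(c) for c in combinations(factors, r)]
    return out


def ems_table(factors, fixed):
    """Expected mean squares via Schultz's rules of thumb (a sketch).

    factors: one-letter factor names, e.g. "ABC".
    fixed:   the factors that are fixed effects; all others are random.
    Each EMS is a list of (coefficient, kind, subscript) tuples, where kind is
    "sigma2" (variance component) or "phi" (rule 8 sum of squared effects).
    """
    error = ("", "sigma2", "")  # Rule 1: the EMS for Error is sigma^2
    table = {}
    for eff in effects(factors):
        terms = []
        for sub in effects(factors):
            # Rule 5: every subscript contains the treatment designation.
            if not set(eff) <= set(sub):
                continue
            # Rule 7: ignore the effect's own letters; drop the term if any
            # remaining letter is a fixed factor.
            if any(f in fixed for f in set(sub) - set(eff)):
                continue
            # Rule 6: letters missing from the subscript go into the coefficient.
            coef = "n" + "".join(f.lower() for f in factors if f not in sub)
            # Rule 8: all-fixed subscripts become a sum of squared effects.
            kind = "phi" if all(f in fixed for f in sub) else "sigma2"
            terms.append((coef, kind, sub))
        # Rules 4-5: most letters first, the effect's own term last.
        terms.sort(key=lambda t: (-len(t[2]), t[2]))
        table[eff] = [error] + terms  # Rule 2: sigma^2 always comes first
    table["Error"] = [error]
    return table


def f_denominator(table, eff):
    """Mean square whose EMS equals eff's EMS minus its last term (or None)."""
    target = set(table[eff][:-1])
    for other, terms in table.items():
        if other != eff and set(terms) == target:
            return other
    return None  # no exact F test exists


def fmt(terms):
    """Render an EMS as text, e.g. 'sigma^2 + na*sigma2_BC + nac*sigma2_B'."""
    return " + ".join(
        "sigma^2" if not sub else f"{coef}*{kind}_{sub}" for coef, kind, sub in terms
    )


if __name__ == "__main__":
    table = ems_table("ABC", fixed="A")  # A fixed, B and C random
    for eff, terms in table.items():
        print(f"{eff:6s}{fmt(terms):55s}F denominator: {f_denominator(table, eff)}")
```

With `fixed="A"` the sketch gives MSBC as the divisor for B and C, MSABC for AB and AC, MSError for BC and ABC, and no exact test for A, in line with the table for A fixed, B and C random. For two factors with A fixed and B random it tests B against MSError, i.e. it reproduces the Guenther form of the EMS rather than the Searle form used by SPSS. The order among subscripts with the same number of letters is cosmetic and may differ from the tables above.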
Example - Random Effects (continued)
Taxi company data: tire brand (A, B, C) is a fixed effects factor; driver (b = 4 drivers selected at random, each with n = 3 mileage readings per tire brand) is a random effects factor.
The Data (mileage, three readings per driver and tire)
Driver | Tire A | Tire B | Tire C
1 | 39.6, 38.6, 41.9 | 18.1, 20.4, 19 | 31.1, 29.8, 26.6
2 | 38.1, 35.4, 38.8 | 18.2, 14, 15.6 | 30.2, 27.9, 27.2
3 | 33.9, 43.2, 41.3 | 17.8, 21.3, 22.3 | 31.3, 28.7, 29.7
4 | 36.9, 30.3, 35 | 17.8, 21.2, 24.3 | 27.4, 26.6, 21
Asking SPSS to perform Univariate ANOVA
Select the dependent variable, fixed factors, random factors
The Output
Tests of Between-Subjects Effects
Dependent Variable: MILEAGE
Source | | Type III Sum of Squares | df | Mean Square | F | Sig.
Intercept | Hypothesis | 28928.340 | 1 | 28928.340 | 1270.836 | .000
 | Error | 68.290 | 3 | 22.763 (a) | |
TIRE | Hypothesis | 2072.931 | 2 | 1036.465 | 71.374 | .000
 | Error | 87.129 | 6 | 14.522 (b) | |
DRIVER | Hypothesis | 68.290 | 3 | 22.763 | 1.568 | .292
 | Error | 87.129 | 6 | 14.522 (b) | |
TIRE * DRIVER | Hypothesis | 87.129 | 6 | 14.522 | 2.039 | .099
 | Error | 170.940 | 24 | 7.123 (c) | |
a. MS(DRIVER)
b. MS(TIRE * DRIVER)
c. MS(Error)
SPSS uses MSAB = MS(TIRE * DRIVER) as the divisor for both the fixed main effect (TIRE) and the random main effect (DRIVER).
This is contrary to the advice of some texts.
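The Anova table for the two factor model (A – fixed, B - random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source | SS | df | MS | EMS | F
A | SSA | a - 1 | MSA | $\sigma^2 + n\sigma^2_{AB} + nb\,\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ | MSA/MSAB
B | SSB | b - 1 | MSB | $\sigma^2 + na\,\sigma^2_{B}$ | MSB/MSError
AB | SSAB | (a - 1)(b - 1) | MSAB | $\sigma^2 + n\sigma^2_{AB}$ | MSAB/MSError
Error | SSError | ab(n - 1) | MSError | $\sigma^2$ |
Note: The divisor for testing the main effect of A is no longer MSError but MSAB.
Reference: Guenther, W. C., Analysis of Variance, Prentice Hall, 1964.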
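The Anova table for the two factor model (A – fixed, B - random)
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
Source | SS | df | MS | EMS | F
A | SSA | a - 1 | MSA | $\sigma^2 + n\sigma^2_{AB} + nb\,\frac{\sum_{i=1}^{a}\alpha_i^2}{a-1}$ | MSA/MSAB
B | SSB | b - 1 | MSB | $\sigma^2 + n\sigma^2_{AB} + na\,\sigma^2_{B}$ | MSB/MSAB
AB | SSAB | (a - 1)(b - 1) | MSAB | $\sigma^2 + n\sigma^2_{AB}$ | MSAB/MSError
Error | SSError | ab(n - 1) | MSError | $\sigma^2$ |
Note: In this case the divisor for testing both main effects (A and B) is MSAB. This is the approach used by SPSS.
Reference: Searle, S. R., Linear Models, John Wiley, 1971.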
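As a check on the two versions of the table, the sketch below recomputes the mean squares for the taxi data in plain Python and forms the F ratios both ways. The function name `two_way_mean_squares` is hypothetical; it assumes a balanced layout (every driver and tire cell has the same number of readings) and takes A = tire (fixed) and B = driver (random).

```python
# Taxi tire mileage from the data table: (driver, tire) -> three readings.
TAXI = {
    (1, "A"): [39.6, 38.6, 41.9], (1, "B"): [18.1, 20.4, 19.0], (1, "C"): [31.1, 29.8, 26.6],
    (2, "A"): [38.1, 35.4, 38.8], (2, "B"): [18.2, 14.0, 15.6], (2, "C"): [30.2, 27.9, 27.2],
    (3, "A"): [33.9, 43.2, 41.3], (3, "B"): [17.8, 21.3, 22.3], (3, "C"): [31.3, 28.7, 29.7],
    (4, "A"): [36.9, 30.3, 35.0], (4, "B"): [17.8, 21.2, 24.3], (4, "C"): [27.4, 26.6, 21.0],
}


def two_way_mean_squares(data):
    """Mean squares for a balanced two factor design: A = tire, B = driver."""
    tires = sorted({t for _, t in data})
    drivers = sorted({d for d, _ in data})
    a, b = len(tires), len(drivers)
    n = len(next(iter(data.values())))  # replicates per cell
    ys = [y for cell in data.values() for y in cell]
    cf = sum(ys) ** 2 / len(ys)  # correction factor: grand total^2 / N
    ss_a = sum(sum(sum(data[d, t]) for d in drivers) ** 2 for t in tires) / (b * n) - cf
    ss_b = sum(sum(sum(data[d, t]) for t in tires) ** 2 for d in drivers) / (a * n) - cf
    ss_cells = sum(sum(cell) ** 2 for cell in data.values()) / n - cf
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum(y * y for y in ys) - cf - ss_cells  # within-cell variation
    return {
        "A": ss_a / (a - 1),
        "B": ss_b / (b - 1),
        "AB": ss_ab / ((a - 1) * (b - 1)),
        "Error": ss_error / (a * b * (n - 1)),
    }


ms = two_way_mean_squares(TAXI)
f_ratios = {
    "A (both versions)": ms["A"] / ms["AB"],
    "B (Guenther)": ms["B"] / ms["Error"],
    "B (Searle / SPSS)": ms["B"] / ms["AB"],
    "AB (both versions)": ms["AB"] / ms["Error"],
}
for source, f in f_ratios.items():
    print(f"{source:20s} F = {f:.3f}")
```

With these data the mean squares agree with the SPSS output (MS(TIRE) ≈ 1036.465, MS(DRIVER) ≈ 22.763, MS(TIRE * DRIVER) ≈ 14.522, MS(Error) ≈ 7.123). Only the driver F ratio changes between the two versions: about 3.20 with MSError as the divisor versus about 1.57 with MSAB.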
Crossed and Nested Factors
The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations.
[Diagram: levels of A and levels of B, with every combination present]
Factor B is said to be nested within factor A if the levels of B differ for each level of A.
[Diagram: levels of A, each with its own distinct set of levels of B]
Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within plant.
[Diagram: plants, each with its own set of machines]
Machines (B) are nested within plants (A)
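The model for a two factor experiment with B nested within A:
$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}$
where $\mu$ = overall mean, $\alpha_i$ = effect of factor A, $\beta_{j(i)}$ = effect of factor B within A, and $\varepsilon_{ijk}$ = random error.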
The ANOVA table
Source | SS | df | MS | F | p-value
A | SSA | a - 1 | MSA | MSA/MSError |
B(A) | SSB(A) | a(b - 1) | MSB(A) | MSB(A)/MSError |
Error | SSError | ab(n - 1) | MSError | |
Note: SSB(A ) = SSB + SSAB and a(b – 1) = (b – 1) + (a - 1)(b – 1)
For this example we have n = 3 measurements of paper strength for each of the 24 machines (4 plants with 6 machines each).
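The Data
Plant | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2
Machine | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
 | 98.7 | 59.2 | 84.1 | 72.3 | 83.5 | 60.6 | 33.6 | 44.8 | 58.9 | 63.9 | 63.7 | 48.1
 | 93.1 | 87.8 | 86.3 | 110.3 | 89.3 | 84.8 | 48.2 | 57.3 | 51.6 | 62.3 | 54.6 | 50.6
 | 100.0 | 84.1 | 83.4 | 81.6 | 86.1 | 83.6 | 68.9 | 66.5 | 45.2 | 61.1 | 55.3 | 39.9

Plant | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4
Machine | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24
 | 83.6 | 76.1 | 64.2 | 69.2 | 77.4 | 61.0 | 64.2 | 35.5 | 46.9 | 37.0 | 43.8 | 30.0
 | 84.6 | 55.4 | 58.4 | 86.7 | 63.3 | 81.3 | 50.3 | 30.8 | 43.1 | 47.8 | 62.4 | 43.0
 | 90.6 | 92.3 | 75.4 | 60.8 | 76.6 | 73.8 | 32.1 | 36.3 | 40.8 | 41.0 | 60.8 | 56.9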
Anova Table Treating Factors (Plant, Machine) as crossed
Tests of Between-Subjects Effects
Dependent Variable: STRENGTH
Source | Type III Sum of Squares | df | Mean Square | F | Sig.
Corrected Model | 21031.065 (a) | 23 | 914.394 | 7.972 | .000
Intercept | 298531.4 | 1 | 298531.4 | 2602.776 | .000
PLANT | 18174.761 | 3 | 6058.254 | 52.820 | .000
MACHINE | 1238.379 | 5 | 247.676 | 2.159 | .074
PLANT * MACHINE | 1617.925 | 15 | 107.862 | .940 | .528
Error | 5505.469 | 48 | 114.697 | |
Total | 325067.9 | 72 | | |
Corrected Total | 26536.534 | 71 | | |
a. R Squared = .793 (Adjusted R Squared = .693)
Anova Table: Two factor experiment, B (machine) nested in A (plant)
Source | Sum of Squares | df | Mean Square | F | p-value
Plant | 18174.76119 | 3 | 6058.253731 | 52.819506 | 0.00000
Machine(Plant) | 2856.303672 | 20 | 142.8151836 | 1.2451488 | 0.26171
Error | 5505.469467 | 48 | 114.6972806 | |
ANOVA Table for 3 factors crossed
Effect | SS | df
A | SSA | (a – 1)
B | SSB | (b – 1)
C | SSC | (c – 1)
AB | SSAB | (a – 1)(b – 1)
AC | SSAC | (a – 1)(c – 1)
BC | SSBC | (b – 1)(c – 1)
ABC | SSABC | (a – 1)(b – 1)(c – 1)
Error | SSError | abc(n – 1)
ANOVA Table for 3 nested factors B nested in A, C nested in B
Effect | SS | df
A | SSA | (a – 1)
B(A) | SSB(A) | a(b – 1)
C(AB) | SSC(AB) | ab(c – 1)
Error | SSError | abc(n – 1)
Note: SSB(A) = SSB + SSAB and a(b – 1) = (b – 1) + (a – 1)(b – 1).
Also SSC(AB) = SSC + SSAC + SSBC + SSABC and ab(c – 1) = (c – 1) + (a – 1)(c – 1) + (b – 1)(c – 1) + (a – 1)(b – 1)(c – 1).
Also, in nested designs, factors may be
• fixed effect factors: the levels of the factor are a fixed set of levels, or
• random effect factors: the levels of the factor are chosen at random from a population of levels.
This affects the divisor in the F ratio for testing the effect.
Other experimental designs
Randomized Block design
Latin Square design
Repeated Measures design
The Randomized Block Design
• Suppose a researcher is interested in how several treatments affect a continuous response variable (Y).
• The treatments may be the levels of a single factor or they may be the combinations of levels of several factors.
• Suppose we have available to us a total of N = nt experimental units to which we are going to apply the different treatments.
The Completely Randomized (CR) design randomly divides the experimental units into t groups of size n and randomly assigns a treatment to each group.
The Randomized Block Design
• divides the group of experimental units into n homogeneous groups of size t.
• These homogeneous groups are called blocks.
• The treatments are then randomly assigned to the experimental units in each block - one treatment to a unit in each block.
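The difference between the two randomization schemes can be sketched with Python's standard `random` module. The function names and the unit labels (`bolt1-piece1`, ...) are hypothetical; the example borrows the t = 4 chemicals and 3 bolts of the textile example, and the seed is arbitrary.

```python
import random


def completely_randomized(units, treatments, n, rng):
    """CR design: shuffle all N = n*t units and deal them into t groups of n."""
    pool = list(units)
    rng.shuffle(pool)
    return {trt: pool[k * n:(k + 1) * n] for k, trt in enumerate(treatments)}


def randomized_block(blocks, treatments, rng):
    """RB design: within each block of t units, give each unit a different
    treatment, in an independently shuffled order for every block."""
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block} must contain exactly t units")
        order = list(treatments)
        rng.shuffle(order)
        plan[block] = dict(zip(units, order))
    return plan


rng = random.Random(12)  # arbitrary seed, only for reproducibility
chemicals = ["A", "B", "C", "D"]
# Hypothetical unit labels: 3 bolts (blocks), each cut into 4 pieces.
bolts = {b: [f"bolt{b}-piece{p}" for p in range(1, 5)] for b in range(1, 4)}
print(randomized_block(bolts, chemicals, rng))
all_pieces = [u for pieces in bolts.values() for u in pieces]
print(completely_randomized(all_pieces, chemicals, n=3, rng=rng))
```

In the RB plan each bolt receives every chemical exactly once, whereas the CR plan may put several pieces of the same bolt on one chemical, so bolt-to-bolt variation ends up in the error term.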
Example 1:
• Suppose we are interested in how weight gain (Y) in rats is affected by Source of protein (Beef, Cereal, and Pork) and by Level of Protein (High or Low).
• There are a total of t = 3 × 2 = 6 treatment combinations of the two factors (Beef-High Protein, Cereal-High Protein, Pork-High Protein, Beef-Low Protein, Cereal-Low Protein and Pork-Low Protein).
• Suppose we have available to us a total of N = 60 experimental rats to which we are going to apply the different diets based on the t = 6 treatment combinations.
• Prior to the experimentation the rats were divided into n = 10 homogeneous groups of size 6.
• The grouping was based on factors that had previously been ignored (Example - Initial weight size, appetite size etc.)
• Within each of the 10 blocks a rat is randomly assigned a treatment combination (diet).
• The weight gain after a fixed period is measured for each of the test animals and is tabulated on the next slide:
Block | (1) | (2) | (3) | (4) | (5) | (6)
1 | 107 | 96 | 112 | 83 | 87 | 90
2 | 102 | 72 | 100 | 82 | 70 | 94
3 | 102 | 76 | 102 | 85 | 95 | 86
4 | 93 | 70 | 93 | 63 | 71 | 63
5 | 111 | 79 | 101 | 72 | 75 | 81
6 | 128 | 89 | 104 | 85 | 84 | 89
7 | 56 | 70 | 72 | 64 | 62 | 63
8 | 97 | 91 | 92 | 80 | 72 | 82
9 | 80 | 63 | 87 | 82 | 81 | 63
10 | 103 | 102 | 112 | 83 | 93 | 81
Randomized Block Design
Example 2:
• The following experiment is interested in comparing the effect of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles.
• A strip of material, randomly selected from each bolt, is cut into four pieces (samples), and the pieces are randomly assigned to receive one of the four chemical treatments.
• This process is replicated three times, producing a Randomized Block (RB) design.
• Moisture resistance (y) was measured for each of the samples. (Low readings indicate low moisture penetration.)
• The data is given in the diagram and table on the next slide.
Diagram: Blocks (Bolt Samples), chemical and reading for each piece
Block 1: C 9.9, A 10.1, B 11.4, D 12.1
Block 2: D 13.4, B 12.9, A 12.2, C 12.3
Block 3: B 12.7, D 12.9, C 11.4, A 11.9
Table
Blocks (Bolt Samples)
Chemical 1 2 3
A 10.1 12.2 11.9
B 11.4 12.9 12.7
C 9.9 12.3 11.4
D 12.1 13.4 12.9
The Model for a Randomized Block Experiment
$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$, i = 1, 2, …, t; j = 1, 2, …, b
where
$y_{ij}$ = the observation in the jth block receiving the ith treatment
$\mu$ = overall mean
$\tau_i$ = the effect of the ith treatment
$\beta_j$ = the effect of the jth block
$\varepsilon_{ij}$ = random error
The Anova Table for a randomized Block Experiment
Source | S.S. | d.f. | M.S. | F | p-value
Treat | SST | t - 1 | MST | MST/MSE |
Block | SSB | b - 1 | MSB | MSB/MSE |
Error | SSE | (t - 1)(b - 1) | MSE | |
• A randomized block experiment is assumed to be a two-factor experiment.
• The factors are blocks and treatments.
• There is one observation per cell. It is assumed that there is no interaction between blocks and treatments.
• The degrees of freedom for the interaction is used to estimate error.
The Anova Table for Diet Experiment
Source | S.S. | d.f. | M.S. | F | p-value
Block | 5992.4167 | 9 | 665.82407 | 9.52 | 0.00000
Diet | 4572.8833 | 5 | 914.57667 | 13.076659 | 0.00000
ERROR | 3147.2833 | 45 | 69.93963 | |
The Anova Table for Textile Experiment
SOURCE | SUM OF SQUARES | D.F. | MEAN SQUARE | F | TAIL PROB.
Blocks | 7.17167 | 2 | 3.5858 | 40.21 | 0.0003
Chem | 5.20000 | 3 | 1.7333 | 19.44 | 0.0017
ERROR | 0.53500 | 6 | 0.0892 | |
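The sums of squares in the textile table follow from the usual correction-factor formulas. A minimal plain-Python sketch (the function name `rb_anova` is hypothetical; it assumes exactly one observation per treatment and block cell, so the interaction line serves as error):

```python
# Textile data: chemical -> water resistance in blocks (bolts) 1, 2, 3.
TEXTILE = {
    "A": [10.1, 12.2, 11.9],
    "B": [11.4, 12.9, 12.7],
    "C": [9.9, 12.3, 11.4],
    "D": [12.1, 13.4, 12.9],
}


def rb_anova(y):
    """Randomized block ANOVA: returns {source: (SS, d.f., F or None)}."""
    t = len(y)                          # treatments
    b = len(next(iter(y.values())))     # blocks
    ys = [v for row in y.values() for v in row]
    cf = sum(ys) ** 2 / (t * b)         # correction factor
    ss_treat = sum(sum(row) ** 2 for row in y.values()) / b - cf
    block_totals = [sum(row[j] for row in y.values()) for j in range(b)]
    ss_block = sum(tot ** 2 for tot in block_totals) / t - cf
    ss_total = sum(v * v for v in ys) - cf
    # No replication: the treatment x block interaction is used as error.
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_block, df_error = t - 1, b - 1, (t - 1) * (b - 1)
    mse = ss_error / df_error
    return {
        "Treat": (ss_treat, df_treat, (ss_treat / df_treat) / mse),
        "Block": (ss_block, df_block, (ss_block / df_block) / mse),
        "Error": (ss_error, df_error, None),
    }


for source, (ss, df, f) in rb_anova(TEXTILE).items():
    print(source, round(ss, 5), df, None if f is None else round(f, 2))
```

It gives SS = 7.17167 (2 d.f.) for blocks, 5.2 (3 d.f.) for chemicals and 0.535 (6 d.f.) for error, with F ratios of about 40.2 and 19.4, matching the table.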
• If the treatments are defined in terms of two or more factors, the treatment Sum of Squares can be split (partitioned) into:
– Main Effects
– Interactions
The Anova Table for the Diet Experiment, with the Diet sum of squares split into the main effects of Source of Protein and Level of Protein and their interaction (SL)
Source | S.S. | d.f. | M.S. | F | p-value
Block | 5992.4167 | 9 | 665.82407 | 9.52 | 0.00000
Source | 882.23333 | 2 | 441.11667 | 6.31 | 0.00380
Level | 2680.0167 | 1 | 2680.0167 | 38.32 | 0.00000
SL | 1010.6333 | 2 | 505.31667 | 7.23 | 0.00190
ERROR | 3147.2833 | 45 | 69.93963 | |
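The partition uses the cell (treatment) totals over blocks: each main-effect SS comes from the marginal totals of that factor, and the interaction SS is what remains of the treatment SS. In the diet experiment factor 1 would be Source (3 levels, 2 d.f.) and factor 2 Level (2 levels, 1 d.f.), leaving 2 d.f. for SL out of the 5 d.f. for Diet. The function below is a hypothetical sketch, exercised on made-up additive data for which the interaction SS should be zero.

```python
def partition_treatment_ss(cells):
    """Split the treatment SS of a blocked two factor experiment.

    cells[(i, j)] holds the b block observations for level i of factor 1
    (e.g. Source of protein) and level j of factor 2 (e.g. Level of protein).
    Returns {name: (SS, d.f.)}.
    """
    lv1 = sorted({i for i, _ in cells})
    lv2 = sorted({j for _, j in cells})
    p, q = len(lv1), len(lv2)
    b = len(next(iter(cells.values())))
    tot = {k: sum(v) for k, v in cells.items()}  # cell totals over blocks
    cf = sum(tot.values()) ** 2 / (p * q * b)    # correction factor
    ss_treat = sum(x ** 2 for x in tot.values()) / b - cf
    ss_1 = sum(sum(tot[i, j] for j in lv2) ** 2 for i in lv1) / (q * b) - cf
    ss_2 = sum(sum(tot[i, j] for i in lv1) ** 2 for j in lv2) / (p * b) - cf
    return {
        "factor 1": (ss_1, p - 1),
        "factor 2": (ss_2, q - 1),
        "interaction": (ss_treat - ss_1 - ss_2, (p - 1) * (q - 1)),
        "treatments": (ss_treat, p * q - 1),
    }


# Made-up additive data: 3 x 2 treatments in 3 blocks, with no interaction.
demo = {(i, j): [10 + 2 * i + 3 * j + k for k in range(3)]
        for i in range(3) for j in range(2)}
print(partition_treatment_ss(demo))
```

For the made-up data the factor 1 and factor 2 lines (48 and 40.5) add up to the treatment SS of 88.5, and the interaction SS is zero because the data were built without interaction.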
Repeated Measures Designs
In a Repeated Measures Design
• We have experimental units that may be grouped according to one or several factors (the grouping factors).
• Then on each experimental unit we have not a single measurement but a group of measurements (the repeated measures).
• The repeated measures may be taken at combinations of levels of one or several factors (the repeated measures factors).
Example In the following study the experimenter was interested in how the level of a certain enzyme changed in cardiac patients after open heart surgery.
The enzyme was measured
• immediately after surgery (Day 0),
• one day (Day 1),
• two days (Day 2) and
• one week (Day 7) after surgery
for n = 15 cardiac surgical patients.
The data is given in the table below.
Subject | Day 0 | Day 1 | Day 2 | Day 7
1 | 108 | 63 | 45 | 42
2 | 112 | 75 | 56 | 52
3 | 114 | 75 | 51 | 46
4 | 129 | 87 | 69 | 69
5 | 115 | 71 | 52 | 54
6 | 122 | 80 | 68 | 68
7 | 105 | 71 | 52 | 54
8 | 117 | 77 | 54 | 61
9 | 106 | 65 | 49 | 49
10 | 110 | 70 | 46 | 47
11 | 120 | 85 | 60 | 62
12 | 118 | 78 | 51 | 56
13 | 110 | 65 | 46 | 47
14 | 132 | 92 | 73 | 63
15 | 127 | 90 | 73 | 68
Table: The enzyme levels immediately after surgery (Day 0), one day (Day 1), two days (Day 2) and one week (Day 7) after surgery
• The subjects are not grouped (single group).
• There is one repeated measures factor, Time, with levels Day 0, Day 1, Day 2 and Day 7.
• This design is the same as a randomized block design with blocks = subjects.
The Anova Table for Enzyme Experiment
Source | SS | df | MS | F | p-value
Subject | 4221.100 | 14 | 301.507 | 32.45 | 0.0000
Day | 36282.267 | 3 | 12094.089 | 1301.66 | 0.0000
ERROR | 390.233 | 42 | 9.291 | |
The Subject Source of variability is modelling the variability between subjects
The ERROR Source of variability is modelling the variability within subjects
Example :
(Repeated Measures Design - Grouping Factor)
• In the following study, similar to the previous example, the experimenter was interested in how the level of a certain enzyme changed in cardiac patients after open heart surgery.
• In addition, the experimenter was interested in how two drug treatments (A and B) would also affect the level of the enzyme.
• The 24 patients were randomly divided into three groups of n= 8 patients.
• The first group of patients were left untreated as a control group while
• the second and third group were given drug treatments A and B respectively.
• Again the enzyme was measured immediately after surgery (Day 0), one day (Day 1), two days (Day 2) and one week (Day 7) after surgery for each of the cardiac surgical patients in the study.
Table: The enzyme levels immediately after surgery (Day 0), one day (Day 1), two days (Day 2) and one week (Day 7) after surgery for three treatment groups (Control, Drug A, Drug B)
Group | Control | | | | Drug A | | | | Drug B | | |
Day | 0 | 1 | 2 | 7 | 0 | 1 | 2 | 7 | 0 | 1 | 2 | 7
 | 122 | 87 | 68 | 58 | 93 | 56 | 36 | 37 | 86 | 46 | 30 | 31
 | 112 | 75 | 55 | 48 | 78 | 51 | 33 | 34 | 100 | 67 | 50 | 50
 | 129 | 80 | 66 | 64 | 109 | 73 | 58 | 49 | 122 | 97 | 80 | 72
 | 115 | 71 | 54 | 52 | 104 | 75 | 57 | 60 | 101 | 58 | 45 | 43
 | 126 | 89 | 70 | 71 | 108 | 71 | 57 | 65 | 112 | 78 | 67 | 66
 | 118 | 81 | 62 | 60 | 116 | 76 | 58 | 58 | 106 | 74 | 54 | 54
 | 115 | 73 | 56 | 49 | 108 | 64 | 54 | 47 | 90 | 59 | 43 | 38
 | 112 | 67 | 53 | 44 | 110 | 80 | 63 | 62 | 110 | 76 | 64 | 58
• The subjects are grouped by treatment: Control, Drug A, Drug B.
• There is one repeated measures factor, Time, with levels Day 0, Day 1, Day 2 and Day 7.
The Anova Table
There are two sources of error in a repeated measures design:
the between subject error (Error1) and
the within subject error (Error2).
Source | SS | df | MS | F | p-value
Drug | 1745.396 | 2 | 872.698 | 1.78 | 0.1929
Error1 | 10287.844 | 21 | 489.897 | |
Time | 47067.031 | 3 | 15689.010 | 1479.58 | 0.0000
Time x Drug | 357.688 | 6 | 59.615 | 5.62 | 0.0001
Error2 | 668.031 | 63 | 10.604 | |
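The two error lines can be reproduced with the same correction-factor arithmetic: Drug is tested against the between-subject error (Error1), while Time and Time x Drug are tested against the within-subject error (Error2). A plain-Python sketch for balanced data (the function name `split_plot_anova` is hypothetical), applied to the table of enzyme levels for the three groups:

```python
# Enzyme data: group -> one row per patient, columns Day 0, 1, 2, 7.
ENZYME = {
    "Control": [[122, 87, 68, 58], [112, 75, 55, 48], [129, 80, 66, 64], [115, 71, 54, 52],
                [126, 89, 70, 71], [118, 81, 62, 60], [115, 73, 56, 49], [112, 67, 53, 44]],
    "Drug A": [[93, 56, 36, 37], [78, 51, 33, 34], [109, 73, 58, 49], [104, 75, 57, 60],
               [108, 71, 57, 65], [116, 76, 58, 58], [108, 64, 54, 47], [110, 80, 63, 62]],
    "Drug B": [[86, 46, 30, 31], [100, 67, 50, 50], [122, 97, 80, 72], [101, 58, 45, 43],
               [112, 78, 67, 66], [106, 74, 54, 54], [90, 59, 43, 38], [110, 76, 64, 58]],
}


def split_plot_anova(data):
    """Repeated measures ANOVA with one grouping factor (Drug) and one
    repeated measures factor (Time); balanced data assumed."""
    g = len(data)                            # groups
    n = len(next(iter(data.values())))       # subjects per group
    r = len(next(iter(data.values()))[0])    # repeated measures per subject
    rows = [row for grp in data.values() for row in grp]
    ys = [y for row in rows for y in row]
    cf = sum(ys) ** 2 / len(ys)              # correction factor
    ss_group = sum(sum(map(sum, grp)) ** 2 for grp in data.values()) / (n * r) - cf
    ss_subj = sum(sum(row) ** 2 for row in rows) / r - cf
    ss_time = sum(sum(row[k] for row in rows) ** 2 for k in range(r)) / (g * n) - cf
    ss_cells = sum(sum(row[k] for row in grp) ** 2
                   for grp in data.values() for k in range(r)) / n - cf
    ss_total = sum(y * y for y in ys) - cf
    ss = {
        "Drug": ss_group,
        "Error1": ss_subj - ss_group,                  # between subjects
        "Time": ss_time,
        "Time x Drug": ss_cells - ss_group - ss_time,
    }
    ss["Error2"] = ss_total - ss_subj - ss_time - ss["Time x Drug"]  # within subjects
    dfs = {"Drug": g - 1, "Error1": g * (n - 1), "Time": r - 1,
           "Time x Drug": (g - 1) * (r - 1), "Error2": g * (n - 1) * (r - 1)}
    ms = {k: ss[k] / dfs[k] for k in ss}
    f = {"Drug": ms["Drug"] / ms["Error1"],
         "Time": ms["Time"] / ms["Error2"],
         "Time x Drug": ms["Time x Drug"] / ms["Error2"]}
    return ss, dfs, ms, f


ss, dfs, ms, f = split_plot_anova(ENZYME)
for source in ss:
    line = f"{source:12s} SS = {ss[source]:10.3f} df = {dfs[source]:2d}"
    if source in f:
        line += f"  F = {f[source]:.2f}"
    print(line)
```

With these data the sketch reproduces the table above, for example SS(Error1) ≈ 10287.844 on 21 d.f. and SS(Error2) ≈ 668.031 on 63 d.f.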
Tables of means
Drug Day 0 Day 1 Day 2 Day 7 Overall
Control 118.63 77.88 60.50 55.75 78.19
A 103.25 68.25 52.00 51.50 68.75
B 103.38 69.38 54.13 51.50 69.59
Overall 108.42 71.83 55.54 52.92 72.18
[Figure: Time Profiles of Enzyme Levels. Enzyme level plotted against day (0 to 7) for the Control, Drug A and Drug B groups.]