Ch.7 ANOVA
Outline
1. One-Way Analysis of Variance
(a) Using PROC GLM and PROC ANOVA
(b) Using PROC NPAR1WAY
(c) Post-Hoc Comparisons for One-Way ANOVA
(d) Computing Contrasts
2. Two-Way Analysis of Variance
3. Interpreting Significant Interactions
4. N-way Factorial Designs
5. Analysis of Covariance
This material covers sections 7BCDEFGH and 6D.
PROC GLM
• PROC GLM (General Linear Models) is the general regression
and ANOVA procedure in SAS. It handles any univariate
linear model, and it also provides a facility for MANOVA
(multivariate analysis of variance).
• Although PROC GLM can be used for multiple linear regression,
PROC REG is more efficient for this purpose.
• PROC ANOVA is more computationally efficient for analysis
of variance problems involving balanced designs (equal
sample sizes for each treatment group).
• For general (including unbalanced) ANOVA and for ANCOVA
(analysis of covariance) problems, PROC GLM must be used.
One-Way Analysis of Variance
Unbalanced Case – PROC GLM
• To compare the means for more than two independent,
normally distributed samples of unequal size, PROC GLM
should be used.
• Example: The distance (in m) required to stop a car
going 50 km/hr on wet pavement was measured several
times for each of three brands of tires to compare the
traction of each brand. The same vehicle was used
for each measurement. The resulting distances were
recorded in a file called tract.txt.
tract.txt
BRAND DISTANCE BRAND DISTANCE
M 41 M 46
M 43 M 40
M 44 M 42
M 44 B 49
B 44 B 46
B 43 B 44
G 39 G 42
G 41 G 44
G 40 G 43
G 41
Read data
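* To read these data into a data set called TRACT, use
FILENAME TRACTION 'TRACT.TXT';  /* fileref for the raw data file */
DATA TRACT;
INFILE TRACTION;                /* read records from that file */
INPUT BRAND $ DISTANCE;         /* BRAND is character, DISTANCE numeric */
RUN;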
* We wish to compare the stopping distance of the three
brands. In other words, we wish to know whether mean
DISTANCE depends on BRAND.
Plot
* Plot the data first. The plot may suggest whether DISTANCE
depends upon BRAND. It may also reveal departures from
the model assumptions: look for outliers and signs of
nonconstant variance.
PROC PLOT;
PLOT DISTANCE*BRAND;
RUN;
* The plot gives some indication that the stopping distance
distributions are not all the same.
Calculating means
* The sample means will differ from brand to brand. We can
calculate them, noting that BRAND is a CLASSification
variable (or factor) and DISTANCE is a response (or
dependent) variable.
PROC MEANS MEAN;
VAR DISTANCE;
CLASS BRAND;
RUN;
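For readers who want to check these numbers outside SAS, here is a minimal Python sketch that groups the tract.txt values by BRAND and reports each group's size and mean. It assumes the two-column listing of tract.txt above is simply the 19 records printed side by side; the names TRACT and group_summary are illustrative, not part of the course files.

```python
# Stopping distances (m) from tract.txt, grouped by tire brand.
TRACT = {
    "M": [41, 43, 44, 44, 46, 40, 42],
    "B": [44, 43, 49, 46, 44],
    "G": [39, 41, 40, 41, 42, 44, 43],
}


def group_summary(data):
    """Return {level: (n, mean)} for each level of the class variable."""
    return {level: (len(ys), sum(ys) / len(ys)) for level, ys in data.items()}


summary = group_summary(TRACT)
for level, (n, mean) in summary.items():
    print(f"{level}: n = {n}, mean = {mean:.3f}")

# The design is balanced only if every group has the same size.
balanced = len({n for n, _ in summary.values()}) == 1
print("balanced:", balanced)
```

The group sizes (7, 5 and 7) show that the design is unbalanced, which is why PROC GLM rather than PROC ANOVA is used below.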
Analysis of variance
* The analysis of variance will help us to decide whether
the observed differences among the three brands are
significant. We must use PROC GLM, because the sample
sizes for the different brands are unequal.
PROC GLM DATA=TRACT;
CLASS BRAND;
MODEL DISTANCE=BRAND;
RUN; QUIT;
Output
* The first page of output is a summary of the levels
of the classification variable and the total number of
experimental units in the study.
* The second page of output contains the analysis of
variance table.
* The numerator and denominator degrees of freedom for
the F-ratio are given in the DF column:
** No. of treatment groups = 3, so numerator DF =
3 - 1 = 2.
** Total No. of observations = 19, so denominator DF =
19 - 3 = 16.
The sums of squares and mean squares:
* The sums of squares:
** SSW = SSE = Sum of Squares Error = 65.4. When
divided by its degrees of freedom, this summarizes
the variability observed within each treatment group.
** SSB = SSModel = Sum of Squares Model = 41.6.
When divided by its degrees of freedom, this summarizes
the variability of the treatment means around the
overall mean.
* The mean squares:
** MSE = SSE/DF(Error) = 65.4/16 = 4.09
** MSModel = SSModel/DF(Model) = 41.6/2 = 20.8
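The sums of squares and mean squares above can be reproduced by hand. This Python sketch assumes the same 19 values from tract.txt (TRACT and anova_components are illustrative names). It computes the between-group (Model) and within-group (Error) sums of squares, their degrees of freedom, and the mean squares.

```python
TRACT = {
    "M": [41, 43, 44, 44, 46, 40, 42],
    "B": [44, 43, 49, 46, 44],
    "G": [39, 41, 40, 41, 42, 44, 43],
}


def anova_components(data):
    """Sums of squares, degrees of freedom and mean squares for a one-way layout."""
    all_y = [y for ys in data.values() for y in ys]
    n_total, k = len(all_y), len(data)
    grand_mean = sum(all_y) / n_total
    group_means = {level: sum(ys) / len(ys) for level, ys in data.items()}

    # Model (between-group) SS: size-weighted squared deviations of group means.
    ss_model = sum(len(ys) * (group_means[level] - grand_mean) ** 2
                   for level, ys in data.items())
    # Error (within-group) SS: squared deviations from each group's own mean.
    ss_error = sum((y - group_means[level]) ** 2
                   for level, ys in data.items() for y in ys)

    df_model, df_error = k - 1, n_total - k
    return {
        "df_model": df_model,
        "df_error": df_error,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "ms_model": ss_model / df_model,
        "ms_error": ss_error / df_error,
    }


parts = anova_components(TRACT)
for name, value in parts.items():
    print(f"{name:>8}: {value:.2f}")
```

Rounded to one decimal, these agree with the PROC GLM values quoted above (SSE 65.4, SSModel 41.6, MSE 4.09, MSModel 20.8).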
Discussion and Conclusion
* The F ratio is the statistic used for testing the hypothesis
that the mean DISTANCE does not differ for the different
brands. F = MSModel/MSE = 20.8/4.09 = 5.09.
* The p-value = P(F(2,16) > 5.09): the probability that an F
random variable with 2 and 16 degrees of freedom exceeds
the observed ratio, if the true population means are all
equal. A small p-value implies strong evidence against
this hypothesis.
* The p-value for our test is .0195, so we reject the null
hypothesis at the 5% level; we have strong evidence
that the mean distance depends on brand.
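As a sketch of where F and the p-value come from, the Python code below recomputes F from the tract.txt data and evaluates the upper tail of the F(2, 16) distribution. It relies on a closed form that holds only when the numerator DF is 2: P(F > f) = (1 + 2f/DF_error)^(-DF_error/2). For other numerator DF a general F distribution routine (for example scipy.stats.f.sf) would be needed. The helper names are hypothetical.

```python
TRACT = {
    "M": [41, 43, 44, 44, 46, 40, 42],
    "B": [44, 43, 49, 46, 44],
    "G": [39, 41, 40, 41, 42, 44, 43],
}


def one_way_f(data):
    """Return (F, df_model, df_error) for a one-way ANOVA."""
    all_y = [y for ys in data.values() for y in ys]
    n_total, k = len(all_y), len(data)
    grand_mean = sum(all_y) / n_total
    ss_model = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                   for ys in data.values())
    ss_error = sum((y - sum(ys) / len(ys)) ** 2
                   for ys in data.values() for y in ys)
    df_model, df_error = k - 1, n_total - k
    return (ss_model / df_model) / (ss_error / df_error), df_model, df_error


def f_upper_tail_df1_2(f, df_error):
    """P(F > f) for F with (2, df_error) DF; this closed form needs numerator DF = 2."""
    return (1 + 2 * f / df_error) ** (-df_error / 2)


f_stat, df_model, df_error = one_way_f(TRACT)
assert df_model == 2, "closed-form tail below assumes numerator DF = 2"
p_value = f_upper_tail_df1_2(f_stat, df_error)
print(f"F({df_model}, {df_error}) = {f_stat:.2f}, p = {p_value:.4f}")
```

The closed form follows because a chi-square variable with 2 DF, divided by 2, is a standard exponential variable.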
One-Way Analysis of Variance
Balanced Case – PROC ANOVA
• To compare the means of more than two independent,
normally distributed samples of equal size, PROC ANOVA
should be used.
• Example: The file thiamin.txt contains measurements
of thiamin content for 6 samples of 4 different cereal
grains.
@@ symbol for INPUT
* To save space, more than one observation has been
stored in each record. To read in such data, use the @@
• Check for an interaction between WWT and WEANTIME.
A significant interaction would mean that the slope relating
NWWT to the covariate WWT differs across the levels of
WEANTIME, violating the equal-slopes assumption of ANCOVA.
PROC GLM;
CLASS WEANTIME;
MODEL NWWT = WEANTIME WWT WEANTIME*WWT;
RUN;
* The large p-value (.519) indicates that the interaction
effect is not significant, so a common slope for WWT
across the WEANTIME groups is reasonable.
Pigs data ANCOVA output
Source          DF   Type I SS     Mean Square   F Value   Pr > F
WEANTIME         2    77.2380952    38.6190476      3.23   0.0681
WWT              1   394.0805861   394.0805861     32.98   <.0001
WWT*WEANTIME     2    16.3788118     8.1894059      0.69   0.5190
ANCOVA
We can proceed to check for main effects:
PROC GLM;
CLASS WEANTIME;
MODEL NWWT = WEANTIME WWT;
LSMEANS WEANTIME / PDIFF ADJUST = TUKEY;
RUN; QUIT;
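From the ANCOVA output, we see strong evidence that
the covariate WWT is related to NWWT, and moderate
evidence that WEANTIME is related to NWWT
(p-value = .0591). These are Type I (sequential) sums of
squares: WEANTIME is entered first, so its test is not
adjusted for WWT. The Type III sums of squares, which PROC GLM
also prints, give the test for WEANTIME adjusted for WWT.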
Pigs data ANCOVA output
Source      DF   Type I SS     Mean Square   F Value   Pr > F
WEANTIME     2    77.2380952    38.6190476      3.36   0.0591
WWT          1   394.0805861   394.0805861     34.24   <.0001
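The output excerpts omit the Error row. However, each F Value is Mean Square / MSE, with one MSE per model, so the MSE can be backed out of the printed numbers. This Python sketch does that using values copied from the two tables (implied_mse is an illustrative name). Within each model the estimates agree up to the rounding of the printed F values.

```python
# (source, mean square, printed F value), copied from the two tables above.
WITH_INTERACTION = [
    ("WEANTIME", 38.6190476, 3.23),
    ("WWT", 394.0805861, 32.98),
    ("WWT*WEANTIME", 8.1894059, 0.69),
]
MAIN_EFFECTS_ONLY = [
    ("WEANTIME", 38.6190476, 3.36),
    ("WWT", 394.0805861, 34.24),
]


def implied_mse(rows):
    """Back out MSE = Mean Square / F for each row of a printed ANOVA table."""
    return {source: ms / f for source, ms, f in rows}


for label, rows in [("with interaction", WITH_INTERACTION),
                    ("main effects only", MAIN_EFFECTS_ONLY)]:
    estimates = implied_mse(rows)
    print(label, {source: round(mse, 2) for source, mse in estimates.items()})
```

Two points follow from this check. First, the Type I SS for WEANTIME and WWT are identical in both tables, because Type I sums of squares are sequential and do not depend on terms entered later. Second, the interaction's mean square (about 8.19) is below the full model's MSE (about 11.95). Dropping the interaction therefore pools it into the error term and lowers the MSE to about 11.5, which is why the F value for WEANTIME rises from 3.23 to 3.36.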
Pigs data ANCOVA output
Least Squares Means for effect WEANTIME
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: NWWT
i/j      1        2        3
1                 0.0005   0.3462
2        0.0005            0.0029
3        0.3462   0.0029
Using Proc Reg for ANCOVA
ANCOVA can also be done with PROC REG. To do this, one
must construct 'dummy' variables for WEANTIME.
DATA PIGS2;
SET PIGS;
/* code the three WEANTIME levels with two indicator variables */
IF WEANTIME = 'EARLY' THEN DO;
WT1 = 0;
WT2 = 1;
END;
IF WEANTIME = 'MEDIUM' THEN DO;
WT1 = 1;
WT2 = 0;
END;
IF WEANTIME = 'LATE' THEN DO;
WT1 = 1;
WT2 = 1;
END;
RUN;
PROC REG DATA=PIGS2;
MODEL NWWT = WWT WT1 WT2;
RUN; QUIT;
The output is in the form of a regression model relating
NWWT to WWT, WT1 and WT2.
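This short Python sketch shows what the DATA step above produces. Each WEANTIME level maps to a distinct (WT1, WT2) pair, so the regression gives each level its own intercept while all levels share one WWT slope. This is the parallel-lines ANCOVA model. All function names and coefficient values are hypothetical and are not fitted to the pigs data.

```python
# WEANTIME level -> (WT1, WT2), copied from the DATA PIGS2 step above.
CODES = {"EARLY": (0, 1), "MEDIUM": (1, 0), "LATE": (1, 1)}


def design_row(weantime, wwt):
    """Row of the design matrix for MODEL NWWT = WWT WT1 WT2 (intercept first)."""
    wt1, wt2 = CODES[weantime]
    return (1, wwt, wt1, wt2)


def level_intercepts(b0, b_wt1, b_wt2):
    """Intercept of the fitted line for each WEANTIME level; the WWT slope is shared."""
    return {level: b0 + b_wt1 * wt1 + b_wt2 * wt2
            for level, (wt1, wt2) in CODES.items()}


# Hypothetical values for illustration only; not fitted to the pigs data.
print(design_row("LATE", 20.0))
intercepts = level_intercepts(b0=10.0, b_wt1=2.0, b_wt2=-1.0)
print(intercepts)
```

With this particular coding, the WT1 coefficient equals the LATE minus EARLY intercept difference, and the WT2 coefficient equals the LATE minus MEDIUM difference.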
Summary
• The analysis of variance is used to test for differences
in the means among 3 or more populations.
• PROC GLM must be used if the samples coming from the
different populations or treatment groups are of different
sizes. Otherwise, PROC ANOVA can be used.
• If the data are clearly not normally distributed, and the
treatment groups are defined by the different levels of
a single factor (i.e. a 1-way layout), then PROC NPAR1WAY