Stat 470-4
• Today: Multiple comparisons, diagnostic checking, an example
• After these notes, we will have looked at 1.1-1.3 (skip figures 1.2 and 1.3, last two paragraphs of section 1.3), 1.6 (skip matrix notation and constraints), 1.7 (Tukey method only) and 1.9 (ignore H matrix notation on page 35), 2.1, 2.2
• We will not do 1.5 nor 1.8
• Assignment 1:
Multiple Comparisons
• In the previous example, we saw that there was a significant treatment effect…so what?
• If an ANOVA is conducted and the analysis suggests that there is a significant treatment effect, then a reasonable question to ask is: which treatments differ from one another?
Multiple Comparisons
• Would like to see if there is a difference between treatments i and j
• Can use two-sample t-test statistic to do this
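• For testing $H_0: \tau_i = \tau_j$ versus $H_A: \tau_i \neq \tau_j$, reject $H_0$ if the two-sample t statistic is sufficiently large in absolute value
• Perform many of these tests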
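For reference, a sketch of the usual pairwise comparison in a one-way layout. The symbols $k$ (number of treatments), $N$ (total number of observations), $n_i$ (number of observations in treatment $i$) and $\hat{\sigma}^2$ (the ANOVA mean squared error) are notation assumed here for illustration:

```latex
t_{ij} = \frac{\bar{y}_{j.} - \bar{y}_{i.}}{\hat{\sigma}\sqrt{1/n_j + 1/n_i}},
\qquad
\text{reject } H_0: \tau_i = \tau_j \text{ at level } \alpha \text{ if } |t_{ij}| > t_{N-k,\,\alpha/2}
```

Here $t_{N-k,\,\alpha/2}$ denotes the upper $\alpha/2$ quantile of the t distribution with $N-k$ degrees of freedom.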
Multiple Comparisons
• Perform many of these tests
• Error rate must be controlled: with many tests each run at level α, the chance of at least one false rejection is well above α
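A minimal sketch of why the error rate needs controlling. It assumes, purely for simplicity, that the pairwise tests are independent (they are not, since they share the same data), so the number is a rough guide rather than an exact rate. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(k, alpha):
    """Return (number of pairwise tests, chance of at least one false rejection).

    Assumes independent tests, which is a simplification.
    """
    m = comb(k, 2)                      # number of treatment pairs
    return m, 1 - (1 - alpha) ** m      # P(at least one Type I error)

m, rate = familywise_error_rate(k=4, alpha=0.05)
print(m, round(rate, 3))  # 6 0.265
```

With four treatments there are six pairwise tests, and under this simplification the chance of at least one false rejection is roughly five times the nominal 0.05.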
Tukey Method
• Tests:
• Confidence Interval:
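A sketch of the Tukey method in its usual textbook form. The notation is assumed here: $k$ treatments, $N$ total observations, $n_i$ observations in treatment $i$, $\hat{\sigma}^2$ the ANOVA mean squared error, and $q_{k,\,N-k,\,\alpha}$ the upper $\alpha$ quantile of the studentized range distribution with parameters $k$ and $N-k$:

```latex
\text{Test: declare treatments } i \text{ and } j \text{ different if }
|t_{ij}| > \frac{1}{\sqrt{2}}\, q_{k,\,N-k,\,\alpha},
\qquad
t_{ij} = \frac{\bar{y}_{j.} - \bar{y}_{i.}}{\hat{\sigma}\sqrt{1/n_j + 1/n_i}}

\text{Simultaneous confidence interval for } \tau_j - \tau_i: \quad
\bar{y}_{j.} - \bar{y}_{i.} \;\pm\; \frac{1}{\sqrt{2}}\, q_{k,\,N-k,\,\alpha}\,
\hat{\sigma}\sqrt{1/n_j + 1/n_i}
```

The simultaneous coverage is exact when all $n_i$ are equal and conservative otherwise.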
Back to Example
Diagnostic Checking – Residual Analysis
• To support the assumptions on which the analysis is based, we need to check for
– have all effects been captured?
– unequal variances
– non-Normality
– sequence effects
• Should do this before hypothesis testing and multiple comparisons
[Figure: Dotplots of y by Treatment (group means are indicated by lines). x-axis: Treatment (T1 to T4); y-axis: y]
The data plot (limited data) shows no strong evidence of non-Normality or unequal variances
Diagnostic Checking
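• ANOVA model: $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
• Predicted response: $\hat{y}_i = \hat{\mu} + \hat{\tau}_i$, where
– $\hat{\mu} = \bar{y}_{..}$
– $\hat{\tau}_i = (\bar{y}_{i.} - \bar{y}_{..})$
• Residual: $r_{ij} = (y_{ij} - \hat{y}_i)$
• Estimates error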
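The fitted values and residuals can be sketched in a few lines. The data below are hypothetical (made up for illustration, not the class example), and the layout is balanced, with three observations per treatment.

```python
from statistics import mean

# Hypothetical responses for four treatments (illustrative values only)
data = {
    "T1": [6.0, 6.5, 7.0],
    "T2": [4.0, 4.5, 5.0],
    "T3": [5.0, 5.5, 6.0],
    "T4": [3.5, 4.0, 4.5],
}

# Estimated overall mean: the average of all observations
grand_mean = mean(y for ys in data.values() for y in ys)

# Estimated treatment effects: group mean minus overall mean
effects = {t: mean(ys) - grand_mean for t, ys in data.items()}

# Predicted response for each treatment (equals the group mean)
fitted = {t: grand_mean + effects[t] for t in data}

# Residuals: observation minus predicted response
residuals = {t: [y - fitted[t] for y in ys] for t, ys in data.items()}

for t in data:
    print(t, round(fitted[t], 3), [round(r, 3) for r in residuals[t]])
```

Within each treatment the residuals sum to zero, because the predicted response is the group mean.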
Diagnostic Plots
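• Errors are assumed to be normally distributed
– Useful plot: normal probability plot of the residuals
• Errors are assumed to be independent
– Useful plot: residuals versus the time sequence of data collection
• Equal variances in each group
– Useful plot: residuals versus treatment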
Normality Check
• Dot plot or histogram of residuals
• Normal probability plot of residuals (via software or by hand - see class handout)
[Figure: Normal Q-Q Plot of Residual for RESPONSE. x-axis: Observed Value; y-axis: Expected Normal Value]
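A minimal sketch of how the coordinates of a normal probability plot can be computed. The plotting position (i - 0.5)/n is one common convention and is an assumption here; software packages and hand methods may use a slightly different one, and some scale the expected values by the residual standard deviation. The residuals are hypothetical.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting position (i - 0.5) / n: one common convention among several
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(ordered, quantiles))

# Hypothetical residuals, for illustration only
pairs = normal_scores([-0.5, 0.0, 0.5, -0.2, 0.3, -0.1])
for observed, expected in pairs:
    print(f"{observed:6.2f}  {expected:6.2f}")
```

If the Normality assumption is reasonable, the plotted points fall close to a straight line.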
Independence Check
• Plot residuals in the time sequence in which the data were collected
• X-axis denotes the sequence, Y-axis denotes the residual values
• Should observe no systematic pattern (such as a trend or cycling) if the errors are independent
Independence Check
• Suppose the sequence of the observations (going across rows from top to bottom in the tabled data) is 1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8
[Figure: Time Plot of residuals. x-axis: Sequence; y-axis: Residual for RESPONSE]
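A short sketch of arranging residuals in collection order before plotting. The sequence is the one given in the example; the residual values are hypothetical and are listed in table order.

```python
# Run order from the example: table position -> time sequence
sequence = [1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8]

# Hypothetical residuals in table order (illustrative values only)
residuals = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1, 0.0, -0.4, 0.2, 0.1]

# Reorder the residuals by the time at which each observation was collected
time_ordered = [r for _, r in sorted(zip(sequence, residuals))]

for step, r in enumerate(time_ordered, start=1):
    print(step, r)
```

Plotting `time_ordered` against 1, 2, ..., 12 gives the time plot; the table order itself carries no information about sequence effects.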
Equal Variances
• A useful plot is: residuals versus treatment
• Should observe: roughly the same spread of residuals in each treatment group
Equal Variances
[Figure: Plot of Residual Versus Treatment. x-axis: Packaging; y-axis: Residual for RESPONSE]
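As a numerical companion to the plot, the spread of the residuals within each treatment can be compared directly. This is an informal sketch with hypothetical residuals; with only a few observations per group the standard deviations are noisy, so the comparison is a rough guide rather than a formal test.

```python
from statistics import stdev

# Hypothetical residuals by treatment (illustrative values only)
residuals = {
    "T1": [-0.5, 0.0, 0.5],
    "T2": [-0.2, 0.0, 0.2],
    "T3": [-0.4, 0.1, 0.3],
    "T4": [-0.3, 0.0, 0.3],
}

# Sample standard deviation of the residuals within each treatment
spread = {t: stdev(rs) for t, rs in residuals.items()}

# Ratio of largest to smallest spread, as a crude summary
ratio = max(spread.values()) / min(spread.values())

for t, s in spread.items():
    print(t, round(s, 3))
print("largest / smallest:", round(ratio, 2))
```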
Comments
• The F-test is fairly robust – it is not very sensitive to departures from the assumption of Normal distributions.
• Often, simple transformations, such as the logarithm or square root, can make the Normal distribution assumption and the equal variance assumption more appropriate (Chapter 2)
• Method: Random assignment of treatments to experimental units
• ANOVA: Compare variation among treatments to variation within treatments to assess evidence of a difference among treatments
• Investigate and identify differences among Treatments, if any. Act on the findings
Comment: One-Way Model
• The one-way model, $y_{ij} = \mu + \tau_i + e_{ij}$, $e_{ij} \sim \mathrm{NID}(0, \sigma^2)$, can be and is applied to data obtained in ways other than a completely randomized design
• Example: starting salaries for MBAs at different companies. Company is not a treatment that is applied to experimental units
• Analyzing the data according to the above model can answer whether apparent differences between companies are real or could be just due to chance.
• The randomness involved comes from the randomness of the hiring and salary-determination processes, not the random assignment of treatments to experimental units
General Linear Model
• ANOVA model can be viewed as a special case of the general linear model or regression model
• Suppose we have a response, y, which is thought to be related to p predictors (sometimes called explanatory variables or regressors)
• Predictors: x1, x2,…,xp
• Model:
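In its usual form, the general linear model with p predictors can be written as below. The symbols $\beta_0, \dots, \beta_p$ (unknown coefficients) and $\varepsilon$ (random error) are notation assumed here:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
```

The one-way ANOVA model fits this form when the predictors are indicator variables for the treatments, for example $x_i = 1$ if the observation received treatment $i$ and $0$ otherwise, with one indicator dropped or a constraint imposed so that the coefficients are estimable.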
Example: Rainfall (Exercise 2.16)
• In winter, a plastic rain gauge cannot be used to collect precipitation because it will freeze and crack. Instead, metal cans are used to collect snowfall and the snow is allowed to melt indoors. The water is then poured into a plastic rain gauge and a measurement recorded. An estimate of snowfall is obtained by multiplying this measurement by 0.44.
• One observer questions this and decides to collect data to test the validity of this approach
• For each rainfall over a summer, she measures the rainfall (i) using a plastic rain gauge and (ii) using a metal can
• What is the current model being used?
Example: Rainfall (Exercise 2.16)
[Figure: Scatter Plot of Rainfall Data. x-axis: Rain Collected in Metal Can (x); y-axis: Rain Collected in Plastic Gauge]
Example: Rainfall (Exercise 2.16)
• Seems to be a linear relationship
• Will use regression to establish linear relationship between x and y
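A minimal sketch of fitting a straight line by least squares. The current practice is read here as a line through the origin with slope 0.44 (y = 0.44x), which is an interpretation of the description above. The data are hypothetical, not the Exercise 2.16 measurements, and the function name is made up for illustration.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical readings (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # rain collected in metal can
y = [0.5, 0.9, 1.4, 1.8, 2.3]   # rain collected in plastic gauge

intercept, slope = least_squares(x, y)
print(round(intercept, 3), round(slope, 3))
```

If the current approach is valid, the fitted intercept should be close to 0 and the fitted slope close to 0.44, up to sampling variability.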