
Stat 470-4

• Today: Multiple comparisons, diagnostic checking, an example

• After these notes, we will have looked at 1.1-1.3 (skip figures 1.2 and 1.3 and the last two paragraphs of section 1.3), 1.6 (skip matrix notation and constraints), 1.7 (Tukey method only), 1.9 (ignore H matrix notation on page 35), 2.1 and 2.2

• We will not cover 1.5 or 1.8

• Assignment 1:

Multiple Comparisons

• In the previous example, we saw that there was a significant treatment effect… so what?

• If an ANOVA is conducted and the analysis suggests that there is a significant treatment effect, then a reasonable question to ask is: which treatments differ from one another?


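• We would like to see whether there is a difference between treatments i and j

• Can use the two-sample t-test statistic to do this: $t_{ij} = \dfrac{\bar{y}_{j\cdot} - \bar{y}_{i\cdot}}{\hat{\sigma}\sqrt{1/n_j + 1/n_i}}$, where $\hat{\sigma}^2$ is the residual mean square from the ANOVA, with N − k degrees of freedom (N observations, k treatments)

• For testing $H_0: \tau_i = \tau_j$ versus $H_A: \tau_i \neq \tau_j$, reject $H_0$ at level α if $|t_{ij}| > t_{N-k,\,\alpha/2}$

• Perform many of these tests: with k treatments there are $k(k-1)/2$ pairs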

• Error rate must be controlled
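To see how quickly the comparisons pile up, here is a minimal Python sketch that computes every pairwise $t_{ij} = (\bar{y}_{j\cdot} - \bar{y}_{i\cdot}) / (\hat{\sigma}\sqrt{1/n_j + 1/n_i})$, with $\hat{\sigma}$ taken from the ANOVA residual mean square RSS/(N − k). The data values and the helper names `pooled_sigma` and `pairwise_t` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pooled_sigma(groups):
    """sigma-hat: square root of the ANOVA residual mean square RSS / (N - k)."""
    rss = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    n_total = sum(len(ys) for ys in groups.values())
    return sqrt(rss / (n_total - len(groups)))


def pairwise_t(groups):
    """t_ij = (ybar_j - ybar_i) / (sigma_hat * sqrt(1/n_j + 1/n_i)) for every pair i < j."""
    s = pooled_sigma(groups)
    out = {}
    for i, j in combinations(groups, 2):
        yi, yj = groups[i], groups[j]
        out[(i, j)] = (mean(yj) - mean(yi)) / (s * sqrt(1 / len(yj) + 1 / len(yi)))
    return out


# Hypothetical data (made-up numbers, not the class example): 4 treatments, 3 runs each
groups = {"T1": [4.0, 5.0, 6.0], "T2": [6.0, 7.0, 8.0],
          "T3": [3.0, 4.0, 5.0], "T4": [5.0, 6.0, 7.0]}
t_stats = pairwise_t(groups)
print(len(t_stats))  # k(k-1)/2 = 6 tests for k = 4
```

Even with only k = 4 treatments there are 6 tests; running each one at level α raises the chance of at least one false rejection, which is why the error rate has to be controlled.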

Tukey Method

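• Tests: reject $H_0: \tau_i = \tau_j$ if $|t_{ij}| > \frac{1}{\sqrt{2}}\, q_{k,\,N-k,\,\alpha}$, where $t_{ij}$ is the pairwise two-sample t statistic and $q_{k,\,N-k,\,\alpha}$ is the upper α point of the studentized range distribution with parameters k (number of treatments) and N − k (residual degrees of freedom)

• Confidence Interval: $\bar{y}_{j\cdot} - \bar{y}_{i\cdot} \pm \frac{1}{\sqrt{2}}\, q_{k,\,N-k,\,\alpha}\, \hat{\sigma}\sqrt{1/n_j + 1/n_i}$ for all pairs simultaneously, where $\hat{\sigma}^2$ is the ANOVA residual mean square; the simultaneous confidence level is at least 1 − α, and exactly 1 − α when all $n_i$ are equal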
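A minimal sketch of the Tukey intervals, assuming the critical value $q_{k,N-k,\alpha}$ is looked up separately (for example, from a studentized range table) and passed in as `q_crit`. The data and the function name `tukey_intervals` are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_intervals(groups, q_crit):
    """Simultaneous Tukey intervals for ybar_j - ybar_i, for every pair i < j.

    q_crit is the upper-alpha point q_{k, N-k, alpha} of the studentized
    range distribution, supplied by the caller (e.g. from a table).
    """
    k = len(groups)
    n_total = sum(len(ys) for ys in groups.values())
    rss = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    sigma_hat = sqrt(rss / (n_total - k))  # root of the residual mean square
    out = {}
    for i, j in combinations(groups, 2):
        diff = mean(groups[j]) - mean(groups[i])
        half = q_crit / sqrt(2) * sigma_hat * sqrt(1 / len(groups[j]) + 1 / len(groups[i]))
        out[(i, j)] = (diff, diff - half, diff + half)
    return out


# Hypothetical data (made up): 4 treatments, 3 runs each, so N - k = 8
groups = {"T1": [4.0, 5.0, 6.0], "T2": [6.0, 7.0, 8.0],
          "T3": [3.0, 4.0, 5.0], "T4": [5.0, 6.0, 7.0]}
# q_{4, 8, 0.05} is about 4.53 from a studentized range table
intervals = tukey_intervals(groups, q_crit=4.53)
for pair, (d, lower, upper) in intervals.items():
    verdict = "differ" if lower > 0 or upper < 0 else "no evidence of a difference"
    print(pair, round(d, 2), (round(lower, 2), round(upper, 2)), verdict)
```

An interval that excludes zero corresponds to rejecting $H_0: \tau_i = \tau_j$ with the Tukey test. If SciPy 1.7 or later is available, `scipy.stats.studentized_range.ppf(0.95, 4, 8)` can replace the table lookup.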

Back to Example

Diagnostic Checking – Residual Analysis

• To support the assumptions on which the analysis is based, we need to check for:

– whether all effects have been captured

– unequal variances

– non-Normality

– sequence effects

• Should do this before hypothesis testing and multiple comparisons

[Figure: Dotplots of y by treatment (T1 to T4); group means are indicated by lines]

The data plot (limited data) shows no strong evidence of non-Normality or unequal variances

Diagnostic Checking

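• ANOVA model: $y_{ij} = \mu + \tau_i + \epsilon_{ij}$

• Predicted response: $\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{i\cdot}$, where

– $\hat{\mu} = \bar{y}_{\cdot\cdot}$

– $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$

• Residual: $r_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i\cdot}$

• The residual $r_{ij}$ estimates the error $\epsilon_{ij}$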
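The estimates $\hat{\mu}$, $\hat{\tau}_i$, the predicted responses and the residuals can be computed directly. A minimal Python sketch on hypothetical data (the function name `fit_one_way` is illustrative only):

```python
from statistics import mean


def fit_one_way(groups):
    """Estimates for y_ij = mu + tau_i + eps_ij as defined in the notes."""
    all_y = [y for ys in groups.values() for y in ys]
    mu_hat = mean(all_y)                                           # mu-hat = ybar..
    tau_hat = {g: mean(ys) - mu_hat for g, ys in groups.items()}   # tau-hat_i = ybar_i. - ybar..
    fitted = {g: mu_hat + tau_hat[g] for g in groups}              # y-hat_ij = ybar_i. for every j
    resid = {g: [y - fitted[g] for y in ys] for g, ys in groups.items()}  # r_ij = y_ij - y-hat_ij
    return mu_hat, tau_hat, fitted, resid


# Hypothetical data (made up, not the class example)
groups = {"T1": [4.0, 5.0, 6.0], "T2": [6.0, 7.0, 8.0],
          "T3": [3.0, 4.0, 5.0], "T4": [5.0, 6.0, 7.0]}
mu_hat, tau_hat, fitted, resid = fit_one_way(groups)
print(mu_hat, tau_hat)
print(resid)
```

Within each treatment the residuals sum to zero, because every observation in group i is predicted by the same group mean $\bar{y}_{i\cdot}$.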

Diagnostic Plots

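• Errors are assumed to be Normally distributed – useful plot: normal probability plot of the residuals

• Errors are assumed to be independent – useful plot: residuals versus the time sequence in which the data were collected

• Equal variances in each group – useful plot: residuals versus treatment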

Normality Check

• Dot plot or histogram of residuals

• Normal probability plot of residuals (via software or by hand - see class handout)
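One common by-hand recipe for a normal probability plot is to sort the residuals and pair the i-th smallest with the standard Normal quantile at plotting position (i − 0.5)/n; if the errors are Normal, the points fall roughly on a straight line. The sketch below assumes that plotting position, which may differ from the class handout and from what statistical software uses. The residuals and the name `normal_scores` are hypothetical.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """Pair each ordered residual with an expected standard Normal quantile.

    Assumes plotting positions (i - 0.5) / n; software may use a
    slightly different formula.
    """
    r = sorted(residuals)
    n = len(r)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(r_i, std_normal.inv_cdf((i - 0.5) / n)) for i, r_i in enumerate(r, start=1)]


# Hypothetical residuals (made up, not the class example)
residuals = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
pairs = normal_scores(residuals)
for observed, expected in pairs:
    print(f"{observed:6.2f} {expected:6.3f}")  # plot expected (y) against observed (x)
```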

[Figure: Normal Q-Q plot of residuals for RESPONSE (x-axis: observed value; y-axis: expected Normal value)]

Independence Check

• Plot residuals in the time sequence in which the data were collected

• X-axis denotes the sequence, Y-axis denotes the residual values

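• Should observe: no systematic pattern; the residuals scatter randomly around zero, with no trends or long runs above or below zero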

Independence Check

• Suppose the sequence of the observations (going across rows from top to bottom in the tabled data) is 1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8
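Building the time plot only requires putting the residuals into collection order. A minimal sketch, assuming the residuals are listed in the same table order as the sequence numbers above; the residual values and the name `time_order` are hypothetical.

```python
def time_order(residuals, sequence):
    """Return (time, residual) pairs sorted into collection order.

    residuals[p] and sequence[p] describe the same observation, both listed
    in table order (across rows, top to bottom).
    """
    n = len(residuals)
    if len(sequence) != n or sorted(sequence) != list(range(1, n + 1)):
        raise ValueError("sequence must be a permutation of 1..n")
    return sorted(zip(sequence, residuals))


sequence = [1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8]  # collection order from the notes
# Hypothetical residuals in table order (made up for illustration)
residuals = [-0.5, 0.2, 0.1, -0.3, 0.4, 0.0, -0.2, 0.3, 0.1, -0.4, 0.2, 0.1]
for t, r in time_order(residuals, sequence):
    print(t, r)  # x = sequence, y = residual
```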

[Figure: Time plot of residuals for RESPONSE (x-axis: sequence; y-axis: residual)]

Equal Variances

• A useful plot is: residuals versus treatment

• Should observe: roughly the same spread of residuals in each treatment group
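As a rough numeric companion to the plot (not a formal test), one can compare the spread of the residuals within each treatment. With only a few observations per group these standard deviations are very noisy, so the plot remains the main tool. The data and the name `residual_spread` are hypothetical.

```python
from statistics import mean, stdev


def residual_spread(groups):
    """Sample standard deviation of the residuals r_ij = y_ij - ybar_i. in each treatment."""
    return {g: stdev([y - mean(ys) for y in ys]) for g, ys in groups.items()}


# Hypothetical data (made up, not the class example)
groups = {"T1": [4.0, 5.0, 6.0], "T2": [6.0, 7.0, 8.0],
          "T3": [3.0, 4.0, 5.0], "T4": [5.0, 6.0, 7.0]}
spread = residual_spread(groups)
print(spread)
print(max(spread.values()) / min(spread.values()))  # largest spread relative to smallest
```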

Equal Variances

[Figure: Plot of residual versus treatment (x-axis: packaging; y-axis: residual for RESPONSE)]

Comments

• The F-test is fairly robust – it is not very sensitive to departures from the assumption of Normal distributions.

• Often, simple transformations, such as the logarithm or square root, can make the Normal distribution assumption and the equal variance assumption more appropriate (Chapter 2)

Summary: Completely Randomized Design, One-Way ANOVA

• Method: Random assignment of treatments to experimental units

• ANOVA: Compare variation among treatments to variation within treatments to assess evidence of a difference among treatments

• Investigate and identify differences among treatments, if any, and act on the findings
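The ANOVA step, comparing variation among treatments to variation within treatments, reduces to a ratio of mean squares. A minimal sketch on hypothetical data (the name `one_way_anova` is illustrative only):

```python
from statistics import mean


def one_way_anova(groups):
    """Between-treatment and within-treatment sums of squares and the F ratio."""
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    k, n_total = len(groups), len(all_y)
    ss_trt = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())  # among treatments
    ss_res = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)    # within treatments
    df_trt, df_res = k - 1, n_total - k
    f_stat = (ss_trt / df_trt) / (ss_res / df_res)
    return ss_trt, ss_res, df_trt, df_res, f_stat


# Hypothetical data (made up, not the class example)
groups = {"T1": [4.0, 5.0, 6.0], "T2": [6.0, 7.0, 8.0],
          "T3": [3.0, 4.0, 5.0], "T4": [5.0, 6.0, 7.0]}
ss_trt, ss_res, df_trt, df_res, f_stat = one_way_anova(groups)
print(ss_trt, ss_res, df_trt, df_res, f_stat)
```

A large F, compared with the F distribution on (k − 1, N − k) degrees of freedom, is evidence of a difference among treatments.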

Comment: One-Way Model

• The one-way model $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, $\epsilon_{ij} \sim NID(0, \sigma^2)$, can be, and is, applied to data obtained in ways other than a completely randomized design

• Example: starting salaries for MBAs at different companies. Company is not a treatment that is applied to experimental units

• Analyzing the data according to the above model can answer whether apparent differences between companies are real or could be just due to chance.

• The randomness involved comes from the randomness of the hiring and salary-determination processes, not the random assignment of treatments to experimental units

General Linear Model

• ANOVA model can be viewed as a special case of the general linear model or regression model

• Suppose we have a response, y, which is thought to be related to p predictors (sometimes called explanatory variables or regressors)

• Predictors: $x_1, x_2, \ldots, x_p$

• Model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$
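To make the "special case" concrete, here is a sketch (assuming NumPy is installed) that writes the one-way model as a linear model with p = k − 1 indicator predictors, using T1 as the baseline treatment. Baseline coding is only one possible parameterization and not necessarily the one the textbook uses; the data are hypothetical.

```python
import numpy as np

# Hypothetical one-way data (made up): 4 treatments, 3 runs each
labels = ["T1"] * 3 + ["T2"] * 3 + ["T3"] * 3 + ["T4"] * 3
y = np.array([4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0])

# Design matrix: intercept column plus indicators x1, x2, x3 for T2, T3, T4 (T1 = baseline)
X = np.column_stack([np.ones(len(y))] +
                    [[1.0 if lab == t else 0.0 for lab in labels] for t in ["T2", "T3", "T4"]])

# Least squares fit of y = beta0 + beta1*x1 + beta2*x2 + beta3*x3 + error
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 6))  # beta0 = T1 mean; beta_i = (treatment mean) - (T1 mean)
```

The fitted intercept is the T1 mean and each slope is the difference between a treatment mean and the T1 mean, so the regression reproduces the one-way ANOVA fit.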

Example: Rainfall (Exercise 2.16)

• In winter, a plastic rain gauge cannot be used to collect precipitation because it will freeze and crack. Instead, metal cans are used to collect snowfall and the snow is allowed to melt indoors. The water is then poured into a plastic rain gauge and a measurement recorded. An estimate of snowfall is obtained by multiplying this measurement by 0.44.

• One observer questions this and decides to collect data to test the validity of this approach

• For each rainfall during one summer, she measures the rainfall (i) using a plastic rain gauge and (ii) using a metal can

• What is the current model being used?


[Figure: Scatter plot of rainfall data (x-axis: rain collected in metal can, x; y-axis: rain collected in plastic gauge)]


• The relationship appears to be linear

• We will use regression to estimate the linear relationship between x and y

• What should the slope be?
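For a single predictor, the least squares estimates have a closed form: $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A minimal sketch, checked on hypothetical points that lie exactly on a line with slope 0.44 (the conversion factor in the example) and a made-up intercept; the name `fit_line` is illustrative only.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares intercept and slope for y = beta0 + beta1 * x + error."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical points (made up): rain in metal can (x) vs plastic gauge (y)
x = [1.0, 2.0, 3.0, 4.0]
y = [0.03 + 0.44 * xi for xi in x]
b0, b1 = fit_line(x, y)
print(b0, b1)
```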


Coefficients (Model 1; dependent variable: Y)

| Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 3.579E-02 | .012 | | 2.931 | .005 |
| X | .444 | .006 | .995 | 76.264 | .000 |

ANOVA (Model 1; predictors: (Constant), X; dependent variable: Y)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 25.860 | 1 | 25.860 | 5816.213 | .000 |
| Residual | .245 | 55 | .004 | | |
| Total | 26.105 | 56 | | | |

Model Summary (Model 1; predictors: (Constant), X; dependent variable: Y)

| R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|
| .995 | .991 | .990 | .06668 |
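The three tables are linked, and a quick arithmetic check using only the printed values makes the connections explicit: $R^2 = SS_{reg}/SS_{tot}$, adjusted $R^2 = 1 - (SS_{res}/55)/(SS_{tot}/56)$, the standard error of the estimate is $\sqrt{SS_{res}/55}$, and with one predictor the slope's $t^2$ equals F. This sketch is only a consistency check; small discrepancies come from the sums of squares being printed rounded.

```python
from math import sqrt

# Values copied from the printed tables (rounded as printed)
ss_reg, ss_res, ss_tot = 25.860, 0.245, 26.105
df_res, df_tot = 55, 56
f_printed, t_slope = 5816.213, 76.264

r2 = ss_reg / ss_tot                                # R Square
adj_r2 = 1 - (ss_res / df_res) / (ss_tot / df_tot)  # Adjusted R Square
s = sqrt(ss_res / df_res)                           # Std. Error of the Estimate
print(round(r2, 3), round(adj_r2, 3), round(s, 4))
print(round(t_slope ** 2, 1), f_printed)            # one predictor: t^2 = F
```

The computed $R^2$ and adjusted $R^2$ round to .991 and .990, and s comes out near .0667 rather than .06668 because the residual sum of squares is printed to only three decimals.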

