Copyright (c) Bani K. Mallick. STAT 651 Lecture #13.

Post on 21-Dec-2015

Topics in Lecture #13

Multiple comparisons, especially Fisher’s Least Significant Difference

Residuals as a means of checking the normality assumption

Book Sections Covered in Lecture #13

Chapter 8.4 (Residuals)

Chapter 9.4 (Fisher’s)

Chapter 9.1 (the idea of multiple comparisons)

Lecture 12 Review: ANOVA

Suppose we form three populations on the basis of body mass index (BMI):

BMI < 22, 22 <= BMI < 28, BMI >= 28

This forms 3 populations

We want to know whether the three populations have the same mean caloric intake, or if their food composition differs.

Lecture 12 Review: ANOVA

One procedure that is often followed is to do a preliminary test to see whether there are any differences among the populations

Then, once you conclude that some differences exist, you allow somewhat more informality in deciding where those differences manifest themselves

The first step is the ANOVA F-test

Lecture 12 Review: ANOVA

The distance of the data to the overall mean is the (Corrected) Total Sum of Squares:

$TSS = \sum_{ij} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2$

This has $n_T - 1$ degrees of freedom

Lecture 12 Review: ANOVA

The sum of squares between groups (Corrected Model) is

$\sum_{i} n_i (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2$

It has t-1 degrees of freedom, so the number of populations is the degrees of freedom between groups + 1.

Lecture 12 Review: ANOVA

The distance of the observations to their sample means is

$SSE = \sum_{ij} (Y_{ij} - \bar{Y}_{i\bullet})^2$

This is the Sum of Squares for Error

It has $n_T - t$ degrees of freedom
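The three sums of squares in this review can be checked numerically. The sketch below is a minimal illustration on made-up data (the three groups are hypothetical, not from the lecture); it computes the total, between-group, and error sums of squares and confirms that the total splits into the other two.

```python
from statistics import mean

def anova_sums_of_squares(groups):
    """Return (TSS, between-group SS, SSE) for a list of samples."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    # Distance of every observation to the overall mean.
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    # Distance of each group mean to the overall mean, weighted by group size.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Distance of every observation to its own group mean.
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return tss, ssb, sse

# Hypothetical data: three small groups, chosen only for illustration.
groups = [[20.0, 24.0, 22.0], [30.0, 28.0, 35.0, 31.0], [26.0, 25.0, 27.0]]
tss, ssb, sse = anova_sums_of_squares(groups)

n_total = sum(len(g) for g in groups)
t = len(groups)
df_total, df_between, df_error = n_total - 1, t - 1, n_total - t

print(f"TSS = {tss:.2f} on {df_total} df")
print(f"Between groups = {ssb:.2f} on {df_between} df")
print(f"SSE = {sse:.2f} on {df_error} df")
```

The degrees of freedom add up in the same way as the sums of squares: (t - 1) + (n_T - t) = n_T - 1.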

Lecture 12 Review: ANOVA

Next comes the F-statistic

It is the ratio of the mean square for the corrected model to the mean square for error

Large values indicate rejection of the null hypothesis

Tests of Between-Subjects Effects

Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)
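As a check on how the F-statistic is built, the following sketch recomputes the mean squares, F, and R Squared from the sums of squares and degrees of freedom in the BMI table. The variable names are illustrative; the numbers are the ones in the SPSS output.

```python
# Sums of squares and degrees of freedom read from the SPSS table.
ss_model, df_model = 960.287, 2        # Corrected Model (BMIGROUP)
ss_error, df_error = 15275.639, 181    # Error
ss_corrected_total = 16235.925         # Corrected Total

ms_model = ss_model / df_model         # mean square for the corrected model
mse = ss_error / df_error              # mean square for error
f_stat = ms_model / mse                # ratio of the two mean squares
r_squared = ss_model / ss_corrected_total

print(f"Mean square (model) = {ms_model:.3f}")
print(f"Mean square (error) = {mse:.3f}")
print(f"F = {f_stat:.3f}")
print(f"R Squared = {r_squared:.3f}")
```

The results agree with the Mean Square, F, and R Squared entries of the table up to rounding.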

Lecture 12 Review: ANOVA

The F-statistic is compared to the F-distribution with $t - 1$ and $n_T - t$ degrees of freedom.

See Table 8, which lists the cutoff points in terms of $\alpha$. If the F-statistic exceeds the cutoff, you reject the hypothesis of equality of all the means.

SPSS gives you the p-value (significance level) for this test

Lecture 12 Review: ANOVA

The F-statistic is compared to the F-distribution with $df_1 = t - 1$ and $df_2 = n_T - t$ degrees of freedom.

For example, if you have 3 populations and 6 observations for each population, then there are 18 total observations.

The degrees of freedom are 2 and 15. If you want a type I error of 5%, look at $df_1 = 2$, $df_2 = 15$, $\alpha = .05$ to get a critical value of 3.68: try this out!

Lecture 12 Review: ANOVA

If the populations have a common variance $\sigma^2$, the mean squared error (MSE) estimates it.

You take the square root of the MSE to estimate $\sigma$

In the SPSS table for the BMI example (Dependent Variable: Baseline FFQ), the Error row gives MSE = 84.396 on 181 degrees of freedom, so the estimate of $\sigma$ is about 9.19.

Lecture 12 Review: ANOVA

The critical value for an F-test with 2 and 181 df at Type I error 0.05 is about 3.05

In the SPSS table for the BMI example, F = 5.689 with Sig. = .004. Hence F > 3.05, so the p-value is < 0.05

ANOVA in SPSS

“Analyze”, “General Linear Model”, “Univariate”

“Fixed factor” = the variable defining the populations

Always “Save” unstandardized residuals

“Posthoc”: Move factor to right and click on LSD

ANOVA Table

Tests of Between-Subjects Effects

Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)

Fisher’s Least Significant Difference (LSD)

Suppose that we determine that there are at least some differences among t population means.

Fisher’s Least Significant Difference is one way to tell which ones are different

The main reason to use it is convenience: all comparisons can be done with the click of a mouse

It does not guarantee longer or shorter confidence intervals

Fisher’s Least Significant Difference (LSD)

For example, suppose there are t = 3 populations.

The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$

The alternative is $H_A$: the null hypothesis is false

But this does not tell you which populations are different, only that some are

Fisher’s Least Significant Difference (LSD)

The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$

The alternative is $H_A$: the null hypothesis is false

There are 4 possibilities:

$\mu_1 = \mu_2 \neq \mu_3$

$\mu_1 = \mu_3 \neq \mu_2$

$\mu_2 = \mu_3 \neq \mu_1$

$\mu_1 \neq \mu_2 \neq \mu_3$

Fisher’s LSD is a way of getting this directly

Fisher’s LSD

We have done an ANOVA, and now we want to compare two specific populations.

Fisher’s LSD differs from our usual 2-population comparisons in two features:

The degrees of freedom are $n_T - t$, not $n_1 + n_2 - 2$

The pooled standard deviation is $\sqrt{MSE}$, where $MSE = SSE/(n_T - t)$, not $s_p$

Review: Comparing Two Populations

If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
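A small sketch of the pooled standard deviation may help fix the formula. The function name `pooled_sd` is illustrative; the inputs are the sample sizes and sample standard deviations for ages 3 and 4 from the Concho Water Snake illustration later in this lecture.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation under the equal-variance assumption."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

# Ages 3 and 4 in the Concho Water Snake illustration.
s_p = pooled_sd(17, 10.95, 9, 13.58)
print(f"pooled sd = {s_p:.2f} on {17 + 9 - 2} degrees of freedom")
```

This two-sample estimate uses only the two groups being compared, whereas Fisher’s LSD uses the square root of the MSE from all groups.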

Comparing Two Populations: Usual and Fisher LSD

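Usual: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Fisher: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t) \sqrt{MSE \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$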

ROS Data

ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil

We want to compare their responses to butyrate

Between-Subjects Factors

Diet Group   Value Label     N
1.00         FAEE oil diet   10
2.00         Fish oil diet   10
3.00         Corn oil diet   10

ANOVA

ROS data, log scale. What do you see?

[Figure: box plots titled “ROS Response After Butyrate Exposure”. Vertical axis: log(Butyrate) - log(Control), from -.5 to 2.0. Horizontal axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet), N = 10 in each group. One point is labeled 24.]

ANOVA

ROS data, log scale. What do you see? Maybe different variances, but sample sizes are small

[Figure: box plots titled “ROS Response After Butyrate Exposure”. Vertical axis: log(Butyrate) - log(Control), from -.5 to 2.0. Horizontal axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet), N = 10 in each group. One point is labeled 24.]

ANOVA

ROS data, log scale. No major changes in means?

[Figure: box plots titled “ROS Response After Butyrate Exposure”. Vertical axis: log(Butyrate) - log(Control), from -.5 to 2.0. Horizontal axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet), N = 10 in each group. One point is labeled 24.]

ANOVA

ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil

What was the total sample size? n = 30

Tests of Between-Subjects Effects

Dependent Variable: log(Butyrate) - log(Control)

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   5.188E-02 (a)             2    2.594E-02     .203     .818
Intercept         5.957                     1    5.957         46.542   .000
DIETGRP           5.188E-02                 2    2.594E-02     .203     .818
Error             3.456                     27   .128
Total             9.465                     30
Corrected Total   3.508                     29

a. R Squared = .015 (Adjusted R Squared = -.058)

ANOVA

ROS data: any evidence that the population means are different in their change after butyrate exposure? No, the p-value is 0.818!

This matches the box plots: in the ANOVA table, F = .203 on 2 and 27 degrees of freedom

ROS Data

Testing for Normality in ANOVA

I use the General Linear Model to define these residuals

Form the residuals, which are simply the differences between each observation and its group sample mean

Then do a Q-Q plot of the residuals

Useful if you have many groups with a small number of observations per group
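The residual-then-Q-Q recipe can be sketched outside SPSS. The example below is a rough illustration on hypothetical data; the plotting positions (i - 3/8)/(n + 1/4), known as Blom's formula, are an assumption, and other conventions give slightly different expected normal values.

```python
from statistics import NormalDist, mean

def anova_residuals(groups):
    """Residual = observation minus the sample mean of its own group."""
    return [y - mean(g) for g in groups for y in g]

def normal_qq_points(residuals):
    """Pair each sorted residual with a standard normal score.

    Assumes Blom's plotting positions (i - 3/8) / (n + 1/4).
    """
    n = len(residuals)
    std_normal = NormalDist()
    ordered = sorted(residuals)
    return [
        (r, std_normal.inv_cdf((i - 0.375) / (n + 0.25)))
        for i, r in enumerate(ordered, start=1)
    ]

# Hypothetical data: three small groups, pooled through their residuals.
groups = [[0.9, 1.4, 1.1, 0.6], [0.2, 0.8, 0.5], [1.0, 0.3, 0.7, 0.4]]
residuals = anova_residuals(groups)
points = normal_qq_points(residuals)

for observed, normal_score in points:
    print(f"{observed:7.3f}  {normal_score:7.3f}")
```

If the residuals are roughly normal, the printed pairs fall close to a straight line when plotted. Pooling the residuals is what makes this usable with many small groups: each group alone has too few points for a plot of its own.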

ANOVA

Here is the Q-Q plot. How’s it look?

[Figure: normal Q-Q plot of the residuals, titled “ROS: log scale”. Horizontal axis: Observed Value (-1.0 to 1.0). Vertical axis: Expected Normal Value (-.8 to .8).]

ROS Data

Testing for Normality in ANOVA:

Illustrate saving residuals: “general linear model”, “univariate”, “save” (select “unstandardized” to create the residual variable)

Illustrate q-q plot on residuals

Illustrate editing a chart object to change titles and the like

ROS Data

Fisher’s LSD. Note how all p-values are > 0.10.

Multiple Comparisons

Dependent Variable: log(Butyrate) - log(Control)

LSD

(I) Diet Group   (J) Diet Group   Mean Difference (I-J)   Std. Error   Sig. (p-values)   95% CI Lower Bound   95% CI Upper Bound
FAEE oil diet    Fish oil diet    6.825E-02               .1600        .673              -.2600               .3965
FAEE oil diet    Corn oil diet    9.960E-02               .1600        .539              -.2287               .4279
Fish oil diet    FAEE oil diet    -6.8255E-02             .1600        .673              -.3965               .2600
Fish oil diet    Corn oil diet    3.135E-02               .1600        .846              -.2969               .3596
Corn oil diet    FAEE oil diet    -9.9605E-02             .1600        .539              -.4279               .2287
Corn oil diet    Fish oil diet    -3.1350E-02             .1600        .846              -.3596               .2969

Based on observed means.

ROS Data: Compare Fish to Corn oil

Mean for fish – mean for corn = 0.03135

Standard error = 0.1600

CI (95%) = -.2969 to .3596
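The fish versus corn comparison can be cross-checked against Fisher’s formula. This sketch takes the MSE (.128 on 27 df) from the ROS ANOVA table and the group sizes of 10, recomputes the standard error, and backs out the t cutoff that the reported interval implies. The variable names are illustrative.

```python
from math import sqrt

# Values read from the ROS slides.
mse, n_fish, n_corn = 0.128, 10, 10
mean_diff = 0.03135                    # mean for fish - mean for corn
ci_lower, ci_upper = -0.2969, 0.3596   # reported 95% confidence interval

# Fisher's LSD standard error: sqrt(MSE * (1/n1 + 1/n2)).
std_error = sqrt(mse * (1 / n_fish + 1 / n_corn))

# The interval is symmetric about the mean difference, so half its width
# divided by the standard error recovers the t cutoff that was used.
midpoint = (ci_lower + ci_upper) / 2
implied_t = (ci_upper - ci_lower) / 2 / std_error

print(f"standard error = {std_error:.4f}")
print(f"interval midpoint = {midpoint:.5f}")
print(f"implied t cutoff = {implied_t:.3f}")
```

The implied cutoff is about 2.05, which is what a t table gives for 27 degrees of freedom at α = 0.05: the error degrees of freedom n_T - t, not n_1 + n_2 - 2 = 18.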

Concho Water Snake Illustration

A numerical example will help illustrate this idea. I’ll consider comparing tail lengths of female Concho Water Snakes with age classes 2,3, and 4.

Sample sizes: $n_1 = 11$, $n_2 = 17$, $n_3 = 9$, $n_T = 37$.

Sample sd: $s_1 = 17.90$, $s_2 = 10.95$, $s_3 = 13.58$.

Sample means: $\bar{X}_1 = 153.82$, $\bar{X}_2 = 173.24$, $\bar{X}_3 = 194.67$.

Female Concho Water Snakes, Ages 2-4, Tail Length

Between-Subjects Factors

Age    N
2.00   11
3.00   17
4.00   9

Female Concho Water Snakes, Ages 2-4, Tail Length

[Figure: box plots of Tail Length (vertical axis, 120 to 220) by Age (2.00, 3.00, 4.00), with N = 11, 17, and 9. Two points are labeled 35 and 27.]

Female Concho Water Snakes, Ages 2-4, Tail Length: are they different in population means?

Tests of Between-Subjects Effects

Dependent Variable: Tail Length

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   8269.413 (a)              2    4134.706      21.304     .000
Intercept         1043505.649               1    1043505.649   5376.698   .000
AGE               8269.413                  2    4134.706      21.304     .000
Error             6598.695                  34   194.079
Total             1118093.000               37
Corrected Total   14868.108                 36

a. R Squared = .556 (Adjusted R Squared = .530)
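This table can be approximately rebuilt from the summary statistics given earlier for the three age classes. The sketch below assumes only those rounded sample sizes, standard deviations, and means, so small differences from the SPSS output are expected.

```python
# Summary statistics for ages 2, 3, and 4 from the illustration slide.
sizes = [11, 17, 9]
sds = [17.90, 10.95, 13.58]
means = [153.82, 173.24, 194.67]

n_total = sum(sizes)
t = len(sizes)
grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

# Between-group sum of squares: sum of n_i * (group mean - grand mean)^2.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
# Error sum of squares: sum of (n_i - 1) * s_i^2.
ss_error = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

ms_between = ss_between / (t - 1)
mse = ss_error / (n_total - t)
f_stat = ms_between / mse

print(f"Between groups: SS = {ss_between:.1f}, df = {t - 1}")
print(f"Error: SS = {ss_error:.1f}, df = {n_total - t}, MSE = {mse:.2f}")
print(f"F = {f_stat:.2f}")
```

The sums of squares land within about 1 of the table entries, and F comes out near 21.3.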

Concho Water Snake Example

Multiple Comparisons

Dependent Variable: Tail Length

LSD

(I) Age   (J) Age   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
2.00      3.00      -19.4171 *              5.3907       .001   -30.3724             -8.4618
2.00      4.00      -40.8485 *              6.2616       .000   -53.5736             -28.1233
3.00      2.00      19.4171 *               5.3907       .001   8.4618               30.3724
3.00      4.00      -21.4314 *              5.7429       .001   -33.1023             -9.7604
4.00      2.00      40.8485 *               6.2616       .000   28.1233              53.5736
4.00      3.00      21.4314 *               5.7429       .001   9.7604               33.1023

Based on observed means.

*. The mean difference is significant at the .05 level.

Concho Water Snake Illustration: Hand Calculations

Sample size factor for comparing the age groups: $\sqrt{\frac{1}{n_2} + \frac{1}{n_3}} = 0.41$

Sample mean difference: $\bar{X}_3 - \bar{X}_2 = 21.43$

Concho Water Snake Illustration

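$n_T - t = 34$ degrees of freedom for error

MSE = 194.08, $\sqrt{MSE} = 13.93$

$\alpha = 0.05$, $t_{\alpha/2}(n_T - t) = 2.03$

$\bar{X}_3 - \bar{X}_2 = 21.43$

$\bar{X}_3 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t) \sqrt{MSE \left(\frac{1}{n_2} + \frac{1}{n_3}\right)}$ = 9.76 to 33.10: compare with output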
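The same hand calculation can be repeated for all three pairs of age classes. This is a sketch using the numbers on the slides (sample sizes, sample means, MSE = 194.08, and the cutoff 2.03); the function name `fisher_lsd_interval` is illustrative.

```python
from itertools import combinations
from math import sqrt

# Values from the slides: summary statistics, MSE, and the t cutoff for 34 df.
ages = [2, 3, 4]
sizes = {2: 11, 3: 17, 4: 9}
means = {2: 153.82, 3: 173.24, 4: 194.67}
mse = 194.08
t_crit = 2.03  # t_{alpha/2}(n_T - t) with alpha = 0.05, as given on the slide

def fisher_lsd_interval(i, j):
    """Return (difference, std. error, lower, upper) for mean_j - mean_i."""
    diff = means[j] - means[i]
    std_error = sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))
    margin = t_crit * std_error
    return diff, std_error, diff - margin, diff + margin

results = {pair: fisher_lsd_interval(*pair) for pair in combinations(ages, 2)}
for (i, j), (diff, se, lower, upper) in results.items():
    print(f"age {j} - age {i}: diff = {diff:.2f}, SE = {se:.4f}, "
          f"CI = ({lower:.2f}, {upper:.2f})")
```

The standard errors match the Std. Error column of the SPSS output. The interval endpoints differ from the output in the second decimal place because the cutoff 2.03 is rounded.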

Female Concho Water Snakes, Ages 2-4, Tail Length

[Figure: Normal Q-Q Plot of Residual for TAILL. Horizontal axis: Observed Value (-40 to 30). Vertical axis: Expected Normal Value (-30 to 30).]

We need a method that allows for non-normal data!
