
Copyright (c) Bani K. Mallick1 STAT 651 Lecture #13.

Dec 21, 2015


STAT 651

Lecture #13

Topics in Lecture #13

Multiple comparisons, especially Fisher’s Least Significant Difference

Residuals as a means of checking the normality assumption


Book Sections Covered in Lecture #13

Chapter 8.4 (Residuals)

Chapter 9.4 (Fisher’s)

Chapter 9.1 (the idea of multiple comparisons)


Lecture 12 Review: ANOVA

Suppose we form three populations on the basis of body mass index (BMI):

BMI < 22, 22 <= BMI < 28, BMI >= 28

This forms 3 populations

We want to know whether the three populations have the same mean caloric intake, or if their food composition differs.


Lecture 12 Review: ANOVA

One procedure that is often followed is to do a preliminary test to see whether there are any differences among the populations

Then, once you conclude that some differences exist, you allow somewhat more informality in deciding where those differences manifest themselves

The first step is the ANOVA F-test

Lecture 12 Review: ANOVA

The distance of the data to the overall mean is the (Corrected) Total Sum of Squares:

$TSS = \sum_{ij} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$

This has $n_T - 1$ degrees of freedom

Lecture 12 Review: ANOVA

The sum of squares between groups (Corrected Model) is

$\sum_i n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$

It has t-1 degrees of freedom, so the number of populations is the degrees of freedom between groups + 1.

Lecture 12 Review: ANOVA

The distance of the observations to their sample means is

$SSE = \sum_{ij} (Y_{ij} - \bar{Y}_{i\cdot})^2$

This is the Sum of Squares for Error

It has $n_T - t$ degrees of freedom
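The three sums of squares in this review can be checked numerically. The following is a minimal sketch on made-up data (the group names and values are hypothetical, not from the lecture); it shows that the total sum of squares splits into the between-groups part plus the error part, and that the degrees of freedom split the same way.

```python
# Hypothetical toy data: three populations with a few observations each.
groups = {
    "low": [20.0, 22.0, 19.0, 24.0],
    "mid": [25.0, 27.0, 23.0],
    "high": [30.0, 28.0, 33.0, 29.0, 31.0],
}

all_obs = [y for ys in groups.values() for y in ys]
n_total = len(all_obs)            # n_T
t = len(groups)                   # number of populations
grand_mean = sum(all_obs) / n_total

# TSS: squared distance of every observation to the overall mean
tss = sum((y - grand_mean) ** 2 for y in all_obs)

ss_between = 0.0                  # between groups (Corrected Model)
sse = 0.0                         # error (within groups)
for ys in groups.values():
    group_mean = sum(ys) / len(ys)
    ss_between += len(ys) * (group_mean - grand_mean) ** 2
    sse += sum((y - group_mean) ** 2 for y in ys)

df_total, df_between, df_error = n_total - 1, t - 1, n_total - t

print(f"TSS        = {tss:.3f} on {df_total} df")
print(f"Between SS = {ss_between:.3f} on {df_between} df")
print(f"SSE        = {sse:.3f} on {df_error} df")
```

The printed TSS equals the sum of the other two lines up to rounding, which is the decomposition the F-test is built on.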

Lecture 12 Review: ANOVA

Next comes the F-statistic

It is the ratio of the mean square for the corrected model to the mean square for error

Large values indicate rejection of the null hypothesis

Tests of Between-Subjects Effects

Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)
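As a check on how the columns of the SPSS table relate, here is a short sketch that recomputes the mean squares, the F-statistic and R Squared from the sums of squares and degrees of freedom printed in the BMI output. The variable names are illustrative only.

```python
# Values copied from the SPSS "Tests of Between-Subjects Effects" output.
ss_model, df_model = 960.287, 2        # Corrected Model (BMIGROUP) row
ss_error, df_error = 15275.639, 181    # Error row
ss_corrected_total = 16235.925         # Corrected Total row

ms_model = ss_model / df_model         # mean square for the corrected model
mse = ss_error / df_error              # mean square for error
f_stat = ms_model / mse                # the F-statistic
r_squared = ss_model / ss_corrected_total

print(f"Mean square (model) = {ms_model:.3f}")
print(f"Mean square (error) = {mse:.3f}")
print(f"F                   = {f_stat:.3f}")
print(f"R Squared           = {r_squared:.3f}")
```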

Lecture 12 Review: ANOVA

The F-statistic is compared to the F-distribution with t-1 and $n_T - t$ degrees of freedom.

See Table 8, which lists the cutoff points in terms of $\alpha$. If the F-statistic exceeds the cutoff, you reject the hypothesis of equality of all the means.

SPSS gives you the p-value (significance level) for this test

Lecture 12 Review: ANOVA

The F-statistic is compared to the F-distribution with df1 = t-1 and df2 = $n_T - t$ degrees of freedom.

For example, if you have 3 populations and 6 observations for each population, then there are 18 total observations.

The degrees of freedom are 2 and 15. If you want a type I error of 5%, look at df1 = 2, df2 = 15, $\alpha$ = .05 to get a critical value of 3.68: try this out!
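One way to "try this out" without the printed table is sketched below. It relies on a closed form that holds only when the numerator degrees of freedom equal 2 (that is, t = 3 populations): the upper-tail probability of F(2, df2) is (1 + 2f/df2)^(-df2/2), which can be solved for the cutoff. The function name is hypothetical, and for other numerator degrees of freedom a table or statistical software is still needed.

```python
def f_critical_df1_2(df2: int, alpha: float) -> float:
    """Upper-alpha cutoff of the F(2, df2) distribution.

    Assumption: numerator df is exactly 2; the formula is not valid otherwise.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


t = 3                      # number of populations
n_per_group = 6            # observations per population
n_total = t * n_per_group  # 18 total observations

df1 = t - 1                # 2
df2 = n_total - t          # 15
crit = f_critical_df1_2(df2, alpha=0.05)

print(f"df1 = {df1}, df2 = {df2}, critical value = {crit:.2f}")
```

The result agrees with the 3.68 quoted on the slide.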

Lecture 12 Review: ANOVA

If the populations have a common variance $\sigma^2$, the mean squared error (MSE) estimates it.

You take the square root of the MSE to estimate $\sigma$

Tests of Between-Subjects Effects

Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)

Lecture 12 Review: ANOVA

The critical value for an F-test with 2 and 181 df at Type I error 0.05 is about 3.05

Here F = 5.689 > 3.05, so the p-value is < 0.05

Tests of Between-Subjects Effects

Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)
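The link between the cutoff and the p-value can be illustrated with the BMI numbers. This sketch again assumes numerator df = 2, where the upper-tail probability of the F distribution has the simple form (1 + 2f/df2)^(-df2/2); the helper name is hypothetical and the shortcut does not apply to other numerator degrees of freedom.

```python
def f_upper_tail_df1_2(f: float, df2: int) -> float:
    """P(F > f) for an F(2, df2) variable. Assumes numerator df = 2."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


f_stat = 5.689   # F from the SPSS table
df2 = 181        # error degrees of freedom
alpha = 0.05

p_value = f_upper_tail_df1_2(f_stat, df2)

# Same closed form solved for the cutoff at Type I error alpha
crit = (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

print(f"critical value = {crit:.2f}")
print(f"p-value        = {p_value:.3f}")
print("reject equality of means" if f_stat > crit else "do not reject")
```

Comparing F to the cutoff and comparing the p-value to 0.05 are the same decision expressed two ways.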


ANOVA in SPSS

“Analyze”, “General Linear Model”, “Univariate”

“Fixed factor” = the variable defining the populations

Always “Save” unstandardized residuals

“Posthoc”: Move factor to right and click on LSD

ANOVA Table

Tests of Between-Subjects Effects

Dependent Variable: Baseline FFQ

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   960.287 (a)               2     480.143       5.689      .004
Intercept         196009.919                1     196009.919    2322.508   .000
BMIGROUP          960.287                   2     480.143       5.689      .004
Error             15275.639                 181   84.396
Total             226223.216                184
Corrected Total   16235.925                 183

a. R Squared = .059 (Adjusted R Squared = .049)

Fisher’s Least Significant Difference (LSD)

Suppose that we determine that there are at least some differences among t population means.

Fisher’s Least Significant Difference is one way to tell which ones are different

The main reason to use it is convenience: all comparisons can be done with the click of a mouse

It does not guarantee longer or shorter confidence intervals

Fisher’s Least Significant Difference (LSD)

For example, suppose there are t = 3 populations.

The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$

The alternative is $H_a$: the null hypothesis is false

But this does not tell you which populations are different, only that some are

Fisher’s Least Significant Difference (LSD)

The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$

The alternative is $H_a$: the null hypothesis is false

There are 4 possibilities:

$\mu_1 = \mu_2 \neq \mu_3$

$\mu_1 = \mu_3 \neq \mu_2$

$\mu_2 = \mu_3 \neq \mu_1$

$\mu_1$, $\mu_2$, $\mu_3$ all different

Fisher’s LSD is a way of getting at this directly

Fisher’s LSD

We have done an ANOVA, and now we want to compare two specific populations.

Fisher’s LSD differs from our usual 2-population comparisons in two features:

The degrees of freedom: $n_T - t$, not $n_1 + n_2 - 2$

The pooled standard deviation: the square root of MSE = SSE/($n_T - t$), not $s_p$

Review: Comparing Two Populations

If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Comparing Two Populations: Usual and Fisher LSD

Usual: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Fisher: $\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_T - t)\, \sqrt{MSE} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
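To make the two differences concrete, the sketch below computes the standard error and degrees of freedom both ways. As an assumption for illustration, it borrows the summary statistics for ages 3 and 4 and the MSE from the Concho Water Snake example that appears later in this lecture; the function and variable names are hypothetical.

```python
import math


def pooled_sd(n1: int, s1: float, n2: int, s2: float) -> float:
    """Usual two-sample pooled standard deviation s_p."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


# Summary statistics quoted in the Concho Water Snake slides
n_a, s_a = 17, 10.95      # age 3
n_b, s_b = 9, 13.58       # age 4
n_total, t = 37, 3        # all three age groups
mse = 194.079             # Error mean square from the ANOVA table

size_factor = math.sqrt(1 / n_a + 1 / n_b)

# Usual comparison: uses only the two groups being compared
usual_se = pooled_sd(n_a, s_a, n_b, s_b) * size_factor
usual_df = n_a + n_b - 2

# Fisher LSD: uses sqrt(MSE) and the error df from all groups
fisher_se = math.sqrt(mse) * size_factor
fisher_df = n_total - t

print(f"Usual : se = {usual_se:.4f}, df = {usual_df}")
print(f"Fisher: se = {fisher_se:.4f}, df = {fisher_df}")
```

The Fisher standard error matches the Std. Error that SPSS reports for the age 3 versus age 4 comparison (5.7429), while the usual pooled version differs because it ignores the third group.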

ROS Data

ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil

We want to compare their responses to butyrate

Between-Subjects Factors

Diet Group   Value Label     N
1.00         FAEE oil diet   10
2.00         Fish oil diet   10
3.00         Corn oil diet   10

ANOVA

ROS data, log scale. What do you see?

[Box plot: ROS Response After Butyrate Exposure. x-axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet; N = 10 each). y-axis: log(Butyrate) - log(Control)]

ANOVA

ROS data, log scale. What do you see? Maybe different variances, but sample sizes are small

[Box plot: ROS Response After Butyrate Exposure. x-axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet; N = 10 each). y-axis: log(Butyrate) - log(Control)]

ANOVA

ROS data, log scale. No major changes in means?

[Box plot: ROS Response After Butyrate Exposure. x-axis: Diet Group (FAEE oil diet, Fish oil diet, Corn oil diet; N = 10 each). y-axis: log(Butyrate) - log(Control)]

ANOVA

ROS data has three groups: Fish oil diet, Fish-like oil diet, and Corn Oil

What was the total sample size? n = 30

Tests of Between-Subjects Effects

Dependent Variable: log(Butyrate) - log(Control)

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   5.188E-02 (a)             2    2.594E-02     .203     .818
Intercept         5.957                     1    5.957         46.542   .000
DIETGRP           5.188E-02                 2    2.594E-02     .203     .818
Error             3.456                     27   .128
Total             9.465                     30
Corrected Total   3.508                     29

a. R Squared = .015 (Adjusted R Squared = -.058)

ANOVA

ROS data: any evidence that the population means are different in their change after butyrate exposure? No, the p-value is 0.818!

This matches the box plots

Tests of Between-Subjects Effects

Dependent Variable: log(Butyrate) - log(Control)

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   5.188E-02 (a)             2    2.594E-02     .203     .818
Intercept         5.957                     1    5.957         46.542   .000
DIETGRP           5.188E-02                 2    2.594E-02     .203     .818
Error             3.456                     27   .128
Total             9.465                     30
Corrected Total   3.508                     29

a. R Squared = .015 (Adjusted R Squared = -.058)


ROS Data

Testing for Normality in ANOVA

I use the General Linear Model to define these residuals

Form the residuals, which are simply the differences of the data with their group sample mean

Then do a q-q plot

Useful if you have many groups with a small number of observations per group
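The residual step described above can be sketched outside SPSS. The example below uses made-up data (the group names and values are hypothetical) and pairs each sorted residual with an expected normal value. Two choices here are assumptions rather than anything stated in the lecture: Blom's plotting positions (i - 3/8)/(n + 1/4), and scaling the normal quantiles by the square root of the MSE.

```python
from statistics import NormalDist, mean

# Hypothetical data: three small groups
groups = {
    "g1": [0.9, 1.4, 1.1, 0.6],
    "g2": [1.0, 0.7, 1.3, 1.2],
    "g3": [0.5, 1.1, 0.8, 1.0],
}

# Residual = observation minus its own group's sample mean
residuals = []
for ys in groups.values():
    group_mean = mean(ys)
    residuals.extend(y - group_mean for y in ys)

residuals.sort()
n = len(residuals)
t = len(groups)

# sqrt(MSE): residual sum of squares divided by n_T - t, then square root
root_mse = (sum(r**2 for r in residuals) / (n - t)) ** 0.5

# Expected normal values at Blom plotting positions (an assumed convention)
std_normal = NormalDist()
expected = [
    root_mse * std_normal.inv_cdf((i - 0.375) / (n + 0.25))
    for i in range(1, n + 1)
]

# Points of the q-q plot: roughly a straight line if normality is reasonable
for obs, exp in zip(residuals, expected):
    print(f"observed {obs:7.3f}   expected normal {exp:7.3f}")
```

Pooling residuals across groups is what makes this useful with many small groups: no single group has enough observations for its own q-q plot, but the combined residuals do.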

ANOVA

Here is the Q-Q plot. How’s it look?

[Normal Q-Q plot of residuals, ROS: log scale. x-axis: Observed Value. y-axis: Expected Normal Value]

ROS Data

Testing for Normality in ANOVA:

Illustrate saving residuals: “general linear model”, “univariate”, “save” (select “unstandardized” to create the residual variable)

Illustrate q-q plot on residuals

Illustrate editing a chart object to change titles and the like

ROS Data

Fisher’s LSD. Note how all p-values are > 0.10.

Multiple Comparisons

Dependent Variable: log(Butyrate) - log(Control)

LSD

(I) Diet Group   (J) Diet Group   Mean Difference (I-J)   Std. Error   Sig. (p-value)   95% CI Lower Bound   95% CI Upper Bound
FAEE oil diet    Fish oil diet    6.825E-02               .1600        .673             -.2600               .3965
FAEE oil diet    Corn oil diet    9.960E-02               .1600        .539             -.2287               .4279
Fish oil diet    FAEE oil diet    -6.8255E-02             .1600        .673             -.3965               .2600
Fish oil diet    Corn oil diet    3.135E-02               .1600        .846             -.2969               .3596
Corn oil diet    FAEE oil diet    -9.9605E-02             .1600        .539             -.4279               .2287
Corn oil diet    Fish oil diet    -3.1350E-02             .1600        .846             -.3596               .2969

Based on observed means.

ROS Data: Compare Fish to Corn oil

Mean for fish - mean for corn = 0.03135

Standard error = 0.1600

CI (95%) = -.2969 to .3596

Multiple Comparisons

Dependent Variable: log(Butyrate) - log(Control)

LSD

(I) Diet Group   (J) Diet Group   Mean Difference (I-J)   Std. Error   Sig. (p-value)   95% CI Lower Bound   95% CI Upper Bound
FAEE oil diet    Fish oil diet    6.825E-02               .1600        .673             -.2600               .3965
FAEE oil diet    Corn oil diet    9.960E-02               .1600        .539             -.2287               .4279
Fish oil diet    FAEE oil diet    -6.8255E-02             .1600        .673             -.3965               .2600
Fish oil diet    Corn oil diet    3.135E-02               .1600        .846             -.2969               .3596
Corn oil diet    FAEE oil diet    -9.9605E-02             .1600        .539             -.4279               .2287
Corn oil diet    Fish oil diet    -3.1350E-02             .1600        .846             -.3596               .2969

Based on observed means.
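The Fish versus Corn oil row can be reproduced by hand from the ANOVA table. In this sketch the mean difference, error sum of squares and sample sizes come from the SPSS output in these slides; the t cutoff of 2.052 (upper 2.5% point with 27 degrees of freedom) is an assumption taken from a standard t table, not from the lecture.

```python
import math

# From the ROS ANOVA table: Error sum of squares and its df
sse, df_error = 3.456, 27
mse = sse / df_error                 # about .128

n_fish, n_corn = 10, 10
mean_diff = 0.03135                  # mean for fish minus mean for corn

# Assumed value: t_{alpha/2} with 27 df at alpha = 0.05, from a t table
t_crit = 2.052

# Fisher LSD standard error uses sqrt(MSE), not the two-sample pooled sd
se = math.sqrt(mse * (1 / n_fish + 1 / n_corn))

lower = mean_diff - t_crit * se
upper = mean_diff + t_crit * se

print(f"standard error = {se:.4f}")
print(f"95% CI = {lower:.4f} to {upper:.4f}")
```

The interval covers zero, consistent with the p-value of .846 for this pair.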

Concho Water Snake Illustration

A numerical example will help illustrate this idea. I’ll consider comparing tail lengths of female Concho Water Snakes with age classes 2, 3, and 4.

Sample sizes: $n_1 = 11$, $n_2 = 17$, $n_3 = 9$, $n_T = 37$.

Sample sd: $s_1 = 17.90$, $s_2 = 10.95$, $s_3 = 13.58$.

Sample means: $\bar{Y}_{1\cdot} = 153.82$, $\bar{Y}_{2\cdot} = 173.24$, $\bar{Y}_{3\cdot} = 194.67$.

Female Concho Water Snakes, Ages 2-4, Tail Length

Between-Subjects Factors

Age    N
2.00   11
3.00   17
4.00   9

Female Concho Water Snakes, Ages 2-4, Tail Length

[Box plot: Tail Length by Age. x-axis: Age (2.00, 3.00, 4.00; N = 11, 17, 9). y-axis: Tail Length]

Female Concho Water Snakes, Ages 2-4, Tail Length: are they different in population means?

Tests of Between-Subjects Effects

Dependent Variable: Tail Length

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   8269.413 (a)              2    4134.706      21.304     .000
Intercept         1043505.649               1    1043505.649   5376.698   .000
AGE               8269.413                  2    4134.706      21.304     .000
Error             6598.695                  34   194.079
Total             1118093.000               37
Corrected Total   14868.108                 36

a. R Squared = .556 (Adjusted R Squared = .530)
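The snake ANOVA table can be approximately rebuilt from the summary statistics alone, which is a useful check on how the pieces fit. This sketch uses the sample sizes, standard deviations and means quoted earlier for the three age classes. Because those inputs are rounded to two decimals, the results agree with SPSS only approximately.

```python
# Summary statistics for age classes 2, 3, 4 (from the earlier slide)
n = [11, 17, 9]
sd = [17.90, 10.95, 13.58]
ybar = [153.82, 173.24, 194.67]

n_total = sum(n)
t = len(n)

# Overall mean is the sample-size-weighted average of the group means
grand_mean = sum(ni * yi for ni, yi in zip(n, ybar)) / n_total

# Between-groups sum of squares: sum of n_i * (group mean - overall mean)^2
ss_between = sum(ni * (yi - grand_mean) ** 2 for ni, yi in zip(n, ybar))

# Error sum of squares: each group contributes (n_i - 1) * s_i^2
sse = sum((ni - 1) * si**2 for ni, si in zip(n, sd))

mse = sse / (n_total - t)
f_stat = (ss_between / (t - 1)) / mse

print(f"Between SS = {ss_between:.1f} on {t - 1} df")
print(f"SSE        = {sse:.1f} on {n_total - t} df")
print(f"MSE        = {mse:.2f}")
print(f"F          = {f_stat:.2f}")
```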

Concho Water Snake Example

Multiple Comparisons

Dependent Variable: Tail Length

LSD

(I) Age   (J) Age   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
2.00      3.00      -19.4171*               5.3907       .001   -30.3724             -8.4618
2.00      4.00      -40.8485*               6.2616       .000   -53.5736             -28.1233
3.00      2.00      19.4171*                5.3907       .001   8.4618               30.3724
3.00      4.00      -21.4314*               5.7429       .001   -33.1023             -9.7604
4.00      2.00      40.8485*                6.2616       .000   28.1233              53.5736
4.00      3.00      21.4314*                5.7429       .001   9.7604               33.1023

Based on observed means.

*. The mean difference is significant at the .05 level.

Concho Water Snake Illustration: Hand Calculations

Sample size factor for comparing the age groups: $\sqrt{\frac{1}{n_2} + \frac{1}{n_3}} = 0.41$

Sample mean difference: $\bar{Y}_{3\cdot} - \bar{Y}_{2\cdot} = 21.43$

Concho Water Snake Illustration

$n_T - t = 34$ degrees of freedom for error

MSE = 194.08, $\sqrt{MSE} = 13.93$

$\alpha = 0.05$, $t_{\alpha/2}(n_T - t) = 2.03$

$\bar{Y}_{3\cdot} - \bar{Y}_{2\cdot} \pm t_{\alpha/2}(n_T - t)\, \sqrt{MSE} \sqrt{\frac{1}{n_2} + \frac{1}{n_3}}$ = 9.76 to 33.10: compare with output
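The hand calculation can be stepped through in a few lines. This is a sketch using only the numbers quoted on these slides, including the rounded t cutoff of 2.03; the variable names are illustrative.

```python
import math

# Age 3 and age 4 groups (n_2, n_3 in the slide notation)
n2, n3 = 17, 9
mean2, mean3 = 173.24, 194.67

mse = 194.08         # from the ANOVA table, 34 df for error
t_crit = 2.03        # t_{alpha/2} with 34 df, as quoted on the slide

size_factor = math.sqrt(1 / n2 + 1 / n3)   # sample size factor
mean_diff = mean3 - mean2                  # sample mean difference
root_mse = math.sqrt(mse)

half_width = t_crit * root_mse * size_factor
lower = mean_diff - half_width
upper = mean_diff + half_width

print(f"size factor = {size_factor:.2f}")
print(f"mean diff   = {mean_diff:.2f}")
print(f"sqrt(MSE)   = {root_mse:.2f}")
print(f"95% CI      = {lower:.2f} to {upper:.2f}")
```

The interval lands within about 0.01 of the SPSS limits (9.7604 to 33.1023); the small gap comes from using the t cutoff rounded to two decimals.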

Female Concho Water Snakes, Ages 2-4, Tail Length

[Normal Q-Q Plot of Residual for TAILL. x-axis: Observed Value. y-axis: Expected Normal Value]

We need a method that allows for non-normal data!