Chapter 13: Analysis of Variance

Jan 02, 2016


Mervin Knight

Chapter Outline

An introduction to experimental design and analysis of variance

Analysis of Variance and the completely randomized design

An Introduction to Experimental Design and Analysis of Variance

Statistical studies can be classified as being either experimental or observational.

In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest.

In an observational study, no attempt is made to control the factors.

Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.

Analysis of variance (ANOVA) can be used to analyze the data obtained from experimental or observational studies.

Three types of experimental designs are introduced:

• A completely randomized design

• A randomized block design

• A factorial experiment

A factor is a variable that the experimenter has selected for investigation.

A treatment is a level of a factor. For example, if location is a factor, then a treatment of location can be New York, Chicago, or Seattle.

Experimental units are the objects of interest in the experiment.

A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.
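As a rough illustration of random assignment in a completely randomized design, the sketch below shuffles a set of experimental units and deals them out evenly across the treatments. The function name `assign_treatments`, the unit labels, and the fixed seed are hypothetical choices made for this example; they do not come from the slides.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign each experimental unit to one treatment.

    The units are shuffled and then dealt out round-robin, so group
    sizes differ by at most one. Returns a dict: treatment -> units.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for idx, unit in enumerate(shuffled):
        groups[treatments[idx % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 9 units and the 3 location treatments from the text.
units = [f"unit_{i}" for i in range(1, 10)]
treatments = ["New York", "Chicago", "Seattle"]
assignment = assign_treatments(units, treatments, seed=0)
for treatment, members in assignment.items():
    print(treatment, members)
```

Fixing the seed only makes the illustration repeatable; in an actual experiment the assignment would normally be left to chance.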

Analysis of Variance: A Conceptual Overview

Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means.

Data obtained from observational or experimental studies can be used for the analysis.

We want to use the sample results to test the following hypothesis:

H0: µ1 = µ2 = µ3 = . . . = µk

Ha: Not all population means are equal

If H0 is rejected, we cannot conclude that all population means are different.

Rejecting H0 means that at least two population means have different values.

Assumptions for Analysis of Variance

For each population, the response (dependent) variable is normally distributed.

The variance of the response variable, denoted σ², is the same for all of the populations.

The observations must be independent.

Sampling Distribution of $\bar{x}$ Given H0 Is True

[Figure: a single sampling distribution of $\bar{x}$ centered at µ, with variance $\sigma_{\bar{x}}^2 = \sigma^2/n$; the sample means $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_3$ are marked on the horizontal axis.]

If H0 is true, all the populations have the same mean. It is also assumed that all the populations have the same variance. Therefore, all the sample means are drawn from the same sampling distribution. As a result, the sample means tend to be close to one another.

Sample means are likely to be close to the same population mean if H0 is true.

Sampling Distribution of $\bar{x}$ Given H0 Is False

[Figure: three sampling distributions of $\bar{x}$ centered at µ1, µ2, and µ3, with the sample means $\bar{x}_1$, $\bar{x}_2$, and $\bar{x}_3$ marked on the horizontal axis.]

When H0 is false, the sample means are drawn from different populations. As a result, the sample means tend NOT to be close together. Instead, they tend to be close to their own population means.

Analysis of Variance

Between-treatments estimate of population variance

Within-treatments estimate of population variance

Comparing the variance estimates: The F test

ANOVA table

Between-Treatments Estimate of Population Variance σ²

The estimate of σ² based on the variation of the sample means is called the mean square due to treatments and is denoted by MSTR.

$$\text{MSTR} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$

The numerator is called the sum of squares due to treatments (SSTR). The denominator is the degrees of freedom associated with SSTR.

In the formula for MSTR:

• k is the number of treatments (# of samples)

• $n_j$ is the number of observations in treatment j

• $\bar{x}_j$ is the sample mean of treatment j

• $\bar{\bar{x}}$ is the overall mean, i.e. the average value of ALL the observations from all the treatments:

$$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T}$$
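The MSTR definition can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the slides: the function name `mstr` and the three small samples are made up, with the samples chosen so that the arithmetic is easy to follow by hand.

```python
def mstr(samples):
    """Mean square due to treatments; `samples` holds one list per treatment."""
    k = len(samples)                                  # number of treatments
    n_total = sum(len(s) for s in samples)            # n_T
    overall_mean = sum(sum(s) for s in samples) / n_total
    # SSTR: each squared deviation of a sample mean from the overall mean,
    # weighted by that sample's size n_j.
    sstr = sum(len(s) * (sum(s) / len(s) - overall_mean) ** 2 for s in samples)
    return sstr / (k - 1)

# Made-up data: the sample means are 2, 3, and 7, and the overall mean is 4,
# so SSTR = 3(4) + 3(1) + 3(9) = 42 and MSTR = 42/2.
toy_samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(mstr(toy_samples))
```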

Within-Treatments Estimate of Population Variance σ²

The estimate of σ² based on the variation of the sample observations within each sample is called the mean square due to error and is denoted by MSE.

$$\text{MSE} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$

The numerator is called the sum of squares due to error (SSE). The denominator is the degrees of freedom associated with SSE.

In the formula for MSE:

• k is the number of treatments (# of samples)

• $n_j$ is the number of observations in treatment j

• $s_j^2$ is the sample variance of treatment j

• $n_T$ is the total number of ALL the observations from all the treatments:

$$n_T = \sum_{j=1}^{k} n_j$$
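A matching sketch for MSE, using the sample variance from Python's standard `statistics` module. The function name `mse` and the toy data are again illustrative assumptions, not material from the slides.

```python
from statistics import variance

def mse(samples):
    """Mean square due to error; `samples` holds one list per treatment."""
    k = len(samples)                          # number of treatments
    n_total = sum(len(s) for s in samples)    # n_T
    # SSE: each sample variance s_j^2 weighted by its degrees of freedom n_j - 1.
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return sse / (n_total - k)

# Made-up data: every sample has variance 1, so SSE = 2 + 2 + 2 = 6
# and MSE = 6 / (9 - 3).
toy_samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(mse(toy_samples))
```

Because MSE uses only the within-sample variances, shifting any one sample up or down by a constant leaves it unchanged.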

Comparing the Variance Estimates: The F Test

Because the within-treatments estimate (MSE) of σ² involves only the sample variances, all of which are unbiased estimates of the population variance (according to the assumptions, all the population variances are the same), MSE is a good estimate of the population variance regardless of whether H0 is true.

On the other hand, the between-treatments estimate (MSTR), which uses the sample means, will be a good estimate of σ² only if H0 is true, since all the sample means are then drawn from the same population.

When H0 is false, the sample means are drawn from different populations (with different µ). Therefore, MSTR will overestimate σ², since the sample means will not be close together.

If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with numerator (MSTR) degrees of freedom (d.f.) equal to k - 1 and denominator (MSE) d.f. equal to nT - k.

If H0 is true, MSTR/MSE should be close to 1, since both are good estimates of σ².

If H0 is false, i.e. if the means of the k populations are not equal, the ratio MSTR/MSE will tend to be larger than 1, since MSTR overestimates σ².

Hence, we will reject H0 if the value of MSTR/MSE proves to be too large to have resulted by chance from the appropriate F distribution.
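The behavior of MSTR/MSE under a true and a false H0 can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: normal populations with a common standard deviation of 5, three treatments with five observations each (so 2 and 12 d.f.), and hypothetical population means. The value 3.89 is the tabled 5% critical value for 2 and 12 d.f. that appears in the Reed Manufacturing example.

```python
import random

def f_ratio(samples):
    """Return MSTR/MSE for a list of samples (one list per treatment)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    overall_mean = sum(sum(s) for s in samples) / n_total
    sstr = sum(len(s) * (m - overall_mean) ** 2 for s, m in zip(samples, means))
    # Sum of squared deviations from each sample's own mean,
    # which equals the sum of (n_j - 1) * s_j^2.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (sstr / (k - 1)) / (sse / (n_total - k))

def simulate(pop_means, sigma=5.0, n=5, reps=2000, seed=1):
    """Draw `reps` data sets from normal populations and collect MSTR/MSE."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means]
        ratios.append(f_ratio(samples))
    return ratios

null_ratios = simulate([60, 60, 60])   # H0 true: equal population means
alt_ratios = simulate([55, 68, 57])    # H0 false: hypothetical unequal means

print("average ratio, H0 true: ", sum(null_ratios) / len(null_ratios))
print("average ratio, H0 false:", sum(alt_ratios) / len(alt_ratios))
print("share above 3.89, H0 true:",
      sum(r > 3.89 for r in null_ratios) / len(null_ratios))
```

With equal population means the ratios should scatter around a value near 1 and exceed 3.89 only about 5% of the time; with unequal means they should be much larger on average.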

Sampling Distribution of MSTR/MSE

[Figure: the sampling distribution of MSTR/MSE, an F distribution. Values of MSTR/MSE below the critical value Fα fall in the "Do Not Reject H0" region; values above it fall in the "Reject H0" region.]

ANOVA Table

Source of      Sum of     Degrees of    Mean                     F           p-Value
Variation      Squares    Freedom       Square
Treatments     SSTR       k - 1         MSTR = SSTR/(k - 1)      MSTR/MSE
Error          SSE        nT - k        MSE = SSE/(nT - k)
Total          SST        nT - 1

SST is partitioned into SSTR and SSE.

SST's degrees of freedom (d.f.) are partitioned into SSTR's d.f. and SSE's d.f.

SST divided by its degrees of freedom nT - 1 is the overall sample variance that would be obtained if we treated the entire set of observations as one data set.

With the entire data set as one sample, the formula for computing the total sum of squares, SST, is:

$$\text{SST} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = \text{SSTR} + \text{SSE}$$

ANOVA can be viewed as the process of partitioning the total sum of squares and the degrees of freedom into their corresponding sources: treatments and error.

Dividing the sum of squares by the appropriate degrees of freedom provides the variance estimates.

The F value (MSTR/MSE) is used to test the hypothesis of equal population means.

Test for the Equality of k Population Means

Hypotheses

H0: µ1 = µ2 = µ3 = . . . = µk

Ha: Not all population means are equal

Test Statistic

F = MSTR/MSE

Rejection Rule

p-value Approach: Reject H0 if p-value < α

Critical Value Approach: Reject H0 if F > Fα

where the value of Fα is based on an F distribution with k - 1 numerator d.f. and nT - k denominator d.f.

Test for the Equality of k Population Means: An Observational Study

Example: Reed Manufacturing

Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit). An F test will be conducted using α = .05.

A simple random sample of five managers from each of the three plants was taken, and the number of hours worked by each manager in the previous week is shown below.

Factor . . . Manufacturing plant
Treatments . . . Buffalo, Pittsburgh, Detroit
Experimental units . . . Managers
Response variable . . . Number of hours worked

Observation        Plant 1     Plant 2       Plant 3
                   Buffalo     Pittsburgh    Detroit
1                  48          73            51
2                  54          63            63
3                  57          66            61
4                  54          64            54
5                  62          74            56
Sample Mean        55          68            57
Sample Variance    26.0        26.5          24.5
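The sample means and sample variances in the table can be checked from the raw observations. The sketch below uses Python's standard `statistics` module; the dictionary name `hours` is just a label chosen for this illustration.

```python
from statistics import mean, variance

# Hours worked in the previous week, five managers per plant (from the table).
hours = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

for plant, obs in hours.items():
    # variance() is the sample variance (divides by n - 1).
    print(plant, "mean =", mean(obs), "variance =", variance(obs))
```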

1. Develop the hypotheses.

H0: µ1 = µ2 = µ3

Ha: Not all the means are equal

where:
µ1 = mean number of hours worked per week by the managers at Plant 1
µ2 = mean number of hours worked per week by the managers at Plant 2
µ3 = mean number of hours worked per week by the managers at Plant 3

2. Specify the level of significance.

α = .05

3. Compute the value of the test statistic.

Mean Square Due to Treatments

$\bar{\bar{x}} = (\bar{x}_1 + \bar{x}_2 + \bar{x}_3)/3$ = (55 + 68 + 57)/3 = 60

(Only when the sample sizes are all equal is the overall mean equal to the average of the sample means.)

SSTR = 5(55 - 60)² + 5(68 - 60)² + 5(57 - 60)² = 490

MSTR = 490/(3 - 1) = 245

3. Compute the value of the test statistic (continued).

Mean Square Due to Error

SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308

MSE = 308/(15 - 3) = 25.667

F = MSTR/MSE = 245/25.667 = 9.55
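The arithmetic in step 3 can be reproduced from the summary statistics alone (sample sizes, means, and variances). The variable names below are chosen for illustration; the numbers are those of the Reed Manufacturing example.

```python
# Summary statistics for Buffalo, Pittsburgh, and Detroit.
n = [5, 5, 5]
means = [55, 68, 57]
variances = [26.0, 26.5, 24.5]

k = len(n)          # number of treatments
n_total = sum(n)    # n_T

# Overall mean as a size-weighted average; with equal sample sizes this
# reduces to the simple average of the sample means.
overall_mean = sum(nj * m for nj, m in zip(n, means)) / n_total

sstr = sum(nj * (m - overall_mean) ** 2 for nj, m in zip(n, means))
sse = sum((nj - 1) * s2 for nj, s2 in zip(n, variances))

mstr = sstr / (k - 1)
mse = sse / (n_total - k)
f_stat = mstr / mse

print("SSTR =", sstr, " MSTR =", mstr)
print("SSE  =", sse, " MSE  =", round(mse, 3))
print("F    =", round(f_stat, 2))
```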

ANOVA Table

Source of      Sum of     Degrees of    Mean       F       p-Value
Variation      Squares    Freedom       Square
Treatment      490        2             245        9.55    .0033
Error          308        12            25.667
Total          798        14
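The partition SST = SSTR + SSE in this table can be confirmed directly from the fifteen raw observations. The following is a minimal check with variable names chosen for illustration.

```python
# Raw observations for Buffalo, Pittsburgh, and Detroit.
hours = [
    [48, 54, 57, 54, 62],
    [73, 63, 66, 64, 74],
    [51, 63, 61, 54, 56],
]

all_obs = [x for sample in hours for x in sample]
overall_mean = sum(all_obs) / len(all_obs)

# Total sum of squares: treat all observations as a single data set.
sst = sum((x - overall_mean) ** 2 for x in all_obs)

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(s) * (sum(s) / len(s) - overall_mean) ** 2 for s in hours)
sse = sum((x - sum(s) / len(s)) ** 2 for s in hours for x in s)

k = len(hours)
n_total = len(all_obs)
print("SST =", sst, " SSTR + SSE =", sstr + sse)
print("d.f.:", n_total - 1, "=", k - 1, "+", n_total - k)
```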

p-Value Approach

4. Compute the p-value.

With 2 numerator d.f. and 12 denominator d.f., the p-value is .0033 for F = 9.55.

5. Determine whether to reject H0.

The p-value < .05, so we reject H0.

We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.
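When the numerator has exactly 2 degrees of freedom, the upper-tail area of the F distribution has a simple closed form, P(F > f) = (1 + 2f/d2)^(-d2/2), where d2 is the denominator d.f. This is a special-case shortcut, not the method used on the slides, and it does not apply for other numerator d.f. The function name `f_upper_tail_2df` is made up for this sketch.

```python
def f_upper_tail_2df(f_value, denom_df):
    """P(F > f_value) for an F distribution with 2 numerator d.f.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / denom_df) ** (-denom_df / 2)

# Reed Manufacturing: F = MSTR/MSE = 245 / (308/12), with 2 and 12 d.f.
f_stat = 245 / (308 / 12)
p_value = f_upper_tail_2df(f_stat, 12)
print("p-value =", round(p_value, 4))
print("reject H0 at alpha = .05:", p_value < 0.05)
```

For designs with more than three treatments, a statistics package or an F table would be needed instead of this shortcut.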

Critical Value Approach

4. Determine the critical value and rejection rule.

Based on an F distribution with 2 numerator d.f. and 12 denominator d.f., F.05 = 3.89.

Reject H0 if F > 3.89 (critical value)

5. Determine whether to reject H0.

Because F = 9.55 > 3.89, we reject H0.

We have sufficient evidence to conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.
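The tabled critical value can also be checked in closed form for this particular case. With 2 numerator d.f., setting the upper-tail area (1 + 2f/d2)^(-d2/2) equal to α and solving for f gives Fα = (d2/2)(α^(-2/d2) - 1). This shortcut holds only for 2 numerator d.f., and the function name `f_critical_2df` is hypothetical.

```python
def f_critical_2df(alpha, denom_df):
    """Upper-tail critical value of an F distribution with 2 numerator d.f.

    Obtained by inverting P(F > f) = (1 + 2f/d2) ** (-d2/2); valid only
    when the numerator degrees of freedom equal 2.
    """
    return (denom_df / 2) * (alpha ** (-2 / denom_df) - 1)

critical_value = f_critical_2df(0.05, 12)
f_stat = 9.55  # test statistic from the example

print("F.05 =", round(critical_value, 2))
print("reject H0:", f_stat > critical_value)
```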