©[email protected] 2012 Inferential Statistics, T Test, ANOVA & Proportionate Test Assoc. Prof . Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia FF2613

T test and ANOVA

May 07, 2015


Health & Medicine

Azmi Mohd Tamil

TTest and ANOVA
Transcript
Page 1: T test and ANOVA

©[email protected] 2012

Inferential Statistics, T Test, ANOVA & Proportionate Test

Assoc. Prof . Dr Azmi Mohd Tamil

Dept of Community Health

Universiti Kebangsaan Malaysia

FF2613

Page 2: T test and ANOVA

©[email protected] 2012

Inferential Statistics

Basic Hypothesis Testing

FF2613

Page 3: T test and ANOVA

Inferential Statistics

• When we conduct a study, we want to make an inference from the data collected. For example: “drug A is better than drug B in treating disease D”.

Page 4: T test and ANOVA

Drug A Better Than Drug B?

• For a cure outcome (categorical data: Cured/Not Cured), drug A has a higher cure rate than drug B.
• For BP control (continuous data, mm Hg), the mean drop in BP for drug A is larger than for drug B.

Page 5: T test and ANOVA

Null Hypothesis

• Null hypothesis: “no difference in effectiveness between drug A and drug B in treating disease D”

Page 6: T test and ANOVA

Null Hypothesis

• H0 is assumed TRUE unless the data indicate otherwise:
– The experiment is trying to reject the null hypothesis
– We can reject, but cannot prove, a hypothesis
» e.g. “all swans are white”
» One black swan suffices to reject it in favour of “Not all swans are white”
» No number of white swans can prove the hypothesis, since the next swan could still be black.

Page 7: T test and ANOVA

Can reindeer fly?

• You believe reindeer can fly
• Null hypothesis: “reindeer cannot fly”
• Experimental design: throw reindeer off the roof
• Implementation: they all go splat on the ground
• Evaluation: null hypothesis not rejected
– This does not prove reindeer cannot fly: what you have shown is that “from this roof, on this day, under these weather conditions, these particular reindeer either could not, or chose not to, fly”
• It is possible, in principle, to reject the null hypothesis
– By exhibiting a flying reindeer!

Page 8: T test and ANOVA

Significance

• Inferential statistics determine whether a significant difference in effectiveness exists between drug A and drug B.
• If there is a significant difference (p < 0.05), the null hypothesis is rejected.
• Otherwise (p ≥ 0.05), the null hypothesis is not rejected.
• The usual level of significance used to decide whether to reject the null hypothesis is either 0.05 or 0.01. In the above example, it was set at 0.05.

Page 9: T test and ANOVA

Confidence interval

• Confidence level of the interval = 1 – level of significance.
• If the level of significance is 0.05, then the confidence interval is a 95% interval.
• CI = 1 – 0.05 = 0.95 = 95%
• If CI = 99%, then the level of significance is 0.01.

Page 10: T test and ANOVA

What is level of significance? Chance?

[Figure: two-tailed rejection regions. H0 is rejected in either tail, each tail having area 0.025. The critical values shown are t = ±2.0639 for the t distribution and z = ±1.96 for the normal distribution.]
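The link between the level of significance and the ±1.96 cut-off can be checked numerically. The sketch below is only an illustration using Python's standard library; the function name `two_tailed_critical_z` is a hypothetical helper, not something from the slides.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Return the z value that leaves alpha/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z_05 = two_tailed_critical_z(0.05)  # each tail holds 0.025
z_01 = two_tailed_critical_z(0.01)  # each tail holds 0.005

print(f"alpha = 0.05 -> reject H0 if |z| > {z_05:.2f}")
print(f"alpha = 0.01 -> reject H0 if |z| > {z_01:.2f}")
```

The t cut-off on the slide (2.0639) is larger than 1.96 because the t distribution has heavier tails at small degrees of freedom.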

Page 11: T test and ANOVA

Fisher’s Use of p-Values

• R.A. Fisher referred to the probability used to declare significance as the “p-value”.
• “It is a common practice to judge a result significant, if it is of such magnitude that it would be produced by chance not more frequently than once in 20 trials.”
• 1/20 = 0.05. If the p-value is less than 0.05, then an effect at least as large as the one detected would be produced by chance alone (that is, if the null hypothesis were true) in fewer than 5% of trials.
• We therefore treat the effect detected as a real effect, not due to chance, accepting a 5% risk of declaring significance when the null hypothesis is actually true.

Page 12: T test and ANOVA

©[email protected] 2012

Error

4Although we have determined the level

of significance and confidence interval,

there is still a chance of error.

4There are 2 types;

• Type I Error

• Type II Error

Page 13: T test and ANOVA

Error

DECISION | REALITY: treatments are not different | REALITY: treatments are different
Conclude treatments are not different | Correct decision | Type II error (β error)
Conclude treatments are different | Type I error (α error) | Correct decision

The four cells are labelled Cell a to Cell d on the slide.

Page 14: T test and ANOVA

Error

Test of Significance | Correct Null Hypothesis (Ho not rejected) | Incorrect Null Hypothesis (Ho rejected)
Null Hypothesis Not Rejected | Correct Conclusion | Type II Error
Null Hypothesis Rejected | Type I Error | Correct Conclusion

Page 15: T test and ANOVA

Type I Error

• Type I Error: rejecting the null hypothesis although the null hypothesis is correct.
• e.g. when we compare the mean/proportion of the 2 groups, the difference is small but is found to be significant. Therefore the null hypothesis is rejected.
• It may occur due to an inappropriate choice of alpha (level of significance).

Page 16: T test and ANOVA

Type II Error

• Type II Error: not rejecting the null hypothesis although the null hypothesis is wrong.
• e.g. when we compare the mean/proportion of the 2 groups, the difference is big but is not significant. Therefore the null hypothesis is not rejected.
• It may occur when the sample size is too small.

Page 17: T test and ANOVA

Example of Type II Error

Data from a clinical trial on 30 patients, comparing pain control between two modes of treatment.

Type of treatment * Pain (2 hrs post-op) Crosstabulation
(Count, with % within Type of treatment in brackets)

Type of treatment | No pain | In pain | Total
Pethidine | 8 (53.3%) | 7 (46.7%) | 15 (100.0%)
Cocktail | 4 (26.7%) | 11 (73.3%) | 15 (100.0%)
Total | 12 (40.0%) | 18 (60.0%) | 30 (100.0%)

Chi-square = 2.222, p = 0.136

p = 0.136, which is bigger than 0.05. There is no significant difference and the null hypothesis was not rejected.

There was a large difference between the rates, but it was not significant. Type II Error?
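The chi-square value and p-value quoted for this crosstabulation can be reproduced by hand. The following is a minimal sketch using only the Python standard library; it assumes the uncorrected Pearson chi-square (no Yates correction), and the variable names are illustrative.

```python
import math

# Observed counts from the crosstabulation: rows = treatment, columns = (no pain, in pain)
observed = [[8, 7],    # Pethidine
            [4, 11]]   # Cocktail

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

The `erfc` shortcut only applies to a 2 × 2 table (1 degree of freedom); larger tables need a general chi-square distribution function.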

Page 18: T test and ANOVA

©[email protected] 2012

Not significant since power of the study is less than 80%.

Power is only

32%!

Page 19: T test and ANOVA

©[email protected] 2012

Check for the errors

4You can check for type II errors of your

own data analysis by checking for the

power of the respective analysis

4This can easily be done by utilising

software such as Power & Sample Size

(PS2) from the website of the Vanderbilt

University

Page 20: T test and ANOVA

©[email protected] 2012

Determining the appropriate statistical test

Page 21: T test and ANOVA

©[email protected] 2012

Data Analysis

4Descriptive – summarising data

4Test of Association

4Multivariate – controlling for confounders

Page 22: T test and ANOVA

©[email protected] 2012

Test of Association

• To study the relationship between one or more risk variables (independent) and an outcome variable (dependent).
• For example: does ethnicity affect the suicidal/para-suicidal tendencies of psychiatric patients?

Page 23: T test and ANOVA

Problem Flow Chart

[Diagram: Independent Variables (Ethnicity, Marital Status) → Dependent Variable (Suicidal Tendencies)]

Page 24: T test and ANOVA

Multivariate

• Studies the association between multiple causative factors/variables (independent variables) and the outcome (dependent variable).
• For example: risk factors such as parental care, practice of religion and education level of parents, with the disciplinary problems of their child as the outcome.

Page 25: T test and ANOVA

Hypothesis Testing

• Distinguish parametric & non-parametric procedures
• Test two or more populations using parametric & non-parametric procedures
– Means
– Medians
– Variances

Page 26: T test and ANOVA

Hypothesis Testing Procedures

Page 27: T test and ANOVA

Parametric Test Procedures

• Involve population parameters
– Example: Population mean
• Require interval scale or ratio scale
– Whole numbers or fractions
– Example: Height in inches: 72, 60.5, 54.7
• Have stringent assumptions
– Example: Normal distribution
• Examples: Z test, t test

Page 28: T test and ANOVA

Nonparametric Test Procedures

• Statistic does not depend on population distribution
• Data may be nominally or ordinally scaled
– Example: Male-female
• May involve population parameters such as median
• Example: Wilcoxon rank sum test

Page 29: T test and ANOVA

Parametric Analysis – Quantitative

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative, dichotomous | Quantitative | Normally distributed data | Student's t Test
Qualitative, polytomous | Quantitative | Normally distributed data | ANOVA
Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment). Normally distributed data | Paired t Test
Quantitative, continuous | Quantitative, continuous | Normally distributed data | Pearson Correlation & Linear Regression

Page 30: T test and ANOVA

Non-parametric tests

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20 or (< 40 but with at least one expected value < 5) | Fisher Test
Qualitative, dichotomous | Quantitative | Data not normally distributed | Wilcoxon Rank Sum Test or Mann-Whitney U Test
Qualitative, polytomous | Quantitative | Data not normally distributed | Kruskal-Wallis One Way ANOVA Test
Quantitative | Quantitative | Repeated measurement of the same individual & item | Wilcoxon Signed Rank Test
Quantitative, continuous | Quantitative, continuous | Data not normally distributed | Spearman/Kendall Rank Correlation

Page 31: T test and ANOVA

Statistical Tests - Qualitative

Variable 1 | Variable 2 | Criteria | Type of Test
Qualitative | Qualitative | Sample size > 20 and no expected value < 5 | Chi Square Test (X²)
Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 30 | Proportionate Test
Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 40 but with at least one expected value < 5 | X² Test with Yates Correction
Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20 or (< 40 but with at least one expected value < 5) | Fisher Test

Page 32: T test and ANOVA

©[email protected] 2012

Data Analysis

4Using SPSS;

http://161.142.92.104/spss/

4Using Excel;

http://161.142.92.104/excel/

Page 33: T test and ANOVA

©[email protected] 2012

T Test, ANOVA & Proportionate Test

Assoc. Prof . Dr Azmi Mohd Tamil

Dept of Community Health

Universiti Kebangsaan Malaysia

FF2613

Page 34: T test and ANOVA


T - Test

Independent T-Test

Student’s T-Test

Paired T-Test

ANOVA

Page 35: T test and ANOVA

Student’s T-test

William Sealy Gosset, writing as “Student”, 1908. The Probable Error of a Mean. Biometrika.

Page 36: T test and ANOVA

Student’s T-Test

• To compare the means of two independent groups. For example: comparing the mean Hb between cases and controls. Two variables are involved here, one quantitative (i.e. Hb) and the other a dichotomous qualitative variable (i.e. case/control).

4 t =

Page 37: T test and ANOVA

Examples: Student’s t-test

• Comparing the level of blood cholesterol (mg/dL) between hypertensive and normotensive patients.
• Comparing the HAMD score of two groups of psychiatric patients treated with two different types of drugs (i.e. Fluoxetine & Sertraline).

Page 38: T test and ANOVA

Example

Group Statistics (DHAMAWK6)

DRUG | N | Mean | Std. Deviation
F | 35 | 4.2571 | 3.12808
S | 32 | 3.8125 | 4.39529

Independent Samples Test: t-test for Equality of Means (DHAMAWK6)

 | t | df | Sig. (2-tailed) | Mean Difference
Equal variances assumed | .48 | 65 | .633 | .4446

Page 39: T test and ANOVA

Assumptions of T test

• Observations are normally distributed in each population. (Explore)
• The population variances are equal. (Levene’s Test)
• The 2 groups are independent of each other. (Design of study)

Page 40: T test and ANOVA

Manual Calculation

• Sample size > 30:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

• Small sample size, equal variance:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_0\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_0^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} $$

Page 41: T test and ANOVA

Example – compare cholesterol level

• Hypertensive: mean 214.92, s.d. 39.22, n = 64
• Normal: mean 182.19, s.d. 37.26, n = 36
– Comparing the cholesterol level between hypertensive and normal patients.
– The difference is (214.92 – 182.19) = 32.73 mg%.
– H0: There is no difference in cholesterol level between hypertensive and normal patients.
– n > 30 (64 + 36 = 100), therefore use the first formula.

Page 42: T test and ANOVA

Calculation

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{214.92 - 182.19}{\sqrt{\dfrac{39.22^2}{64} + \dfrac{37.26^2}{36}}} $$

• t = 4.137
• df = n1 + n2 – 2 = 64 + 36 – 2 = 98
• Refer to t table; with t = 4.137, p < 0.001
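The arithmetic of the cholesterol example can be replayed in a few lines. This is a sketch only, using the large-sample formula from the slides; the helper name `t_unpooled` is an assumption introduced for illustration.

```python
import math

def t_unpooled(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample t statistic: difference in means over sqrt(s1^2/n1 + s2^2/n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Summary statistics from the slide: hypertensive vs normal patients
t_stat = t_unpooled(214.92, 39.22, 64, 182.19, 37.26, 36)
df = 64 + 36 - 2

print(f"t = {t_stat:.3f}, df = {df}")
```

The p-value is then read from a t table, as the slides do, or obtained from statistical software.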

Page 43: T test and ANOVA

If df > 100, we can refer to Table A1. The table has no entry for 4.137, so we use 3.99 instead. If t = 3.99, then p = 0.00003 × 2 = 0.00006. Therefore if t = 4.137, p < 0.00006.

Page 44: T test and ANOVA

Or we can refer to Table A3. The table has no row for df = 98, so we use df = 60 instead. t = 4.137 > 3.46 (p = 0.001). Therefore if t = 4.137, p < 0.001.

Page 45: T test and ANOVA

©[email protected] 2012

Conclusion

• Therefore p < 0.05, null hypothesis rejected.

• There is a significant difference of

cholesterol level between hypertensive and

normal patients.

• Hypertensive patients have a significantly

higher cholesterol level compared to

normotensive patients.

Page 46: T test and ANOVA

©[email protected] 2012

Exercise (try it)

• Comparing the mini test 1 (2012) results between

UKM and ACMS students.

• The difference is 11.255

• H0 : There is no difference of marks between UKM

and ACMS students.

• n > 30, therefore use the first formula.

Page 47: T test and ANOVA

©[email protected] 2012

Exercise (answer)

4Null hypothesis rejected

4There is a difference of marks between

UKM and ACMS students. UKM marks

higher than AUCMS

Page 48: T test and ANOVA

T-Test In SPSS

• For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav
• This data came from a case-control study on factors affecting SGA in Kelantan.
• Open the data & select Analyse > Compare Means > Independent-Samples T Test…

Page 49: T test and ANOVA

©[email protected] 2012

T-Test in SPSS

4 We want to see whether there is any association between the mothers’ weight and SGA. So select the risk factor (weight2) into ‘Test Variable’ & the outcome (SGA) into ‘Grouping Variable’.

4 Now click on the ‘Define Groups’ button. Enter

• 0 (Control) for Group 1 and

• 1 (Case) for Group 2.

4 Click the ‘Continue’ button & then click the ‘OK’ button.

Page 50: T test and ANOVA

T-Test Results

• Compare the mean ± SD of both groups.
– Normal 58.7 ± 11.2 kg
– SGA 51.0 ± 9.4 kg
• Apparently there is a difference in weight between the two groups.

Group Statistics (Weight at first ANC)

SGA | N | Mean | Std. Deviation | Std. Error Mean
Normal | 108 | 58.666 | 11.2302 | 1.0806
SGA | 109 | 51.037 | 9.3574 | .8963

Page 51: T test and ANOVA

Results & Homogeneity of Variances

• Look at the p value of Levene’s Test. If p is not significant then equal variances are assumed (use top row).
• If it is significant then equal variances are not assumed (use bottom row).
• So the t value here is 5.439 and p < 0.0005. The difference is significant. Therefore there is an association between the mothers’ weight and SGA.

Independent Samples Test (Weight at first ANC)

 | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed | 1.862 | .174 | 5.439 | 215 | .000 | 7.629 | 1.4028 | 4.8641 | 10.3940
Equal variances not assumed | | | 5.434 | 207.543 | .000 | 7.629 | 1.4039 | 4.8612 | 10.3969
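Both rows of the SPSS output can be recovered from the Group Statistics alone. The sketch below assumes the top row is the pooled-variance t test and the bottom row is Welch's unequal-variance test with Welch–Satterthwaite degrees of freedom; the variable names are illustrative.

```python
import math

# Group statistics for weight at first ANC, taken from the SPSS output
n1, mean1, sd1 = 108, 58.666, 11.2302   # Normal
n2, mean2, sd2 = 109, 51.037, 9.3574    # SGA
diff = mean1 - mean2

# Top row: equal variances assumed (pooled variance)
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Bottom row: equal variances not assumed (Welch)
v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.4f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.4f}")
```

With nearly equal group sizes and similar standard deviations, the two rows barely differ, which is why the choice made by Levene's Test has little practical effect here.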

Page 52: T test and ANOVA

How to present the result?

Group | N | Mean ± SD | Test | p
Normal | 108 | 58.7 ± 11.2 kg | T test, t = 5.439 | < 0.0005
SGA | 109 | 51.0 ± 9.4 kg | |

Page 53: T test and ANOVA

©[email protected] 2012

Paired t-test

“Repeated measurement on the

same individual”

Page 54: T test and ANOVA

©[email protected] 2012

Paired T-Test

4“Repeated measurement on the same

individual”

4 t =

Page 55: T test and ANOVA

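Formula

$$ t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad s_d = \sqrt{\frac{\sum d_i^2 - \dfrac{\left(\sum d_i\right)^2}{n}}{n - 1}}, \qquad df = n_p - 1 $$

where $d_i$ is the difference within pair i and $n = n_p$ is the number of pairs.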

Page 56: T test and ANOVA

©[email protected] 2012

Examples of paired t-test

4Comparing the HAMD score between

week 0 and week 6 of treatment with

Sertraline for a group of psychiatric

patients.

4Comparing the haemoglobin level

amongst anaemic pregnant women after

6 weeks of treatment with haematinics.

Page 57: T test and ANOVA

Example

Paired Samples Statistics

Pair 1 | Mean | N | Std. Deviation
DHAMAWK0 | 13.9688 | 32 | 6.48315
DHAMAWK6 | 3.8125 | 32 | 4.39529

Paired Samples Test

Pair 1 | Paired Differences: Mean | Paired Differences: Std. Deviation | t | df | Sig. (2-tailed)
DHAMAWK0 - DHAMAWK6 | 10.1563 | 6.75903 | 8.500 | 31 | .000

Page 58: T test and ANOVA

Manual Calculation

• The measurement of the systolic and diastolic blood pressures was done two consecutive times with an interval of 10 minutes. You want to determine whether there was any difference between those two measurements.
• H0: There is no difference in the systolic blood pressure between the first (time 0) and second measurement (time 10 minutes).

Page 59: T test and ANOVA

©[email protected] 2012

Calculation

4Calculate the difference between first &

second measurement and square it.

Total up the difference and the square.

Page 60: T test and ANOVA

Calculation

• ∑d = 112, ∑d² = 1842, n = 36
• Mean d = 112/36 = 3.11
• sd = ((1842 – 112²/36)/35)^0.5 = 6.53
• t = 3.11/(6.53/6) = 2.858
• df = np – 1 = 36 – 1 = 35
• Refer to t table.
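The paired calculation needs only the sum of the differences, the sum of their squares and the number of pairs. A minimal sketch of those steps, with illustrative variable names:

```python
import math

# Totals from the slide: sum of differences, sum of squared differences, number of pairs
sum_d, sum_d2, n = 112, 1842, 36

mean_d = sum_d / n
# Standard deviation of the differences (computational formula)
sd_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))
# Paired t statistic: mean difference over its standard error
t_stat = mean_d / (sd_d / math.sqrt(n))
df = n - 1

print(f"mean d = {mean_d:.2f}, sd = {sd_d:.2f}, t = {t_stat:.3f}, df = {df}")
```

Carrying full precision gives a t value that agrees with the slide's 2.858 to within rounding of the intermediate steps.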

Page 61: T test and ANOVA

Refer to Table A3. The table has no row for df = 35, so we use df = 30 instead. t = 2.858 is larger than 2.75 (p = 0.01) but smaller than 3.03 (p = 0.005), i.e. 3.03 > t > 2.75. Therefore if t = 2.858, 0.005 < p < 0.01.

Page 62: T test and ANOVA

Conclusion

• With t = 2.858, 0.005 < p < 0.01. Therefore p < 0.01.
• Therefore p < 0.05, null hypothesis rejected.
• Conclusion: There is a significant difference in the systolic blood pressure between the first and second measurement. The mean of the first reading is significantly higher than that of the second reading.

Page 63: T test and ANOVA

©[email protected] 2012

Paired T-Test In SPSS

4 For this exercise, we will be using the data from the CD, under Chapter 7, sgapair.sav

4 This data came from a controlled trial on haematinic effect on Hb.

4 Open the data & select ->Analyse>Compare Means

>Paired-Samples T

Test…

Page 64: T test and ANOVA

©[email protected] 2012

Paired T-Test In SPSS

4 We want to see whether there is any association between the prescription on haematinic to anaemic pregnant mothers and Hb.

4 We are comparing the Hb before & after treatment. So pair the two measurements (Hb2 & Hb3) together.

4 Click the ‘OK’ button.

Page 65: T test and ANOVA

Paired T-Test Results

• This shows the mean & standard deviation of the two groups.

Paired Samples Statistics

Pair 1 | Mean | N | Std. Deviation | Std. Error Mean
HB2 | 10.247 | 70 | .3566 | .0426
HB3 | 10.594 | 70 | .9706 | .1160

Page 66: T test and ANOVA

Paired T-Test Results

• This shows the mean difference of Hb before & after treatment is only 0.347 g%.
• Yet the t = 3.018 & p = 0.004 show the difference is statistically significant.

Paired Samples Test

Pair 1 | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed)
HB2 - HB3 | -.347 | .9623 | .1150 | -.577 | -.118 | -3.018 | 69 | .004

(Mean to 95% CI Upper are the Paired Differences.)

Page 67: T test and ANOVA

How to present the result?

Group | N | Mean D (Diff.) | Test | p
Before treatment (HB2) vs After treatment (HB3) | 70 | 0.35 ± 0.96 | Paired T-test, t = 3.018 | 0.004

Page 68: T test and ANOVA

©[email protected] 2012

ANOVA

Page 69: T test and ANOVA

©[email protected] 2012

ANOVA –Analysis of Variance

4Extension of independent-samples t test

4Compares the means of groups of

independent observations

• Don’t be fooled by the name. ANOVA does

not compare variances.

4Can compare more than two groups

Page 70: T test and ANOVA

One-Way ANOVA F-Test

• Tests the equality of 2 or more population means
• Variables
– One nominal scaled independent variable
» 2 or more treatment levels or classifications (i.e. Race; Malay, Chinese, Indian & Others)
– One interval or ratio scaled dependent variable (i.e. weight, height, age)
• Used to analyse completely randomized experimental designs

Page 71: T test and ANOVA

©[email protected] 2012

Examples

4Comparing the blood cholesterol levels

between the bus drivers, bus conductors

and taxi drivers.

4Comparing the mean systolic pressure

between Malays, Chinese, Indian &

Others.

Page 72: T test and ANOVA

One-Way ANOVA F-Test Assumptions

• Randomness & independence of errors
– Independent random samples are drawn
• Normality
– Populations are normally distributed
• Homogeneity of variance
– Populations have equal variances

Page 73: T test and ANOVA

Example

Descriptives (Birth weight)

 | N | Mean | Std. Deviation | Minimum | Maximum
Housewife | 151 | 2.7801 | .52623 | 1.90 | 4.72
Office work | 23 | 2.7643 | .60319 | 1.60 | 3.96
Field work | 44 | 2.8430 | .55001 | 1.90 | 3.79
Total | 218 | 2.7911 | .53754 | 1.60 | 4.72

ANOVA (Birth weight)

 | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | .153 | 2 | .077 | .263 | .769
Within Groups | 62.550 | 215 | .291 | |
Total | 62.703 | 217 | | |

Page 74: T test and ANOVA

©[email protected] 2012

Manual Calculation

ANOVA

Page 75: T test and ANOVA

©[email protected] 2012

Manual Calculation

4Not expected to be calculated manually

by medical students.

Page 76: T test and ANOVA

Example: Time To Complete Analysis

45 samples were

analysed using 3 different

blood analyser (Mach1,

Mach2 & Mach3).

15 samples were placed

into each analyser.

Time in seconds was

measured for each

sample analysis.

Page 77: T test and ANOVA

Example: Time To Complete Analysis

The overall mean of the entire sample was 22.71 seconds.

This is called the “grand” mean, and is often denoted by $\bar{X}$.

If H0 were true then we’d expect the group means to be close to the grand mean.

Page 78: T test and ANOVA

Example: Time To Complete Analysis

The ANOVA test is based on the combined distances from $\bar{X}$.

If the combined distances are large, that indicates we should reject H0.

Page 79: T test and ANOVA

The Anova Statistic

To combine the differences from the grand mean we
• Square the differences
• Multiply by the numbers of observations in the groups
• Sum over the groups

$$ SSB = 15\left(\bar{X}_{Mach1} - \bar{X}\right)^2 + 15\left(\bar{X}_{Mach2} - \bar{X}\right)^2 + 15\left(\bar{X}_{Mach3} - \bar{X}\right)^2 $$

where the $\bar{X}_{*}$ are the group means.

“SSB” = Sum of Squares Between groups

Page 80: T test and ANOVA

The Anova Statistic

Note: This looks a bit like a variance.

Page 81: T test and ANOVA

Sum of Squares Between

• Grand Mean = 22.71
• Mean Mach1 = 24.93; (24.93 – 22.71)² = 4.9284
• Mean Mach2 = 22.61; (22.61 – 22.71)² = 0.01
• Mean Mach3 = 20.59; (20.59 – 22.71)² = 4.4944
• SSB = (15 × 4.9284) + (15 × 0.01) + (15 × 4.4944)
• SSB = 141.492

Page 82: T test and ANOVA

How big is big?

4 For the Time to Complete, SSB = 141.492

4 Is that big enough to reject H0?

4 As with the t test, we compare the statistic to

the variability of the individual observations.

4 In ANOVA the variability is estimated by the

Mean Square Error, or MSE

Page 83: T test and ANOVA

MSE: Mean Square Error

The Mean Square Error is a measure of the variability after the group effects have been taken into account.

$$ MSE = \frac{1}{N - K}\sum_j \sum_i \left(x_{ij} - \bar{X}_j\right)^2 $$

where $x_{ij}$ is the ith observation in the jth group.

Page 84: T test and ANOVA


Page 85: T test and ANOVA


Page 86: T test and ANOVA

$$ MSE = \frac{1}{N - K}\sum_j \sum_i \left(x_{ij} - \bar{X}_j\right)^2 $$

Mach1 | (x-mean)^2 | Mach2 | (x-mean)^2 | Mach3 | (x-mean)^2
23.73 | 1.4400 | 21.5 | 1.2321 | 19.74 | 0.7225
23.74 | 1.4161 | 21.6 | 1.0201 | 19.75 | 0.7056
23.75 | 1.3924 | 21.7 | 0.8281 | 19.76 | 0.6889
24.00 | 0.8649 | 21.7 | 0.8281 | 19.9 | 0.4761
24.10 | 0.6889 | 21.8 | 0.6561 | 20 | 0.3481
24.20 | 0.5329 | 21.9 | 0.5041 | 20.1 | 0.2401
25.00 | 0.0049 | 22.75 | 0.0196 | 20.3 | 0.0841
25.10 | 0.0289 | 22.75 | 0.0196 | 20.4 | 0.0361
25.20 | 0.0729 | 22.75 | 0.0196 | 20.5 | 0.0081
25.30 | 0.1369 | 23.3 | 0.4761 | 20.5 | 0.0081
25.40 | 0.2209 | 23.4 | 0.6241 | 20.6 | 0.0001
25.50 | 0.3249 | 23.4 | 0.6241 | 20.7 | 0.0121
26.30 | 1.8769 | 23.5 | 0.7921 | 22.1 | 2.2801
26.31 | 1.9044 | 23.5 | 0.7921 | 22.2 | 2.5921
26.32 | 1.9321 | 23.6 | 0.9801 | 22.3 | 2.9241
SUM | 12.8380 | | 9.4160 | | 11.1262

Page 87: T test and ANOVA

• Note that the between-groups sum of squares (141.492) seems quite large (more likely to be significant?) compared to the sum of squares of the observations within groups (12.8380 + 9.4160 + 11.1262 = 33.3802).
• MSE = 33.3802/(45 – 3) = 0.7948

Page 88: T test and ANOVA

Notes on MSE

4 If there are only two groups, the MSE is equal

to the pooled estimate of variance used in the

equal-variance t test.

4 ANOVA assumes that all the group variances

are equal.

4 Other options should be considered if group

variances differ by a factor of 2 or more.

4 (12.8380 ~ 9.4160 ~ 11.1262)

Page 89: T test and ANOVA

ANOVA F Test

• The ANOVA F test is based on the F statistic

$$ F = \frac{SSB / (K - 1)}{MSE} $$

where K is the number of groups.
• Under H0 the F statistic has an “F” distribution, with K – 1 and N – K degrees of freedom (N is the total number of observations).
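The whole chain from raw timings to the F statistic can be sketched in plain Python, using the Mach1/Mach2/Mach3 values listed in the data table earlier. The variable names are illustrative. The final p-value line relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom K – 1 equal 2, which happens to be the case here; it is not a general-purpose formula.

```python
# Times in seconds, copied from the Mach1/Mach2/Mach3 columns of the data table.
groups = {
    "Mach1": [23.73, 23.74, 23.75, 24.00, 24.10, 24.20, 25.00, 25.10,
              25.20, 25.30, 25.40, 25.50, 26.30, 26.31, 26.32],
    "Mach2": [21.5, 21.6, 21.7, 21.7, 21.8, 21.9, 22.75, 22.75,
              22.75, 23.3, 23.4, 23.4, 23.5, 23.5, 23.6],
    "Mach3": [19.74, 19.75, 19.76, 19.9, 20.0, 20.1, 20.3, 20.4,
              20.5, 20.5, 20.6, 20.7, 22.1, 22.2, 22.3],
}

def mean(values):
    return sum(values) / len(values)

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)
N, K = len(all_values), len(groups)

# Sum of squares between groups: group size times squared distance from the grand mean
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Sum of squares within groups: squared distance of each observation from its group mean
ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

mse = ssw / (N - K)
f_stat = (ssb / (K - 1)) / mse

# Upper-tail probability of F(2, N - K); valid ONLY because K - 1 == 2 here.
p_value = (1 + 2 * f_stat / (N - K)) ** (-(N - K) / 2)

print(f"SSB = {ssb:.3f}, SSW = {ssw:.4f}, MSE = {mse:.4f}, F = {f_stat:.3f}")
print(f"p = {p_value:.1e}")
```

Note that SSB divided by K – 1 is the between-groups mean square, so F is simply the ratio of the two mean squares in an ANOVA table.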

Page 90: T test and ANOVA

Time to Analyse:F test p-value

To get a p-value we

compare our F statistic

to an F(2, 42)

distribution.

Page 91: T test and ANOVA

Time to Analyse: F test p-value

To get a p-value we compare our F statistic to an F(2, 42) distribution.

In our example

$$ F = \frac{141.492 / 2}{33.3802 / 42} = 89.015 $$

We cannot draw the line since the F value is so large; therefore the p value is very small.

Page 92: T test and ANOVA

Refer to F Dist. Table (α = 0.01). The table has no entry for df = 2;42, so we use df = 2;40 instead. F = 89.015 is larger than 5.18 (p = 0.01). Therefore if F = 89.015, p < 0.01.

Why use df = 2;42? We have 3 groups, so K – 1 = 2. We have 45 samples, therefore N – K = 42.

Page 93: T test and ANOVA

Time to Analyse: F test p-value

To get a p-value we compare our F statistic to an F(2, 42) distribution.

In our example

$$ F = \frac{141.492 / 2}{33.3802 / 42} = 89.015 $$

The p-value is really

$$ P\left(F(2,42) > 89.015\right) \approx 8 \times 10^{-16} $$

Page 94: T test and ANOVA

ANOVA Table

Results are often displayed using an ANOVA Table

 | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 141.492 | 2 | 70.746 | 89.015 | .0000000
Within Groups | 33.380 | 42 | .795 | |
Total | 174.872 | 44 | | |

Page 95: T test and ANOVA

ANOVA Table

Results are often displayed using an ANOVA Table

 | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 141.492 | 2 | 70.746 | 89.015 | .0000000
Within Groups | 33.380 | 42 | .795 | |
Total | 174.872 | 44 | | |

Pop Quiz!: Where are the following quantities presented in this table?
• Sum of Squares Between (SSB)
• Mean Square Error (MSE)
• F Statistic
• p value

Page 96: T test and ANOVA


Page 97: T test and ANOVA


Page 98: T test and ANOVA


Page 99: T test and ANOVA


Page 100: T test and ANOVA

ANOVA In SPSS

• For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav
• This data came from a case-control study on factors affecting SGA in Kelantan.
• Open the data & select Analyse > Compare Means > One-Way ANOVA…

Page 101: T test and ANOVA

ANOVA in SPSS

• We want to see whether there is any association between the babies’ weight and mothers’ type of work. So select the risk factor (typework) into ‘Factor’ & the outcome (birthwgt) into ‘Dependent’.
• Now click on the ‘Post Hoc’ button. Select Bonferroni.
• Click the ‘Continue’ button & then click the ‘OK’ button.
• Then click on the ‘Options’ button.

Page 102: T test and ANOVA

ANOVA in SPSS

• Select ‘Descriptive’, ‘Homogeneity of variance test’ and ‘Means plot’.
• Click ‘Continue’ and then ‘OK’.

Page 103: T test and ANOVA

ANOVA Results

• Compare the mean ± SD of all groups.
• Apparently there is not much difference in babies’ weight between the groups.

Descriptives (Birth weight)

 | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum
Housewife | 151 | 2.7801 | .52623 | .04282 | 2.6955 | 2.8647 | 1.90 | 4.72
Office work | 23 | 2.7643 | .60319 | .12577 | 2.5035 | 3.0252 | 1.60 | 3.96
Field work | 44 | 2.8430 | .55001 | .08292 | 2.6757 | 3.0102 | 1.90 | 3.79
Total | 218 | 2.7911 | .53754 | .03641 | 2.7193 | 2.8629 | 1.60 | 4.72

Page 104: T test and ANOVA

Results & Homogeneity of Variances

• Look at the p value of Levene’s Test. If p is not significant then equal variances are assumed.

Test of Homogeneity of Variances (Birth weight)

Levene Statistic | df1 | df2 | Sig.
.757 | 2 | 215 | .470

Page 105: T test and ANOVA

ANOVA Results

• So the F value here is 0.263 and p = 0.769. The difference is not significant. Therefore there is no association between the babies’ weight and mothers’ type of work.

ANOVA (Birth weight)

 | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | .153 | 2 | .077 | .263 | .769
Within Groups | 62.550 | 215 | .291 | |
Total | 62.703 | 217 | | |

Page 106: T test and ANOVA

How to present the result?

Type of Work | Mean ± SD | Test | p
Office | 2.76 ± 0.60 | ANOVA, F = 0.263 | 0.769
Housewife | 2.78 ± 0.53 | |
Farmer | 2.84 ± 0.55 | |

Page 107: T test and ANOVA

©[email protected] 2012

Proportionate Test

Page 108: T test and ANOVA

©[email protected] 2012

Proportionate Test

4Qualitative data utilises rates, i.e. rate of

anaemia among males & females

4To compare such rates, statistical tests

such as Z-Test and Chi-square can be

used.

Page 109: T test and ANOVA

Formula

$$ z = \frac{p_1 - p_2}{\sqrt{p_0 q_0\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad p_0 = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}, \qquad q_0 = 1 - p_0 $$

• where p1 is the rate for event 1 = a1/n1
• p2 is the rate for event 2 = a2/n2
• a1 and a2 are frequencies of event 1 and 2
• We refer to the normal distribution table to decide whether or not to reject the null hypothesis.

Page 110: T test and ANOVA

http://stattrek.com/hypothesis-test/proportion.aspx

• The sampling method is simple random sampling.
• Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
• The sample includes at least 10 successes and 10 failures.
• The population size is at least 10 times as big as the sample size.

Page 111: T test and ANOVA

Example

• Comparison of worm infestation rate between male and female medical students in Year 2.
• Rate for males: p1 = 29/96 = 0.302
• Rate for females: p2 = 24/104 = 0.231
• H0: There is no difference in worm infestation rate between male and female medical students in Year 2.

Page 112: T test and ANOVA

©[email protected] 2012

Cont.

p1 p2

p0 q0

Page 113: T test and ANOVA

Cont.

• p0 = ((29/96 × 96) + (24/104 × 104)) / (96 + 104) = 0.265
• q0 = 1 – 0.265 = 0.735

Page 114: T test and ANOVA

Cont.

• z = (0.302 – 0.231) / ((0.735 × 0.265)(1/96 + 1/104))^0.5 = 1.1367
• From the normal distribution table (A1), the z value is significant at p = 0.05 if it is above 1.96. Since the value is less than 1.96, there is no significant difference in the rate of worm infestation between the male and female students.
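The proportion test above can be sketched as a small function. This is an illustration only: `two_proportion_z` is a hypothetical helper name, and it works from the raw counts rather than the rounded rates, so the z value comes out slightly different from the hand calculation (about 1.14 instead of 1.1367).

```python
import math
from statistics import NormalDist

def two_proportion_z(a1, n1, a2, n2):
    """Two-sided z test for the difference between two proportions a1/n1 and a2/n2."""
    p1, p2 = a1 / n1, a2 / n2
    p0 = (a1 + a2) / (n1 + n2)   # pooled rate, same as (p1*n1 + p2*n2) / (n1 + n2)
    q0 = 1 - p0
    z = (p1 - p2) / math.sqrt(p0 * q0 * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Worm infestation: 29 of 96 males vs 24 of 104 females
z, p_value = two_proportion_z(29, 96, 24, 104)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The same function can be applied to other pairs of counts by changing the four arguments; the sign of z only reflects which group is passed first.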

Page 115: T test and ANOVA

Refer to Table A1. The table has no entry for 1.1367, so we use 1.14 instead. If z = 1.14, then p = 0.1271 × 2 = 0.2542. Therefore if z = 1.14, p = 0.2542. H0 not rejected.

Page 116: T test and ANOVA

Exercise (try it)

• Comparison of failure rate between ACMS and UKM medical students in Year 2 for minitest 1 (MS2 2012).
• Rate for UKM: p1 = 42/196 = 0.214
• Rate for ACMS: p2 = 35/70 = 0.5

Page 117: T test and ANOVA

Answer

• p1 = 0.214, p2 = 0.5, p0 = 0.289, q0 = 0.711
• n1 = 196, n2 = 70, Z = 20.47^0.5 = 4.52
• p < 0.00006