Assessing the Evidence: Statistical Inference Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.


Objectives of Session

• Understand idea of inference
• Confidence interval approach
• Significance testing
• Briefly: some simple tests

Statistical Inference

The aim is to draw conclusions (INFER) from the specific (sample) to the more general (population).

Are differences between groups chance occurrences or do they represent statistically significant results (i.e. real differences)?


Extrapolating from the sample to population

Illustrations: Ian Christie, Orthopaedic & Trauma Surgery. Copyright 2002 University of Dundee

Two approaches: confidence intervals and hypothesis testing

Confidence Intervals

Random variability means that there is statistical (random) variation around any summary statistic: mean, proportion, difference between means, etc.

Uncertainty is expressed as a confidence interval defined by a lower and an upper value:

Summary statistic ± constant × standard error

e.g. for a 95% CI the constant is 1.96, from the Normal distribution


For a percentage the standard error is given by:

$$se = \sqrt{\frac{p(100 - p)}{n}}$$

So for p = 35% and n = 100, se = 4.8%


Consider a prevalence of 35% for the uptake of statins for secondary prevention of MI from one practice:

prevalence = 35%

95% CI = 35% ± 1.96 × 4.8% = 25.6% to 44.4%
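The arithmetic above can be checked with a few lines of Python. This is only a sketch under the Normal approximation used on the slide; the function name `percentage_ci` is an illustrative choice, not something from the course material.

```python
import math

def percentage_ci(p, n, z=1.96):
    """Standard error and CI limits for a percentage p (0-100) from n subjects."""
    se = math.sqrt(p * (100 - p) / n)   # se = sqrt(p(100 - p) / n)
    return se, p - z * se, p + z * se   # summary statistic +/- constant x se

# Statin uptake example: p = 35%, n = 100
se, lower, upper = percentage_ci(35, 100)
print(f"se = {se:.1f}%, 95% CI = {lower:.1f}% to {upper:.1f}%")
```

Because the code keeps the unrounded standard error (about 4.77% rather than 4.8%), its limits differ from the rounded figures above by about 0.1 percentage points.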


For a mean the standard error is given by:

$$se = \frac{s}{\sqrt{n}}$$

where s is the standard deviation of the distribution


Consider a mean cholesterol measurement of 5.4 mmol/l for a group of 100 patients with type 2 diabetes and standard deviation s = 1.1 mmol/l:

mean = 5.4, 95% CI = 5.4 ± 1.96 × 1.1/√100 = 5.2 to 5.6 mmol/l
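The same calculation for a mean, as a minimal sketch. The function name `mean_ci` is hypothetical, and the constant 1.96 again assumes the Normal approximation.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Standard error and CI limits for a sample mean."""
    se = sd / math.sqrt(n)              # se = s / sqrt(n)
    return se, mean - z * se, mean + z * se

# Cholesterol example: mean 5.4 mmol/l, s = 1.1 mmol/l, n = 100
se, lower, upper = mean_ci(5.4, 1.1, 100)
print(f"se = {se:.2f}, 95% CI = {lower:.1f} to {upper:.1f} mmol/l")
```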


Confidence intervals give an estimate of the precision of a summary statistic

[Illustration labels: Precise; Imprecise]

The major determinant of precision is sample size
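To illustrate the link between sample size and precision, the sketch below reuses the 35% prevalence and computes the width of the 95% CI at several sample sizes. The sizes 400 and 1600 are hypothetical values chosen for illustration only.

```python
import math

def ci_width(p, n, z=1.96):
    """Full width of the 95% CI for a percentage p (0-100) from n subjects."""
    return 2 * z * math.sqrt(p * (100 - p) / n)

# Quadrupling the sample size halves the width of the interval.
for n in (100, 400, 1600):
    print(n, round(ci_width(35, n), 1))
```

Since the standard error depends on 1/√n, halving the width of an interval requires four times as many subjects.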



Warning!

Confidence intervals are usually interpreted in a Bayesian way, even though a frequentist method is used to estimate them

The probability of the true value lying within the confidence interval is NOT, repeat NOT, 95%

Bayesian confidence intervals are called CREDIBLE INTERVALS, and the probability of the true value lying in a credible interval IS 95%

A frequentist confidence interval means that with repeat samples…

A 95% confidence interval means that 95% of the intervals calculated from repeat studies would contain the true proportion

[Figure: study sample and its 95% CI, followed by repeat samples, ad infinitum]

To put it another way, the 95% confidence interval is…

…one of many that could be constructed with the assurance that 95% of the time the true value of the parameter would be included

[Figure: sample and its 95% CI, followed by repeat samples, ad infinitum]
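The repeat-sampling idea can be simulated. The sketch below assumes a true proportion of 0.35 and samples of 100, echoing the statin example, and counts how often the 95% CI from each repeat sample contains the true value. The true value, the number of repeats and the seed are all illustrative assumptions.

```python
import math
import random

def coverage(true_p=0.35, n=100, repeats=2000, z=1.96, seed=1):
    """Fraction of repeat-sample 95% CIs that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one sample of n subjects and count the 'successes'.
        k = sum(rng.random() < true_p for _ in range(n))
        p_hat = k / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z * se <= true_p <= p_hat + z * se:
            hits += 1
    return hits / repeats

print(coverage())
```

The printed fraction should be close to 0.95, though usually not exactly 0.95: the simple Normal-approximation interval is only approximate for proportions, and 2000 repeats is a finite stand-in for "ad infinitum".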

Statistical Inference: Hypothesis testing

The process of inference starts from a neutral position: the null hypothesis

The null hypothesis (H0) is usually set to ‘there is no difference’

Collect data and carry out hypothesis tests

Reject the null hypothesis, or conclude that there is insufficient evidence to reject it

Legal analogy: hypothesis testing

Legal trial | Hypothesis test
Defendant assumed innocent until proved guilty | Null hypothesis assumes no difference between groups
Examine evidence | Calculate test statistic based on evidence from sample data
1. Accept evidence proves guilt | 1. Accept significant difference between groups
2. Evidence does not prove guilt: ‘not proven’ | 2. Insufficient evidence to reject H0

No statistical significance is not the same as no difference



The test statistic generally consists of:

$$\text{test statistic} = \frac{\text{summary statistic} - H_0\ \text{value}}{\text{standard error of summary statistic}}$$

e.g. to test that the mean is different from zero:

$$t = \frac{\text{Mean} - 0}{\text{Se(Mean)}}$$


The test statistic is then compared with tabulated values of a distribution (e.g. Normal distribution, t-distribution)

Assuming the null hypothesis is true, what is the probability of obtaining the actual observed value of the test statistic, t?

How likely is the value of t to have occurred by chance alone?


Assuming the null hypothesis is true, what is the probability of obtaining the actual observed or greater value of the test statistic, t?

Using the distribution of t, which is similar to a Normal distribution, this probability can be obtained, as in the figure, as p = 0.042

[Figure: distribution of the test statistic with an area labelled 2.1%. Image source: http://upload.wikimedia.org/wikipedia/commons/8/8c/Standard_deviation_diagram.svg]


If the probability of the occurrence of the observed value is < 5%, i.e. p < 0.05, then this is unlikely to be a chance finding

The result is declared statistically significant

Fortunately, most statistical software (e.g. SPSS) will carry out the test you request and give p-values (SPSS labels these as ‘Sig’)

Two group hypothesis testing

We will consider three common tests:

1. t-test for difference between two means
2. Chi-squared test (χ²) for difference between two proportions
3. Logrank test for difference between two groups in median survival

All are easily carried out in SPSS

Are practices with access to community hospitals further away on average from general hospitals?

 | No access | Access to CH
n | 17 | 10
Mean (km) | 8.68 | 21.30
SD (km) | 11.90 | 5.68
Se (mean) | 2.89 | 1.79

Example t-test

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p \sqrt{1/n_1 + 1/n_2}}$$

1 and 2 refer to the two groups, $n$ is the number in each group, $\bar{x}$ refers to the mean and $s_p$ is the pooled standard deviation

$$t = \frac{(8.68 - 21.30) - 0}{10.112 \times 0.398} = \frac{-12.62}{4.024} = -3.13$$

With 25 degrees of freedom, from t-tables p = 0.004, and so the difference of 12.62 km is highly statistically significant
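The worked example can be reproduced from the summary figures alone. The sketch below uses only the Python standard library. The pooled standard deviation follows the usual pooled-variance formula, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which here stands in for the t-tables. The function names are illustrative assumptions, not SPSS output.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled SD; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area of the t density between -|t| and |t|."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps                      # Simpson's rule on [0, |t|]
    area = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * area * h / 3             # density is symmetric about 0

# No access (group 1) vs. access to a community hospital (group 2)
t, df = pooled_t(8.68, 11.90, 17, 21.30, 5.68, 10)
print(f"t = {t:.2f}, df = {df}, p = {t_two_sided_p(t, df):.3f}")
```

This should give t close to -3.13 on 25 degrees of freedom and a two-sided p-value below 0.005, in line with the p = 0.004 read from t-tables.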

Consider a recent RCT

• Rimonabant vs. placebo to reduce body weight in obese people (BMI > 30 kg/m²)
• Rimonabant (20 mg daily) inhibits the effects of cannabinoid agonists, which in turn affects energy balance
• Mean reduction in body weight at one year was 6.6 kg vs. 1.8 kg (rimonabant vs. placebo)
• Difference was 4.7 kg (95% CI 4.1, 5.4)
• By end of year 2 mean weight was back to start!

Are practices with access to community hospitals more likely to have training status?

 | Community hospital: No | Community hospital: Yes
No training status | 12 (71%) | 4 (40%)
Training status | 5 (29%) | 6 (60%)

• Is the difference in proportions, 60% - 29% = 31%, well within the realms of chance or a statistically significant finding?
• Null hypothesis: difference = 0
• Use the chi-squared (χ²) test for significance of the difference

Pearson Chi-Squared Test

 | Comm. Hosp.: No | Comm. Hosp.: Yes | Total
No training status | a | b | a+b
Training status | c | d | c+d
Total | a+c | b+d | N


$$\chi^2 = \frac{N(|ad - bc|)^2}{(a+b)(c+d)(a+c)(b+d)}$$

where N = a+b+c+d and |ad - bc| means take the positive value of the calculation

$$\chi^2 = \frac{27(72 - 20)^2}{16 \times 11 \times 17 \times 10} = 2.44 \text{ with 1 degree of freedom}$$

df = (no. rows - 1) x (no. columns - 1)

P = 0.118 which is not statistically significant
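A short check of the chi-squared calculation, using only the standard library. For a 2 x 2 table (1 degree of freedom) the p-value can be written with the complementary error function, because a chi-squared variable with 1 df is the square of a standard Normal variable. The function name is an illustrative choice, and no continuity correction is applied, matching the formula above.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for a 2 x 2 table (1 df)."""
    n = a + b + c + d
    chi2 = n * abs(a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, P(chi-squared > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Training status by community hospital access: a=12, b=4, c=5, d=6
chi2, p = chi_squared_2x2(12, 4, 5, 6)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```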

More complicated analyses

• Introduced simple two-group tests
• Results of more complicated analyses are expressed in the same way
• Summary statistic and 95% confidence interval
• Usually p-value is also stated but often implicit from the confidence interval
• Beware spurious significance, e.g. p = 0.034729 (3 d.p. are enough)
• ‘Importance’ refers to size of difference
• An ‘important’ result can be statistically non-significant
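As a rough illustration of how a p-value is implicit in a confidence interval, the sketch below works backwards from the rimonabant result (difference 4.7 kg, 95% CI 4.1 to 5.4). It assumes the interval was built as estimate ± 1.96 × se under a Normal approximation and that the null value is 0. The trial's own analysis may have differed, so treat this as a back-of-envelope check.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate se, z statistic and two-sided p-value implied by a 95% CI."""
    se = (upper - lower) / (2 * z_crit)   # CI width = 2 x 1.96 x se
    z = estimate / se                     # test against a null value of 0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided Normal tail area
    return se, z, p

se, z, p = p_from_ci(4.7, 4.1, 5.4)
print(f"se = {se:.2f} kg, z = {z:.1f}, p = {p:.1e}")
```

An interval that lies well away from 0, as here, corresponds to a very small p-value. An interval that includes 0 would correspond to p > 0.05.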

Sacred 5% level

• 5% level is arbitrary
• Practical choice before computer era to make tables easier to construct
• Are p = 0.046 and p = 0.051 different?
• In the past, researchers tended to present only p-values
• Now emphasis is on size of effect and 95% CI
• Unfortunately, editors are still influenced by p-values, leading to publication bias

Summary

• Do not get carried away by p-values
• Interpretation requires knowledge of the area to put results into context, but also understanding of what the tests do
• A p-value close to 5% is approaching significance and may suggest it is worth investigating
• The size of the effect is more clinically or scientifically important
