Assessing the Evidence: Statistical Inference
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Statistics for Health Research
Objectives of Session
• Understand the idea of inference
• Confidence interval approach
• Significance testing
• Briefly: some simple tests
Statistical Inference
The aim is to draw conclusions (INFER) from the specific (sample) to the more general (population).
Are differences between groups chance occurrences or do they represent statistically significant results (i.e. real differences)?
Extrapolating from the sample to population
Illustrations Ian Christie, Orthopaedic & Trauma Surgery, Copyright
2002 University of Dundee
Two approaches: confidence intervals and hypothesis testing
Confidence Intervals
Random variability means that there is statistical (random) variation around any summary statistic: mean, proportion, difference between means, etc.
Confidence intervals
Uncertainty is expressed as a Confidence Interval defined by an upper and a lower value:
Summary statistic ± constant × standard error
e.g. for a 95% CI, constant = 1.96 from the Normal distribution
Confidence intervals
For a percentage the standard error is given by:
$se = \sqrt{\dfrac{p(100 - p)}{n}}$
So for p = 35%, se = 4.8%, where n = 100
Confidence intervals
Consider a prevalence of 35% for the uptake of statins for secondary prevention of MI from one practice
prevalence = 35%,
95% CI = 35% ± 1.96 × 4.8%
= 25.6% to 44.4%
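The arithmetic for the prevalence example can be checked with a short Python sketch. The function name `percentage_ci` is an illustrative assumption, not something from the slides; the formula is the standard error for a percentage given earlier, with the Normal constant 1.96.

```python
import math

def percentage_ci(p, n, z=1.96):
    """Return (se, lower, upper) for a percentage p (0-100) observed in n subjects."""
    se = math.sqrt(p * (100 - p) / n)   # standard error of a percentage
    return se, p - z * se, p + z * se   # summary statistic +/- constant x se

# Statin uptake example: prevalence 35% in a practice sample of n = 100
se, lower, upper = percentage_ci(35, 100)
print(f"se = {se:.1f}%, 95% CI = {lower:.1f}% to {upper:.1f}%")
```

Using the unrounded standard error gives limits of roughly 25.7% to 44.3%. The slide rounds the standard error to 4.8% first, which is why it reports 25.6% to 44.4%.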
Confidence intervals
For a mean the standard error is given by:
$se = \dfrac{s}{\sqrt{n}}$
where s is the standard deviation of the distribution
Confidence intervals
Consider a mean cholesterol measurement of 5.4 mmol/l for a group of 100 patients with type 2 diabetes and standard deviation s = 1.1 mmol/l
Mean = 5.4, 95% CI = 5.4 ± 1.96 × 1.1/√100
= 5.2 to 5.6 mmol/l
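The same calculation for a mean, as a minimal Python sketch. The helper name `mean_ci` is hypothetical; the inputs are the cholesterol figures from the slide.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Return (se, lower, upper) for a sample mean with standard deviation sd."""
    se = sd / math.sqrt(n)              # standard error of the mean
    return se, mean - z * se, mean + z * se

# Cholesterol example: mean 5.4 mmol/l, s = 1.1 mmol/l, n = 100
se, lower, upper = mean_ci(5.4, 1.1, 100)
print(f"se = {se:.2f}, 95% CI = {lower:.1f} to {upper:.1f} mmol/l")
```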
Confidence intervals
Confidence intervals give an estimate of the precision of a summary statistic
[Figure: illustrations labelled ‘Precise’ and ‘Imprecise’]
Major determinant of precision is sample size
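To see how sample size drives precision, the sketch below computes the width of a 95% CI for a mean at several sample sizes. The standard deviation of 1.1 mmol/l is borrowed from the cholesterol example; the sample sizes 25, 100 and 400 are illustrative assumptions.

```python
import math

def ci_width(sd, n, z=1.96):
    """Full width (upper minus lower) of a CI for a mean."""
    return 2 * z * sd / math.sqrt(n)

# Assumed sample sizes, chosen so that each step quadruples n
widths = {n: ci_width(1.1, n) for n in (25, 100, 400)}
for n, width in widths.items():
    print(f"n = {n:3d}: 95% CI width = {width:.2f} mmol/l")
```

Because the standard error scales with 1/√n, quadrupling the sample size halves the width of the interval.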
Confidence intervals
Warning! Confidence intervals are usually interpreted in a Bayesian way even though a frequentist method is used to estimate them
The probability of the true value lying within the confidence interval is NOT, repeat NOT, 95%
Bayesian confidence intervals are called CREDIBLE INTERVALS and the probability of the true value lying in the credible interval IS 95%
Frequentist Confidence Interval means that with repeat samples…
A 95% confidence interval means that 95% of the confidence intervals constructed from repeat studies would contain the true proportion
[Figure: study sample & 95% CI, followed by repeat samples … ad infinitum]
To put it another way the 95% Confidence Interval is…
…one of many that could be constructed with the assurance that 95% of the time the true value of the parameter would be included
[Figure: sample & 95% CI, followed by repeat samples … ad infinitum]
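The repeat-sampling interpretation can be illustrated by simulation. This is a sketch under stated assumptions: the population is taken to be Normal with a true mean of 5.4 and standard deviation of 1.1 (borrowed from the cholesterol example), and the number of repeats and the random seed are arbitrary choices.

```python
import math
import random
import statistics

def coverage(true_mean=5.4, true_sd=1.1, n=100, repeats=2000, z=1.96, seed=1):
    """Proportion of repeat-sample 95% CIs that contain the true mean."""
    rng = random.Random(seed)           # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1                   # this interval includes the true value
    return hits / repeats

cov = coverage()
print(f"Proportion of intervals containing the true mean: {cov:.3f}")
```

The proportion should come out close to 0.95. Any single interval either contains the true value or does not, which is why the 95% refers to the procedure rather than to one particular interval.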
Statistical Inference: Hypothesis testing
Are differences between groups chance occurrences or do they represent statistically significant results (i.e. real differences)?
The process of inference starts from a neutral position: the Null Hypothesis
Statistical Inference: Hypothesis testing
The null hypothesis (H0) is usually set to ‘there is no difference’
Collect data and carry out hypothesis tests
Accept or reject the null hypothesis
Legal analogy: Hypothesis testing

Legal trial: Defendant assumed innocent until proved guilty
Hypothesis test: Null hypothesis assumes no difference between groups

Legal trial: Examine evidence
Hypothesis test: Calculate test statistic based on evidence from sample data

Legal trial: 1. Accept evidence proves guilt; 2. Evidence does not prove guilt, ‘not proven’
Hypothesis test: 1. Accept significant difference between groups; 2. Insufficient evidence to reject H0
Legal Analogy: Hypothesis Testing
No statistical significance is not the same as no difference
Statistical Inference: Hypothesis testing
The test statistic generally consists of:
$\dfrac{\text{Summary statistic} - H_0\ \text{value}}{\text{Standard error of summary}}$
e.g. Test that the mean is different from zero:
$t = \dfrac{\text{Mean} - 0}{Se(\text{Mean})}$
Statistical Inference: Hypothesis testing
The test statistic is then compared with tabulated values of a distribution (e.g. Normal distribution, t-distribution)
Assuming the null hypothesis is true, what is the probability of obtaining the actual observed value of the test statistic, t?
How likely is the value of t to have occurred by chance alone?
Statistical Inference: Hypothesis testing
Assuming the null hypothesis is true, what is the probability of obtaining the actual observed or greater value of the test statistic, t?
Using the distribution of t, which is similar to a Normal distribution, this probability can be obtained, as in the figure, as p = 0.042
[Figure: distribution of t with tail area labelled 2.1%]
Statistical Inference: Hypothesis testing
If the probability of the occurrence of the observed value is < 5%, i.e. p < 0.05, then this is unlikely to be a chance finding
Result is declared statistically significant
Fortunately most statistical software (e.g. SPSS) will carry out the test you request and give p-values (SPSS labels these as ‘Sig’)
Two group hypothesis testing
We will consider three common tests:
1. t-test for difference between two means
2. Chi-squared test (χ²) for difference between two proportions
3. Logrank test for difference between two groups' median survival
All are easily carried out in SPSS
Are practices with access to community hospitals further away on average from general hospitals?

             No access    Access to CH
n            17           10
Mean         8.68 km      21.30 km
SD           11.90 km     5.68 km
Se (mean)    2.89         1.79
Example t-test
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p \sqrt{1/n_1 + 1/n_2}}$
1 and 2 refer to the two groups
n is the number in each group
$\bar{x}$ (x bar) refers to the mean and
$s_p$ is the pooled standard deviation
Example t-test
$t = \dfrac{(8.68 - 21.30) - 0}{10.112 \times 0.398}$
t = -12.62 / 4.024
t = -3.13
With 25 degrees of freedom, from t-tables p = 0.004, and so the difference of 12.62 is highly statistically significant
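The worked example can be reproduced from the summary statistics alone. The sketch below uses only the Python standard library. Because the standard library has no t-distribution function, the p-value is approximated by integrating the t density numerically with Simpson's rule; that is an implementation choice made here in place of the t-tables the slides use. The function names are hypothetical.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic using the pooled standard deviation."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    se_diff = sp * math.sqrt(1 / n1 + 1 / n2)
    return ((mean1 - mean2) - 0) / se_diff, df

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps                  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t       # probability in both tails beyond |t|

# No access (group 1) vs access to community hospital (group 2)
t, df = pooled_t(8.68, 11.90, 17, 21.30, 5.68, 10)
p = t_two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

In practice a statistics package (the slides mention SPSS) reports the same quantities directly.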
Consider a recent RCT
• Rimonabant vs. placebo to reduce body weight in obese people (BMI > 30 kg/m²)
• Rimonabant (20 mg daily) inhibits the effects of cannabinoid agonists, which in turn affects energy balance
• Mean reduction in body weight at one year was 6.6 kg vs. 1.8 kg (rimonabant vs. placebo)
• Difference was 4.7 kg (95% CI 4.1, 5.4)
• By end of year 2 mean weight was back to start!
Are practices with access to community hospitals more likely to have training status?

                      Community Hospital
                      No          Yes
No training status    12 (71%)    4 (40%)
Training status       5 (29%)     6 (60%)
Are practices with access to community hospitals more likely to have training status?
• Is the difference in proportions 60% - 29% = 31% well within the realms of chance or a statistically significant finding?
• Null hypothesis: Difference = 0
• Use chi-squared (χ²) test for significance of difference
Pearson Chi-Squared Test

                      Comm. Hosp.
                      No      Yes     Total
No training status    a       b       a+b
Training status       c       d       c+d
Total                 a+c     b+d     N
Pearson Chi-Squared Test
$\chi^2 = \dfrac{N(|ad - bc|)^2}{(a+b)(c+d)(a+c)(b+d)}$
where N = a+b+c+d and |ad - bc| means take the positive value of the calculation
Pearson Chi-Squared Test
$\chi^2 = \dfrac{27(|72 - 20|)^2}{16 \times 11 \times 17 \times 10}$
= 2.44 with 1 degree of freedom
df = (no. rows - 1) x (no. columns - 1)
P = 0.118 which is not statistically significant
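A short Python sketch of the same calculation for the training-status table. The function name `pearson_chi2` is illustrative. For 1 degree of freedom the upper-tail probability of chi-squared equals the two-sided Normal tail probability of its square root, so the p-value can be obtained with `math.erfc`. No continuity correction is applied, which matches the formula on the slide.

```python
import math

def pearson_chi2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 df) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * abs(a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p

# a = 12, b = 4 (no training status); c = 5, d = 6 (training status)
chi2, p = pearson_chi2(12, 4, 5, 6)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```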
More complicated analyses
• Introduced simple two-group tests
• Results of more complicated analyses are expressed in the same way
• Summary statistic and 95% confidence interval
• Usually p-value is also stated but often implicit from the confidence interval
• Beware spurious significance e.g. p = 0.034729 (3 d.p. are enough)
• ‘Importance’ refers to size of difference
• An ‘important’ result can be statistically non-significant
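As an illustration of how a p-value is implicit in a confidence interval, the sketch below works backwards from the rimonabant result (difference 4.7 kg, 95% CI 4.1 to 5.4). It assumes the interval was built as estimate ± 1.96 × standard error on a Normal scale. That is an assumption about the trial's analysis, not something stated on the slides, and the function name is hypothetical.

```python
from statistics import NormalDist

def p_from_ci(estimate, lower, upper, z=1.96):
    """Approximate se, z statistic and two-sided p-value from a 95% CI."""
    se = (upper - lower) / (2 * z)      # CI width is 2 x 1.96 x se
    z_stat = estimate / se              # test against a null difference of 0
    p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return se, z_stat, p

se, z_stat, p = p_from_ci(4.7, 4.1, 5.4)
print(f"se = {se:.2f} kg, z = {z_stat:.1f}, p < 0.001: {p < 0.001}")
```

The interval lies well away from zero, so the p-value is far below 0.05. More generally, a 95% CI that excludes the null value corresponds to p < 0.05.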
Sacred 5% level
• 5% level is arbitrary
• Practical choice before computer era to make tables easier to construct
• Are p = 0.046 and p = 0.051 different?
• In the past researchers tended to present only p-values
• Now emphasis is on size of effect and 95% CI
• Unfortunately, editors are still influenced by p-values, leading to publication bias
Summary
• Do not get carried away by p-values
• Interpretation requires knowledge of the area to put results into context, but also understanding of what the tests do
• A p-value close to 5% is approaching significance and may suggest it is worth investigating
• The size of the effect is more clinically or scientifically important