1
Bread and butter statistics:
RCGP Curriculum Statement 3.5: Evidence-Based Practice
VTS 02/02/2011
2
Possible topics for today
Audit - definition; Research – definition; Bias; Blinding; Confidence intervals; Forest plot; L’Abbé plot; Hypothesis; Null hypothesis; Incidence; Prevalence; Normal distribution
Parameter; Statistic; Variable; P-value; Number needed to treat; Number needed to harm; Statistical power; Sensitivity; Positive predictive value; Specificity; Reliability; Validity
3
Useful Websites
http://www.medicine.ox.ac.uk/bandolier
http://www.cebm.net
http://www.library.nhs.uk/Default.aspx
http://cochrane.co.uk/en/clib.html
http://dtb.bmj.com/
http://www.nice.org.uk/
4
Topics for today - 1
Audit - definition; Research – definition; Bias; Blinding; Confidence intervals; Forest plot
5
Topics for today - 2
L’Abbé plot; Hypothesis; Null hypothesis; Incidence; Prevalence; Normal distribution
6
Topics for today - 3
Parameter; Statistic; Variable; P-value; Number needed to treat; Number needed to harm
7
Topics for today - 4
Statistical power; Sensitivity; Positive predictive value; Specificity; Reliability; Validity
8
Audit – definition
Clinical audit is a quality improvement process
It seeks to improve patient care and outcomes by systematic review of care against explicit criteria and the implementation of change
Aspects of the structure, processes, and outcomes of care are selected and systematically evaluated against explicit criteria
Where indicated, changes are implemented at an individual, team or service level and further monitoring is used to confirm improvement in healthcare delivery
from NICE
9
The Audit cycle
Identify the need for change
Problems may be identified in 3 areas: Structure, Process, Outcome
Set Criteria and Standards - what should be happening
Collect data on performance
Assess performance against criteria and standards
Identify changes needed
10
The Audit cycle
11
Research - definition
Research is an ORGANISED and SYSTEMATIC way of FINDING ANSWERS to QUESTIONS
SYSTEMATIC - certain things are always done in research in order to get the most accurate results
ORGANISED - there is a structure or method in doing research. It is a planned procedure, focused and limited to a specific scope
FINDING ANSWERS is the aim of all research. Whether it is the answer to a hypothesis or a question, research is successful when answers are found, even if they are negative
QUESTIONS are central to research. Research is focused on relevant, useful, and important questions. Without a question research has no purpose
12
Bias
Dictionary definition - 'a one-sided inclination of the mind'. It defines a systematic tendency of certain trial designs to produce results consistently better or worse than other designs
In studies of the effects of health care, bias can arise from:
systematic differences in the groups compared (selection bias)
the care that is provided, or exposure to other factors apart from the intervention of interest (performance bias)
withdrawals or exclusions of people entered into the study (attrition bias)
how outcomes are assessed (detection/observer bias)
This use of bias does not necessarily imply any prejudice, such as the investigators' desire for particular results. It therefore differs from the conventional use of the word, meaning a partisan point of view
13
Blinding
Participants, investigators and/or assessors do not know which treatments participants are receiving. Lack of blinding is a potent source of bias, and open studies or single-blind studies have potential problems for interpreting results
In a single-blind study, either the participants or the assessors (those making the measurements of interest) are blind to the allocations. In a double-blind study, both participants and assessors are blind to the allocations
To achieve a double-blind state, it is usual to use matching active and control treatments, e.g. active and placebo tablets can be made to look the same
14
Blinding
If treatments are radically different (e.g. tablets compared with injection), a double-dummy technique may be used, where all patients receive both an injection and a tablet to maintain blinding
Concealment of allocation - the process used to prevent foreknowledge of group assignment in a randomised controlled trial, which is distinct from blinding. This process should be impervious to any influence by the individual making the allocation.
Adequate methods of allocation concealment include: centralized randomisation schemes; numbered or coded containers in which capsules from identical-looking, numbered bottles are administered sequentially; sequentially numbered opaque, sealed envelopes
15
Confidence intervals
Quantify the uncertainty in measurement
Usually reported as a 95% CI: the range of values within which we can be 95% sure that the true value lies
For example, for an NNT of 10 with a 95% CI of 5 to 15, there is 95% confidence that the true NNT value lies between 5 and 15
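As a rough illustration of where such an interval comes from, the sketch below computes a 95% CI for an absolute risk reduction using the simple normal (Wald) approximation, then inverts it to give an interval for the NNT. The trial counts and the function name are hypothetical, chosen only so that the NNT comes out at 10; they are not taken from any study mentioned here.

```python
import math

def arr_confidence_interval(events_control, n_control, events_treated, n_treated, z=1.96):
    """Return (ARR, lower, upper) using the Wald normal approximation.

    ARR = control event rate - experimental event rate.
    z = 1.96 gives an approximate 95% confidence interval.
    """
    cer = events_control / n_control
    eer = events_treated / n_treated
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    return arr, arr - z * se, arr + z * se

# Hypothetical trial: 20 of 100 controls and 10 of 100 treated patients have the event.
arr, lower, upper = arr_confidence_interval(20, 100, 10, 100)
print(f"ARR = {arr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")

# The NNT interval is obtained by inverting the ARR limits.
# This only makes sense when the ARR interval excludes zero.
if lower > 0:
    print(f"NNT = {1 / arr:.0f}, 95% CI {1 / upper:.0f} to {1 / lower:.0f}")
```

With these made-up numbers the ARR interval only just excludes zero, so the NNT interval is very wide: a small trial gives an imprecise estimate.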
16
Confidence intervals
Confidence intervals are preferable to p-values, as they tell us the range of possible effect sizes compatible with the data
A confidence interval that includes the value of no difference indicates that the treatment under investigation is not significantly different from the control
Confidence intervals aid interpretation of clinical trial data by putting upper and lower limits on the likely size of any true effect
17
Confidence intervals
Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies
Non-significance does not mean no effect. Small studies may report non-significance even when there are important, real effects
Statistical significance does not necessarily mean that the effect is real: at the 5% level, about one in 20 comparisons where there is no true effect will appear significant by chance alone
Statistical significance does not necessarily mean clinically important - the size of the effect determines the importance
18
Forest (meta-analysis) plot
In a Forest plot, the results of component studies are shown as squares centred on the point estimate of the result of each study
A horizontal line runs through the square to show its confidence interval, usually, but not always, a 95% confidence interval
The overall estimate from the meta-analysis and its confidence interval are put at the bottom, represented as a diamond. The centre of the diamond represents the pooled point estimate, and its horizontal tips represent the confidence interval
Significance is achieved at the set level if the diamond is clear of the line of no effect
19
Forest (meta-analysis) plot
The plot allows readers to see the information from the individual studies which went into the meta-analysis at a glance
It provides a simple visual representation of the amount of variation between the results of the studies, as well as an estimate of the overall result of all the studies together
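The diamond at the bottom of a Forest plot can be sketched numerically. The example below assumes a fixed-effect, inverse-variance meta-analysis of log odds ratios, which is one common way of pooling but not necessarily the method behind any plot shown in these slides. The three study results and the function name are hypothetical.

```python
import math

def pool_fixed_effect(estimates, standard_errors, z=1.96):
    """Inverse-variance fixed-effect pooling.

    Each study is weighted by 1 / SE^2, so precise studies count for more.
    Returns (pooled estimate, lower limit, upper limit).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Hypothetical log odds ratios and standard errors for three studies.
log_odds_ratios = [-0.30, -0.10, -0.25]
standard_errors = [0.20, 0.15, 0.25]

pooled, lower, upper = pool_fixed_effect(log_odds_ratios, standard_errors)

# On the log scale the line of no effect is 0 (an odds ratio of 1).
diamond_clear_of_no_effect = upper < 0 or lower > 0

print(f"Pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lower):.2f} to {math.exp(upper):.2f})")
print("Significant at the 5% level:", diamond_clear_of_no_effect)
```

In this made-up example every study favours treatment, yet the pooled interval still crosses the line of no effect, so the diamond would touch the line.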
20
Meta-analysis of effect of beta blockers on mortality after myocardial infarction
Lewis and Ellis, 1982
21
In the modern format ~
22
L'Abbé plot
A simple scatter plot which can yield a comprehensive qualitative view of the data
If the experimental treatment is better than the control, the point will lie in the upper left of the plot, between the y axis and the line of equality
If experimental is no better than control, the point will fall on the line of equality, and if control is better than experimental, the point will be in the lower right of the plot, between the x axis and the line of equality
23
L'Abbé plot
Visual inspection gives a quick and easy indication of the level of agreement among trials
L'Abbé plots are becoming widely used
They have several benefits: the simple visual presentation is easy to assimilate. They make us think about the reasons why there can be such wide variation in responses. They explain the need for placebo controls. They keep us sceptical about overly good or bad results
Figure: Trazodone for erectile dysfunction, in psychogenic erectile dysfunction (dark symbols) and in erectile dysfunction with physiological or mixed aetiology (light symbols)
24
Hypothesis
“A tentative supposition with regard to an unknown state of affairs, the truth of which is … subject to investigation by any available method, either by logical deduction of consequences which may be checked against what is known, or by direct experimental investigation or discovery of facts not hitherto known and suggested by the hypothesis”
“A proposition presented as a supposition rather than asserted. A hypothesis may be put forward for testing or for discussion, possibly as a prelude to acceptance or rejection”
“Treating hypertension reduces myocardial infarction rate”. “Treating sore throat with penicillin reduces the rate of glomerulonephritis”. “Osteopathy is a good treatment for mechanical low back pain”. Etc.
25
Null hypothesis
“The statistical hypothesis that one variable (e.g. whether or not a study participant was allocated to receive an intervention) has no association with another variable or set of variables (e.g. whether or not a study participant died), or that two or more population distributions do not differ from one another”.
In its simplest terms, the null hypothesis states that the results observed in a study are no different from what might have occurred as a result of chance
26
Incidence
The proportion of new cases of the target disorder in the population at risk during a specified time interval
It is usual to define the disorder, and the population, and the time, and report the incidence as a rate
Statement of the sort: Most developed countries with northern European age structure have an incidence of Parkinson’s disease of between 12 and 20 cases per 100 000 per year
27
Prevalence
This is a measure of the proportion of people in a population who have a disease at a point in time, or over some period of time
E.g. there was a study of incidence and prevalence of MS in the Lothian and Border region of Scotland in the mid-1990s, with a population of about 864,000.
Annual incidence was 12 per 100,000. If probable cases were included also, the rate rose to 18 per 100,000
Prevalence was determined by defining a prevalent case as any person with a diagnosis of multiple sclerosis alive and normally resident in the area on 15 March 1995. Probable as well as definite cases were included. There were 1613 residents with a diagnosis of MS, giving a crude prevalence rate of 187/100,000. The sex ratio was 2.5 and the mean age was 49 years.
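The arithmetic behind these figures can be checked with a short sketch. The case count, population and incidence rate are the ones quoted above; the function name is an illustrative choice.

```python
def rate_per_100_000(cases, population):
    """Express a count of cases as a rate per 100,000 population."""
    return cases / population * 100_000

population = 864_000

# Crude prevalence: 1613 residents with a diagnosis of MS on the prevalence day.
prevalence = rate_per_100_000(1613, population)
print(f"Crude prevalence: {prevalence:.0f} per 100,000")

# An annual incidence of 12 per 100,000 implies roughly this many new cases a year.
new_cases_per_year = 12 / 100_000 * population
print(f"Approximate new cases per year: {new_cases_per_year:.0f}")
```

Prevalence is much higher than annual incidence here because MS is a long-term condition: each year's new cases add to a pool of people already living with the diagnosis.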
28
Normal distribution
Normal distributions are a family of distributions that have the same general shape
They are symmetrical, with scores more concentrated in the middle than in the tails
Normal distributions are sometimes described as bell shaped
Also called Gaussian distribution
Can be manipulated mathematically
Similar to the pattern of distribution of naturally occurring phenomena
All normal density curves satisfy the following property, often referred to as the Empirical Rule:
68% of the observations fall within 1 standard deviation of the mean
95% of the observations fall within 2 standard deviations of the mean
99.7% of the observations fall within 3 standard deviations of the mean
I.e. for a normal distribution, almost all values lie within 3 standard deviations of the mean
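The Empirical Rule percentages can be checked directly from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} SD: {proportion_within(k) * 100:.1f}%")
```

The exact values are about 68.3%, 95.4% and 99.7%, which the Empirical Rule rounds for convenience.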
29
Parameter
A parameter is a number computed from a population. This contrasts with the definition of a statistic
A parameter is a constant, unchanging value. There is no random variation in a parameter. If the size of the population is large (as is typically the case), then a parameter may be difficult to compute
An example of a parameter would be: the average length of stay in the birth hospital for all infants born in the UK
30
Statistic
A statistic is a number computed from a sample. This contrasts with the definition of a parameter
If a statistic is computed from a random sample (typically the case), then it has random variation or sampling error
An example of a statistic would be: the average length of stay in the birth hospital for a random sample of 387 infants born in BHRUT
31
Variable
A measurement that can vary within a study, e.g. the age of participants
Variability is present when differences can be seen between different people or within the same person over time, with respect to any characteristic or feature that can be assessed or measured
32
P-value
The probability (ranging from zero to one) that results at least as extreme as those observed in a study could have occurred by chance alone, i.e. if the null hypothesis were true
Calculated using a statistical test
Convention is to accept a p-value of 0.05 (5%) or below as being statistically significant. This is equivalent to a 1 in 20 chance of the result arising at random, which is not very unlikely. The threshold has no solid mathematical basis; it was just chosen many years ago
When many comparisons are being made, statistical significance can occur just by chance. A more stringent rule is to use a p-value of 0.01 (1 in 100) or below as statistically significant
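The effect of making many comparisons can be shown with a short sketch. It assumes that the comparisons are independent and that there is no true effect in any of them, which is a simplification of real trials. The function name is an illustrative choice.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is 'significant' when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 20):
    at_5_percent = prob_at_least_one_false_positive(n, alpha=0.05)
    at_1_percent = prob_at_least_one_false_positive(n, alpha=0.01)
    print(f"{n:>2} comparisons: {at_5_percent:.0%} at p<0.05, {at_1_percent:.0%} at p<0.01")
```

Under these assumptions, 20 comparisons at the 5% level give roughly a two in three chance of at least one spurious significant result, which is why a stricter threshold is used when many comparisons are made.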
33
Number needed to treat (NNT)
The inverse of the absolute risk reduction, the number of patients that need to be treated for one to benefit compared with a control
The ideal NNT is 1, where everyone improves with treatment and no-one does with control. The higher the NNT, the less effective is the treatment
The value of an NNT is not just numeric - NNTs of 2-5 are indicative of effective therapies, like analgesics for acute pain
NNTs of about 1 might be seen in treating sensitive bacterial infections with antibiotics, while an NNT of 40 or more might be useful e.g. when using aspirin after a heart attack
34
Calculating NNT
NNT = 1/ARR
ARR = (CER – EER) where CER = control group event rate and EER = experimental group event rate
Sample Calculation: The results of the Diabetes Control and Complications Trial into the effect of intensive diabetes therapy on the development and progression of neuropathy indicated that neuropathy occurred in 9.6% of patients randomised to usual care and 2.8% of patients randomised to intensive therapy.
NNT with intensive diabetes therapy to prevent one additional occurrence of neuropathy can be determined by calculating the absolute risk reduction as follows:
ARR = (CER – EER) = (9.6% - 2.8%) = 6.8%
NNT = 1/ARR = 1/0.068 = 14.7 or 15
Therefore need to treat 15 diabetic patients with intensive therapy to prevent one from developing neuropathy
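The sample calculation above can be reproduced in a few lines. The event rates are those quoted for the Diabetes Control and Complications Trial; the function name is an illustrative choice, and rounding up to a whole patient follows the usual convention.

```python
import math

def number_needed_to_treat(cer, eer):
    """NNT = 1 / ARR, where ARR = control event rate - experimental event rate."""
    arr = cer - eer
    if arr <= 0:
        raise ValueError("No absolute risk reduction: NNT is not defined")
    return 1 / arr

# Neuropathy in 9.6% on usual care and 2.8% on intensive therapy.
nnt = number_needed_to_treat(cer=0.096, eer=0.028)
print(f"NNT = {nnt:.1f}, rounded up to {math.ceil(nnt)}")
```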
35
Number needed to treat
Response to antibiotics of women with symptoms of UTI but negative dipstick urine test results: double blind RCT. Richards et al, BMJ 2005;331:143-6. To reduce duration of symptoms by 2 days?
4
Antibiotic prescribing in GP and hospital admissions for peritonsillar abscess, mastoiditis and rheumatic fever in children: time trend analysis. Sharland et al, BMJ 2005, 331, 328-9. To prevent one case of mastoiditis?
At least 2500
Trigeminal neuralgia Rx anticonvulsants. To obtain 50% pain relief?
2.5
36
Number needed to treat
Arthritis Rx glucosamine for 3-8/52 cf. placebo. To improve symptoms?
5
So why not prescribe it?
http://www.nice.org.uk/nicemedia/pdf/CG59NICEguideline.pdf
MRC trial of treatment of mild HT: principal results. 17,354 individuals aged 36-64 years with diastolic 90-109 mmHg Rx bendrofluazide or propranolol for 5.5 years cf. placebo. BMJ 1985 291: 97-104. Primary prevention of one stroke at one year?
850
Benign prostatic hypertrophy Rx finasteride for 2 years vs placebo. To prevent one operation?
39
37
Number needed to harm (NNH)
This is calculated in the same way as for NNT, but is used to describe adverse events
For NNH, large numbers are good, because they mean that adverse events are rare.
Small values for NNH are bad, because they mean adverse events are common
38
Number needed to harm
An example of how NNH can be calculated in the same way as NNT is that of inhaled corticosteroids used for asthma (Powell & Gibson. Inhaled corticosteroid doses in asthma: an evidence-based approach. Medical Journal of Australia 2003 178: 223-225).
At low daily doses of 100 or 200 μg/day, neither dysphonia nor oral candidiasis was much of a problem, affecting about an additional one person per hundred treated, with NNHs of about 100
At daily doses of 500 μg and above, numbers needed to harm (NNH) fell to levels of about 20 or below, indicating that for every 100 patients treated with these doses, about an additional 5 would experience dysphonia and 5 would experience oral candidiasis
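The link between "additional people harmed per hundred treated" and the NNH can be sketched as below. The figures are the approximate ones quoted above; the function names are illustrative choices.

```python
def number_needed_to_harm(extra_harmed_per_100):
    """NNH = 1 / absolute risk increase, with the increase given per 100 patients treated."""
    absolute_risk_increase = extra_harmed_per_100 / 100
    return 1 / absolute_risk_increase

def extra_harmed_per_100(nnh):
    """Reverse calculation: additional patients harmed per 100 treated."""
    return 100 / nnh

# Low doses: about one additional person harmed per hundred treated.
print(f"Low dose NNH: about {number_needed_to_harm(1):.0f}")

# Higher doses: an NNH of about 20 means about 5 extra patients harmed per 100 treated.
print(f"Extra harmed per 100 at NNH 20: about {extra_harmed_per_100(20):.0f}")
```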
39
Statistical power
The ability of a study to demonstrate an association or causal relationship between two variables, given that an association exists
For example, 80% power in a clinical trial means that the study has an 80% chance of ending up with a p value of less than 5% in a statistical test (i.e. a statistically significant treatment effect) if there really was an important difference (e.g. 10% versus 5% mortality) between treatments
If the statistical power of a study is low, the study results will be questionable e.g. the study might have been too small to detect any differences
40
Statistical power
Factors influencing power in a statistical test include:
What kind of statistical test is being performed. Some statistical tests are inherently more powerful than others
Sample size. In general, the larger the sample size, the larger the power. However, generally increasing sample size involves costs in time, money, and effort. It is therefore important to make sample size "large enough," but not wastefully large
The size of experimental effects
The level of error in experimental measurements
By convention, 80% is an acceptable level of power
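The effect of sample size on power can be illustrated with a rough sketch. It uses a simple normal approximation for comparing two proportions with equal group sizes and a two-sided 5% significance level, and it ignores the negligible chance of a significant result in the wrong direction. The 10% versus 5% mortality figures echo the earlier example; the group sizes and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions (normal approximation)."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    standard_error = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return normal.cdf(abs(p1 - p2) / standard_error - z_alpha)

# Mortality of 10% on control versus 5% on treatment, at several trial sizes.
for n in (100, 200, 435, 1000):
    print(f"{n:>4} per group: power about {power_two_proportions(0.10, 0.05, n):.0%}")
```

Under this approximation, a few hundred patients per group are needed before the trial reaches the conventional 80% power, and a trial of 100 per group would miss a real halving of mortality most of the time.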
41
Sensitivity
Proportion of people with the target disorder who have a positive test/symptom/sign. Used to assist in assessing and selecting a diagnostic test/symptom/sign
A seNsitive test keeps false-Negatives down – 100% sensitive means everyone with the condition has a positive test
SnNout: When a sign/test/symptom has a high Sensitivity, a Negative result rules out the diagnosis
For example, the sensitivity of a history of ankle swelling for diagnosing ascites is 93%; therefore, if a person does not have a history of ankle swelling, it is highly unlikely that they have ascites
42
Specificity
Proportion of people without the target disorder who have a negative test. It is used to assist in assessing and selecting a diagnostic test/symptom/sign
A sPecific test keeps false-Positives down – 100% specific means everyone without the condition has a negative test
SpPin: When a sign/test/symptom has a high Specificity, a Positive result rules in the diagnosis
For example, the specificity of a fluid wave for diagnosing ascites is 92%; therefore, if a person does have a fluid wave, it rules in the diagnosis of ascites
43
Positive predictive value
Specificity and Sensitivity are closely related to the measures of:
Positive Predictive Value: The proportion of people with a positive test who have the target disorder; and
Negative Predictive Value: The proportion of people with a negative test who do not have the target disorder.
Calculations from the table:
                  Disorder present    Disorder absent
Test positive            a                   b
Test negative            c                   d
sensitivity = a/(a+c)
specificity = d/(b+d)
positive predictive value = a/(a+b)
negative predictive value = d/(c+d)
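The four formulas can be tried out on a worked 2x2 table. In the sketch below, a, b, c and d are taken to be true positives, false positives, false negatives and true negatives respectively, which is the layout the formulas imply. The counts and the function name are hypothetical.

```python
def diagnostic_measures(a, b, c, d):
    """Compute test accuracy measures from a 2x2 table.

    a = test positive, disorder present (true positives)
    b = test positive, disorder absent  (false positives)
    c = test negative, disorder present (false negatives)
    d = test negative, disorder absent  (true negatives)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive predictive value": a / (a + b),
        "negative predictive value": d / (c + d),
    }

# Hypothetical study: 100 people with the disorder and 300 without it.
results = diagnostic_measures(a=90, b=30, c=10, d=270)
for name, value in results.items():
    print(f"{name}: {value:.0%}")
```

Note that the predictive values depend on how common the disorder is in the group tested, whereas sensitivity and specificity are calculated within the diseased and non-diseased groups separately.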
44
Reliability
Reproducibility
Stability over time and place
Ease of replication
Minimisation of observer variation
Confirmation of results
45
Validity
This term is a difficult concept in clinical trials, but refers to a trial being able to measure what it sets out to measure
A trial that set out to measure the analgesic effect of a procedure might be in trouble if patients had no pain
Or in a condition where treatment is life-long, evaluating an intervention for 10 minutes is inappropriate