Bread and butter statistics: RCGP Curriculum Statement 3.5: Evidence-Based Practice

VTS 02/02/2011

Possible topics for today

Audit - definition
Research - definition
Bias
Blinding
Confidence intervals
Forest plot
L'Abbé plot
Hypothesis
Null hypothesis
Incidence
Prevalence
Normal distribution
Parameter
Statistic
Variable
P-value
Number needed to treat
Number needed to harm
Statistical power
Sensitivity
Positive predictive value
Specificity
Reliability
Validity

Useful Websites

http://www.medicine.ox.ac.uk/bandolier

http://www.cebm.net

http://www.library.nhs.uk/Default.aspx

http://cochrane.co.uk/en/clib.html

http://dtb.bmj.com/

http://www.nice.org.uk/


Audit - definition

Clinical audit is a quality improvement process. It seeks to improve patient care and outcomes by systematic review of care against explicit criteria and the implementation of change.

Aspects of the structure, processes, and outcomes of care are selected and systematically evaluated against explicit criteria.

Where indicated, changes are implemented at an individual, team or service level, and further monitoring is used to confirm improvement in healthcare delivery.

(from NICE)

The Audit cycle

Identify the need for change. Problems may be identified in 3 areas: Structure, Process, Outcome

Set criteria and standards: what should be happening

Collect data on performance

Assess performance against criteria and standards

Identify changes needed

Figure: The Audit cycle

Research - definition

Research is an ORGANISED and SYSTEMATIC way of FINDING ANSWERS to QUESTIONS.

SYSTEMATIC - certain things are always done in research in order to get the most accurate results.

ORGANISED - there is a structure or method in doing research. It is a planned procedure, focused and limited to a specific scope.

FINDING ANSWERS is the aim of all research. Whether it is the answer to a hypothesis or to a question, research is successful when answers are found, even if they are negative.

QUESTIONS are central to research. Research is focused on relevant, useful, and important questions. Without a question, research has no purpose.

Bias

Dictionary definition: 'a one-sided inclination of the mind'. In trials, it denotes a systematic tendency of certain trial designs to produce results consistently better or worse than other designs.

In studies of the effects of health care, bias can arise from:

systematic differences in the groups compared (selection bias)
the care that is provided, or exposure to other factors apart from the intervention of interest (performance bias)
withdrawals or exclusions of people entered into the study (attrition bias)
how outcomes are assessed (detection/observer bias)

This use of 'bias' does not necessarily imply any prejudice, such as the investigators' desire for particular results. It differs from the conventional use of the word to mean a partisan point of view.

Blinding

Participants, investigators and/or assessors do not know which treatments participants are receiving. Lack of blinding is a potent source of bias, and open studies or single-blind studies have potential problems for interpreting results.

In a single-blind study, either the participants or the assessors (those making the measurements of interest) are blind to the allocations. In a double-blind study, both participants and assessors are blind to the allocations.

To achieve a double-blind state, it is usual to use matching active and control treatments, e.g. active and placebo tablets can be made to look the same.

If treatments are radically different (e.g. tablets compared with injection), a double-dummy technique may be used, where all patients receive both an injection and a tablet to maintain blinding.

Concealment of allocation - the process used to prevent foreknowledge of group assignment in a randomised controlled trial, which is distinct from blinding. This process should be impervious to any influence by the individual making the allocation.

Adequate methods of allocation concealment include: centralised randomisation schemes; numbered or coded containers in which capsules from identical-looking, numbered bottles are administered sequentially; sequentially numbered, opaque, sealed envelopes.

Confidence intervals

A confidence interval quantifies the uncertainty in a measurement.

It is usually reported as a 95% CI: the range of values within which we can be 95% sure that the true value lies.

For example, for an NNT of 10 with a 95% CI of 5 to 15, there is 95% confidence that the true NNT lies between 5 and 15.
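As a rough illustration of where such an interval comes from, the sketch below computes a 95% CI for an absolute risk reduction using the simple normal (Wald) approximation, then converts it to an NNT interval. The trial counts are hypothetical, chosen only to give an NNT of 10, and the function name is an assumption for this example.

```python
from math import sqrt

def arr_confidence_interval(control_events, control_n,
                            treated_events, treated_n, z=1.96):
    """Wald (normal approximation) CI for the absolute risk reduction.

    z=1.96 corresponds to a 95% confidence interval.
    """
    cer = control_events / control_n      # control event rate
    eer = treated_events / treated_n      # experimental event rate
    arr = cer - eer                       # absolute risk reduction
    se = sqrt(cer * (1 - cer) / control_n + eer * (1 - eer) / treated_n)
    return arr, arr - z * se, arr + z * se

# Hypothetical trial: 40/200 events on control, 20/200 on treatment
arr, lower, upper = arr_confidence_interval(40, 200, 20, 200)
print(f"ARR = {arr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")

# The NNT interval is the reciprocal of the ARR interval, which is only
# meaningful when the ARR interval excludes zero.
if lower > 0:
    print(f"NNT = {1 / arr:.1f}, 95% CI {1 / upper:.1f} to {1 / lower:.1f}")
```

Note how wide the NNT interval is even with 200 patients per group: the point estimate alone says little about precision.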

Confidence intervals are preferable to p-values, as they tell us the range of possible effect sizes compatible with the data.

A confidence interval that includes the value of no difference (0 for a difference, 1 for a ratio) indicates that the treatment under investigation is not significantly different from the control.

Confidence intervals aid interpretation of clinical trial data by putting upper and lower limits on the likely size of any true effect.

Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies.

Non-significance does not mean no effect. Small studies may report non-significance even when there are important, real effects.

Statistical significance does not necessarily mean that the effect is real: at the 5% level, about one in 20 comparisons where there is no true effect will appear significant by chance alone.

Statistical significance does not necessarily mean clinical importance: the size of the effect determines the importance.

Forest (meta-analysis) plot

In a Forest plot, the results of component studies are shown as squares centred on the point estimate of the result of each study.

A horizontal line runs through the square to show its confidence interval: usually, but not always, a 95% confidence interval.

The overall estimate from the meta-analysis and its confidence interval are put at the bottom, represented as a diamond. The centre of the diamond represents the pooled point estimate, and its horizontal tips represent the confidence interval.

Significance is achieved at the set level if the diamond is clear of the line of no effect.
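One common way the diamond can be obtained is inverse-variance (fixed-effect) pooling, sketched below. This is an illustration only: the three study results are hypothetical log odds ratios with made-up standard errors, and the slides do not say which pooling method any particular plot used.

```python
from math import exp, sqrt

def pool_fixed_effect(estimates, standard_errors, z=1.96):
    """Inverse-variance fixed-effect pooled estimate with a 95% CI."""
    weights = [1 / se ** 2 for se in standard_errors]   # precise studies weigh more
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Hypothetical component studies (log odds ratios; 0 is the line of no effect)
log_odds_ratios = [-0.30, -0.10, -0.25]
standard_errors = [0.20, 0.15, 0.10]

pooled, low, high = pool_fixed_effect(log_odds_ratios, standard_errors)
print(f"Pooled OR = {exp(pooled):.2f}, 95% CI {exp(low):.2f} to {exp(high):.2f}")

# The diamond is "clear of the line of no effect" when its CI excludes 0
diamond_clear = not (low <= 0 <= high)
print("Significant at the 5% level:", diamond_clear)
```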

The plot allows readers to see, at a glance, the information from the individual studies which went into the meta-analysis.

It provides a simple visual representation of the amount of variation between the results of the studies, as well as an estimate of the overall result of all the studies together.

Figure: Meta-analysis of effect of beta blockers on mortality after myocardial infarction (Lewis and Ellis, 1982)

Figure: The same meta-analysis in the modern format

L'Abbé plot

A simple scatter plot which can yield a comprehensive qualitative view of the data. Each trial is plotted as a point, with the result in the experimental group on the y axis and the result in the control group on the x axis.

If the experimental treatment is better than the control, the point will lie in the upper left of the plot, between the y axis and the line of equality.

If experimental is no better than control, the point will fall on the line of equality. If control is better than experimental, the point will be in the lower right of the plot, between the x axis and the line of equality.

Visual inspection gives a quick and easy indication of the level of agreement among trials.

L'Abbé plots are becoming widely used.

They have several benefits:

The simple visual presentation is easy to assimilate
They make us think about the reasons why there can be such wide variation in responses
They explain the need for placebo controls
They keep us sceptical about overly good or bad results

Figure: Trazodone for erectile dysfunction in psychogenic erectile dysfunction (dark symbols) and with physiological or mixed aetiology (light symbols)

Hypothesis

“A tentative supposition with regard to an unknown state of affairs, the truth of which is … subject to investigation by any available method, either by logical deduction of consequences which may be checked against what is known, or by direct experimental investigation or discovery of facts not hitherto known and suggested by the hypothesis”

“A proposition presented as a supposition rather than asserted. A hypothesis may be put forward for testing or for discussion, possibly as a prelude to acceptance or rejection”

Examples: “Treating hypertension reduces myocardial infarction rate”. “Treating sore throat with penicillin reduces the rate of glomerulonephritis”. “Osteopathy is a good treatment for mechanical low back pain”. Etc.

Null hypothesis

“The statistical hypothesis that one variable (e.g. whether or not a study participant was allocated to receive an intervention) has no association with another variable or set of variables (e.g. whether or not a study participant died), or that two or more population distributions do not differ from one another”.

In its simplest terms, the null hypothesis states that the results observed in a study are no different from what might have occurred as a result of chance.

Incidence

The proportion of new cases of the target disorder in the population at risk during a specified time interval.

It is usual to define the disorder, the population, and the time, and to report the incidence as a rate.

A typical statement: most developed countries with a northern European age structure have an incidence of Parkinson’s disease of between 12 and 20 cases per 100,000 per year.

Prevalence

This is a measure of the proportion of people in a population who have a disease at a point in time, or over some period of time.

E.g. there was a study of incidence and prevalence of MS in the Lothian and Borders region of Scotland in the mid-1990s, with a population of about 864,000.

Annual incidence was 12 per 100,000. If probable cases were also included, the rate rose to 18 per 100,000.

Prevalence was determined by defining a prevalent case as any person with a diagnosis of multiple sclerosis alive and normally resident in the area on 15 March 1995. Probable as well as definite cases were included. There were 1613 residents with a diagnosis of MS, giving a crude prevalence rate of 187/100,000. The sex ratio was 2.5 and the mean age was 49 years.
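The arithmetic behind the crude prevalence figure can be checked in a few lines. This is a minimal sketch using only the numbers quoted above (1613 cases, population about 864,000, annual incidence 12 per 100,000); the function name is an assumption for the example.

```python
def rate_per_100k(cases, population):
    """Express a count of cases as a rate per 100,000 population."""
    return cases / population * 100_000

# Crude prevalence of MS on the census date
prevalence = rate_per_100k(1613, 864_000)
print(f"Crude prevalence: {prevalence:.0f} per 100,000")

# An annual incidence of 12 per 100,000 in the same population implies
# roughly this many new definite cases each year
expected_new_cases = 12 / 100_000 * 864_000
print(f"Expected new cases per year: {expected_new_cases:.0f}")
```

Prevalence is much higher than annual incidence here because MS is a long-term condition: cases accumulate over many years.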

Normal distribution

Normal distributions are a family of distributions that have the same general shape:

They are symmetrical, with scores more concentrated in the middle than in the tails
They are sometimes described as bell shaped
Also called the Gaussian distribution
Can be manipulated mathematically
Similar to the pattern of distribution of many naturally occurring phenomena

All normal density curves satisfy the following property, often referred to as the Empirical Rule:

About 68% of the observations fall within 1 standard deviation of the mean
About 95% of the observations fall within 2 standard deviations of the mean
About 99.7% of the observations fall within 3 standard deviations of the mean

I.e. for a normal distribution, almost all values lie within 3 standard deviations of the mean.
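The Empirical Rule percentages can be reproduced from the normal distribution itself. The sketch below uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2); the function name is an assumption for the example.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} SD: {fraction_within(k) * 100:.1f}%")
```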


Parameter

A parameter is a number computed from a population. This contrasts with the definition of a statistic.

A parameter is a constant, unchanging value. There is no random variation in a parameter. If the size of the population is large (as is typically the case), then a parameter may be difficult to compute.

An example of a parameter would be: the average length of stay in the birth hospital for all infants born in the UK.

Statistic

A statistic is a number computed from a sample. This contrasts with the definition of a parameter.

If a statistic is computed from a random sample (typically the case), then it has random variation or sampling error.

An example of a statistic would be: the average length of stay in the birth hospital for a random sample of 387 infants born in BHRUT.

Variable

A measurement that can vary within a study, e.g. the age of participants.

Variability is present when differences can be seen between different people or within the same person over time, with respect to any characteristic or feature that can be assessed or measured.

P-value

The probability (ranging from zero to one) that results at least as extreme as those observed in a study would have occurred by chance alone if there were no true effect.

Calculated using a statistical test.

Convention is to accept a p-value of 0.05 (5%) or below as being statistically significant. This is equivalent to a chance of random results of 1 in 20, which is not very unlikely. There is no solid mathematical basis for this threshold; it was simply chosen many years ago.

When many comparisons are being made, statistical significance can occur just by chance. A more stringent rule is to use a p-value of 0.01 (1 in 100) or below as statistically significant.
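To see why many comparisons are a problem, the sketch below works out the chance of at least one "significant" result arising purely by chance. It assumes the comparisons are independent and that there is no true effect in any of them, which is a simplification; the function name is hypothetical.

```python
def chance_of_false_positive(n_comparisons, alpha=0.05):
    """Probability of at least one p-value below alpha when every null
    hypothesis is true and the comparisons are independent."""
    return 1 - (1 - alpha) ** n_comparisons

for n in (1, 5, 20):
    at_05 = chance_of_false_positive(n, alpha=0.05)
    at_01 = chance_of_false_positive(n, alpha=0.01)
    print(f"{n:>2} comparisons: {at_05:.0%} at p<0.05, {at_01:.0%} at p<0.01")
```

With 20 comparisons at the 0.05 level, a spurious significant finding is more likely than not, which is the motivation for the more stringent 0.01 rule.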

Number needed to treat (NNT)

The inverse of the absolute risk reduction: the number of patients that need to be treated for one to benefit compared with a control.

The ideal NNT is 1, where everyone improves with treatment and no-one does with control. The higher the NNT, the less effective the treatment.

The value of an NNT is not just numeric: NNTs of 2-5 are indicative of effective therapies, like analgesics for acute pain.

NNTs of about 1 might be seen in treating sensitive bacterial infections with antibiotics, while an NNT of 40 or more might still be useful, e.g. when using aspirin after a heart attack.

Calculating NNT

NNT = 1/ARR

ARR = (CER - EER), where CER = control group event rate and EER = experimental group event rate

Sample calculation: The results of the Diabetes Control and Complications Trial into the effect of intensive diabetes therapy on the development and progression of neuropathy indicated that neuropathy occurred in 9.6% of patients randomised to usual care and 2.8% of patients randomised to intensive therapy.

The NNT with intensive diabetes therapy to prevent one additional occurrence of neuropathy can be determined by calculating the absolute risk reduction as follows:

ARR = (CER - EER) = (9.6% - 2.8%) = 6.8%

NNT = 1/ARR = 1/0.068 = 14.7, rounded up to 15

Therefore we need to treat 15 diabetic patients with intensive therapy to prevent one from developing neuropathy.
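The same calculation can be written as a small function. This is a minimal sketch using the event rates quoted above (9.6% and 2.8%); the function name and the choice to reject a non-positive ARR are assumptions for the example.

```python
from math import ceil

def number_needed_to_treat(cer, eer):
    """NNT = 1 / ARR, where ARR = CER - EER (both given as proportions)."""
    arr = cer - eer
    if arr <= 0:
        # No benefit over control, so an NNT is not defined
        raise ValueError("experimental event rate must be lower than control")
    return 1 / arr

nnt = number_needed_to_treat(cer=0.096, eer=0.028)
print(f"NNT = {nnt:.1f}, reported as {ceil(nnt)}")
```

NNTs are conventionally rounded up to the next whole patient, which is why 14.7 is reported as 15.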

Number needed to treat: examples

Response to antibiotics of women with symptoms of UTI but negative dipstick urine test results: double blind RCT. Richards et al, BMJ 2005;331:143-6. To reduce duration of symptoms by 2 days?
Answer: 4

Antibiotic prescribing in GP and hospital admissions for peritonsillar abscess, mastoiditis and rheumatic fever in children: time trend analysis. Sharland et al, BMJ 2005;331:328-9. To prevent one case of mastoiditis?
Answer: at least 2500

Trigeminal neuralgia Rx anticonvulsants. To obtain 50% pain relief?
Answer: 2.5

Arthritis Rx glucosamine for 3-8 weeks cf. placebo. To improve symptoms?
Answer: 5. So why not prescribe it?
http://www.nice.org.uk/nicemedia/pdf/CG59NICEguideline.pdf

MRC trial of treatment of mild HT: principal results. 17,354 individuals aged 36-64 years with diastolic 90-109 mmHg Rx bendrofluazide or propranolol for 5.5 years cf. placebo. BMJ 1985;291:97-104. Primary prevention of one stroke at one year?
Answer: 850

Benign prostatic hypertrophy Rx finasteride for 2 years vs placebo. To prevent one operation?
Answer: 39

Number needed to harm (NNH)

This is calculated in the same way as for NNT, but is used to describe adverse events.

For NNH, large numbers are good, because they mean that adverse events are rare.

Small values for NNH are bad, because they mean adverse events are common.

An example of how NNH can be calculated alongside NNT is that of inhaled corticosteroids used for asthma (Powell & Gibson. Inhaled corticosteroid doses in asthma: an evidence-based approach. Medical Journal of Australia 2003;178:223-225).

At low daily doses of 100 or 200 μg/day, neither dysphonia nor oral candidiasis was much of a problem, each affecting about one additional person per hundred treated, giving NNHs of about 100.

At daily doses of 500 μg and above, numbers needed to harm (NNH) fell to levels of about 20 or below, indicating that for every 100 patients treated with these doses, about an additional 5 would experience dysphonia and 5 would experience oral candidiasis.
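The link between an NNH and "additional patients harmed per 100 treated" is just a reciprocal. The sketch below checks the two figures quoted above; the function names are hypothetical, and the excess risks are the rounded values from the text rather than trial data.

```python
def number_needed_to_harm(excess_risk):
    """NNH = 1 / absolute risk increase (adverse event rate on treatment
    minus adverse event rate on control, as a proportion)."""
    return 1 / excess_risk

def extra_cases_per_100(nnh):
    """Additional patients harmed for every 100 treated, given an NNH."""
    return 100 / nnh

# Low dose: about one additional person per hundred affected
print(f"NNH at low dose: {number_needed_to_harm(0.01):.0f}")

# High dose: an NNH of about 20
print(f"Extra cases per 100 at NNH 20: {extra_cases_per_100(20):.0f}")
```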


Statistical power

The ability of a study to demonstrate an association or causal relationship between two variables, given that an association exists.

For example, 80% power in a clinical trial means that the study has an 80% chance of ending up with a p-value of less than 5% in a statistical test (i.e. a statistically significant treatment effect) if there really was an important difference (e.g. 10% versus 5% mortality) between treatments.

If the statistical power of a study is low, the study results will be questionable, e.g. the study might have been too small to detect any differences.

Factors influencing power in a statistical test include:

What kind of statistical test is being performed. Some statistical tests are inherently more powerful than others.

Sample size. In general, the larger the sample size, the larger the power. However, increasing sample size generally involves costs in time, money, and effort. It is therefore important to make sample size "large enough", but not wastefully large.

The size of experimental effects.

The level of error in experimental measurements.

By convention, 80% is an acceptable level of power.
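To give a feel for how power, effect size and sample size interact, the sketch below applies a widely used normal-approximation formula for comparing two proportions. It is an approximation only, not a substitute for a statistician's calculation; the function name is an assumption, and the 10% versus 5% mortality figures are borrowed from the power example in these notes.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a difference
    between two proportions with a two-sided test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return ceil(n)

n_80 = sample_size_per_group(0.10, 0.05, power=0.80)
n_90 = sample_size_per_group(0.10, 0.05, power=0.90)
print(f"Per group for 80% power: {n_80}")
print(f"Per group for 90% power: {n_90}")

# A smaller effect (10% vs 8%) needs a much larger trial
print(f"Per group for 10% vs 8%: {sample_size_per_group(0.10, 0.08)}")
```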

Sensitivity

Proportion of people with the target disorder who have a positive test/symptom/sign. Used to assist in assessing and selecting a diagnostic test/symptom/sign.

A seNsitive test keeps false-Negatives down: 100% sensitive means everyone with the condition has a positive test.

SnNout: When a sign/test/symptom has a high Sensitivity, a Negative result rules out the diagnosis.

For example, the sensitivity of a history of ankle swelling for diagnosing ascites is 93%; if a person does not have a history of ankle swelling, it is therefore highly unlikely that they have ascites.

Specificity

Proportion of people without the target disorder who have a negative test. It is used to assist in assessing and selecting a diagnostic test/symptom/sign.

A sPecific test keeps false-Positives down: 100% specific means everyone without the condition has a negative test.

SpPin: When a sign/test/symptom has a high Specificity, a Positive result rules in the diagnosis.

For example, the specificity of a fluid wave for diagnosing ascites is 92%; therefore if a person does have a fluid wave, it rules in the diagnosis of ascites.

Positive predictive value

Specificity and Sensitivity are closely related to the measures of:

Positive Predictive Value: The proportion of people with a positive test who have the target disorder; and

Negative Predictive Value: The proportion of people with a negative test who do not have the target disorder.

                  Disorder present    Disorder absent
Test positive            a                   b
Test negative            c                   d

Calculations from the table:

sensitivity = a/(a+c)
specificity = d/(b+d)
positive predictive value = a/(a+b)
negative predictive value = d/(c+d)
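The four formulas can be applied directly to the cells of a 2x2 table, taking a as true positives, b as false positives, c as false negatives and d as true negatives (the layout implied by the formulas). The counts below are hypothetical, chosen only to show that a test with good sensitivity and specificity can still have a modest positive predictive value when the disorder is uncommon.

```python
def diagnostic_measures(a, b, c, d):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    a = test positive, disorder present (true positives)
    b = test positive, disorder absent  (false positives)
    c = test negative, disorder present (false negatives)
    d = test negative, disorder absent  (true negatives)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }

# Hypothetical study: 100 people with the disorder, 400 without
measures = diagnostic_measures(a=90, b=40, c=10, d=360)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disorder is in the group tested.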

Reliability

Reproducibility
Stability over time and place
Ease of replication
Minimisation of observer variation
Confirmation of results

Validity

This term is a difficult concept in clinical trials, but refers to a trial being able to measure what it sets out to measure.

A trial that set out to measure the analgesic effect of a procedure might be in trouble if patients had no pain.

Or, in a condition where treatment is life-long, evaluating an intervention for 10 minutes is inappropriate.
