Sample Size Calculation PD Dr. Rolf Lefering IFOM - Institut für Forschung in der Operativen Medizin Universität Witten/Herdecke Campus Köln-Merheim
Mar 26, 2015
Sample Size Calculation
[Diagram: sample size as a trade-off between uncertainty and costs, effort & time]
Sample Size Calculation
Single study group
- continuous measurement
- count of events
Comparative trial (2 or more groups)
- continuous measurement
- count of events
Which true value is compatible with the observation?
Confidence interval
... the range in which the true value lies with high probability (usually 95%)
Confidence Interval
Example: 56 patients with open fractures, 9 developed an infection (16%)
[Diagram: a sample of n = 56 drawn from all patients with open fractures; observed infection rate 16%, true value unknown]
Formula for event rates

CI95 = p +/- 1.96 * sqrt( p * (100 - p) / n )

n = sample size, p = percentage (event rate)

Example: n = 56, p = 16%
CI95 = 16 +/- 1.96 * sqrt( (16 * 84) / 56 ) = 16 +/- 9.6, i.e. [6.4 - 25.6]
Confidence Interval
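This formula can be checked with a short Python sketch (the function name is mine; it uses the exact counts 9/56 rather than the rounded 16%, so the interval differs slightly from the slide):

```python
import math

def ci95_rate(events, n):
    """Approximate 95% CI for an event rate, in percent (normal approximation)."""
    p = 100.0 * events / n
    half_width = 1.96 * math.sqrt(p * (100.0 - p) / n)
    return p - half_width, p + half_width

# Slide example: 9 infections among 56 patients with open fractures
lo, hi = ci95_rate(9, 56)
print(f"{lo:.1f} - {hi:.1f}")  # close to the slide's [6.4 - 25.6]
```

Note that this normal approximation becomes unreliable for very small samples or rates near 0% or 100%; exact (Clopper-Pearson) intervals are then preferable.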
Confidence Interval
95% confidence interval around a 20% incidence rate
0
5
10
15
20
25
30
35
40
45
50
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150
sample size
inci
den
ce r
ate
(%)
Formula for continuous variables

CI95 = M +/- 1.96 * SE

Remember: SE = SD / sqrt(n)

M = mean, SE = standard error, SD = standard deviation, n = sample size

Factor: 1.65 for 90%, 1.96 for 95%, 2.58 for 99%
Confidence Interval
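The same calculation for a mean, as a small Python sketch (function name is mine; the example numbers are taken from the laparoscopic appendectomy trial discussed later in the talk):

```python
import math

def ci_mean(mean, sd, n, z=1.96):
    """CI for a mean: M +/- z * SE, with SE = SD / sqrt(n).
    z = 1.65 for 90%, 1.96 for 95%, 2.58 for 99%."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# Example: pain score 28 (SD 18) in n = 30 patients
lo, hi = ci_mean(28, 18, 30)
print(f"{lo:.1f} - {hi:.1f}")
```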
Sample Size Calculation
Comparative trials
„What is the sample size to show that early weight-bearing therapy is better?“
„Which key should I press here now?“

„What is the sample size to show that early weight-bearing therapy, as compared to standard therapy, is able to reduce the time until return to work from 10 weeks to 8 weeks, where time to return to work has an SD of 3 weeks?“

36 cases per group!
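The 36 cases per group can be reproduced with the standard normal-approximation formula for comparing two means (a sketch under stated assumptions: two-sided alpha = 0.05, 80% power; the function name is mine):

```python
import math

def n_per_group(delta, sd, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per group for comparing two means:
    n = 2 * (z_alpha + z_beta)^2 * (SD / delta)^2
    Defaults: two-sided alpha = 0.05, power = 80%."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

# Slide example: reduce time to return to work from 10 to 8 weeks, SD = 3 weeks
print(n_per_group(delta=2, sd=3))  # 36 cases per group
```

A t-test-based calculation would add one or two cases per group; the normal approximation already lands on the slide's 36.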
Outcome Measures
Wound infection · Well-being · Pain · Sepsis · Mobility · Independence, autonomy · Hospital stay · Organ failure · Recurrence rate · Blood pressure · Fear · Fatigue · Anxiety · Social status · Lab values · Survival · Complications · Depression
• Relevance: does this endpoint convince the patient / the scientific community?
• Reliability, measurability: can the outcome be measured easily, without much variation, also by different people?
• Sensitivity: does the intervention lead to a noticeable change in the outcome measure?
• Robustness: how much is the endpoint influenced by other factors?
Select Outcome Measure
• Primary endpoint
Main hypothesis or core question; aim of the study. Statistics: confirmative
• Secondary endpoints
Other interesting questions, additional endpoints. Statistics: explorative
(could be confirmative in case of a large difference)
Advantage: prospective selection in the study protocol
• Retrospectively selected endpoints
Selected when the trial is done, based on subgroup differences. Statistics: ONLY explorative!
Select Outcome Measure
Sample size
Certainty: α error & power
Difference to be detected
Sample Size Calculation
A statistical test
is a method (or tool) to decide whether an observed difference* is really present or just based on variation by chance
* this is true for a test for difference which is the most frequently applied one in medicine
Statistical Testing
Test for difference: „Intervention A is better than B“
Test for equivalence: „Intervention A and B have the same effect“
Test for non-inferiority: „Intervention A is not worse than B“
Statistical Testing
How a test procedure works
1. Want to show: there is a difference
2. Assume: there is NO difference between the groups („equal effects“, null hypothesis)
3. Try to disprove this assumption:
- perform the study / experiment
- measure the difference
4. Calculate the probability that such a difference could occur although the assumption („no difference“) was true
= p-value
Statistical Testing
Statistical test for difference:
The p-value is the probability that the observed difference occurred just by chance
Statistical Testing
Statistical test for difference:
p is the probability of „no difference“
Statistical Testing
„Germany and Spain are equally strong soccer teams!“
Game tonight: 6 : 0 for Germany
Null hypothesis; trial (n = 6); statistical test: p = 0.031
The p-value says: how big is the chance that one of two equally strong teams scores 6 goals, and the other one none?
Spain could still be equally strong as Germany, but the chance is small (3.1%)
Statistical Testing
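The p = 0.031 can be reproduced in two lines, assuming each goal is a 50:50 event between two equally strong teams:

```python
# Under the null hypothesis, each of the 6 goals goes to either team with p = 0.5
p_one_sided = 0.5 ** 6         # Germany scores all 6 goals
p_two_sided = 2 * p_one_sided  # either team scores all 6 goals
print(p_two_sided)  # 0.03125, the p = 0.031 on the slide
```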
                   small difference   large difference
large sample       p = 0.05           p < 0.001
small sample       p = 0.68           p = 0.05
Statistical Testing
The more cases are included, the better „equality“ can be disproved.

Example: drug A has a success rate of 80%, while drug B is better with a healing rate of 90%

sample size   drug A (80%)   drug B (90%)   p-value
20            8/10           9/10           0.53
40            16/20          18/20          0.38
100           40/50          45/50          0.16
200           80/100         90/100         0.048
400           160/200        180/200        0.005
1000          400/500        450/500        <0.001
Statistical Testing
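The pattern in the table can be reproduced with a 2x2 chi-square test, here implemented from scratch with only the standard library (an illustration, not the lecturer's software; a chi-square statistic with 1 df has p = erfc(sqrt(chi2/2))):

```python
import math

def chi2_p(a, b, c, d):
    """p-value of a 2x2 chi-square test (1 df, no continuity correction).
    a, b = successes/failures in group A; c, d = successes/failures in group B."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi2 with 1 df

# 80% vs 90% success rate at growing sample sizes, as in the table
for n_per_arm in (10, 50, 100, 500):
    sa = round(0.8 * n_per_arm)  # successes of drug A
    sb = round(0.9 * n_per_arm)  # successes of drug B
    p = chi2_p(sa, n_per_arm - sa, sb, n_per_arm - sb)
    print(n_per_arm, round(p, 3))
```

The printed p-values fall with the sample size and agree with the table to rounding (e.g. 0.048 at 100 per arm).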
A „significant“ p-value ...
does NOT prove the size of the difference,
but only excludes equality!
Statistical Testing
p-value large (> 0.05)
The observed difference is probably caused by chance only, or the sample size is not sufficient to exclude chance
null hypothesis is maintained:
“no difference”

p-value small (≤ 0.05)
Chance alone is not sufficient to explain this difference; there is a systematic difference
null hypothesis is rejected:
“significant difference“
p-value
Statistical Testing
The decision
- for a difference (significance, p ≤ 0.05)
- or against it („equality“, not significant, p > 0.05)
is not certain but only a probability (p-value). Therefore, errors are possible:

Type 1 error: decision for a difference although there is none => wrong finding
Type 2 error: decision for „equality“ although there is one => missed finding
Errors
Statistical Testing
Statistical Testing
Errors
Truth
Test says ... no difference difference
significant type 1 error
wrong finding
not significant missed finding
type 2 error
type 1 error type 2 error
“wrong finding“ „missed finding“
Fire detector wrong alarm no alarm in case of fire
Court conviction of set a an innocent criminal free
Clinical study difference difference was “significant” was missed
by chance
Statistical Testing
“What is the power of the study?”

Type 2 error (β): the probability to miss a difference
Power = 1 - β: the probability to detect a difference

Power depends on:
- the magnitude of the difference
- the sample size
- the variation of the outcome measure
- the significance level (α)

Power

“What is the power of the study?”

POWER is the probability to detect a certain difference X, with the given sample size n, as significant (at level α).

“Does the study have enough power to detect a difference of size X?”
Power
When to perform power calculations?
1. Planning phase (sample size calculation): if the assumed difference really exists, what risk would I take to miss this difference?
2. Final analysis (in case of a non-significant result): what size of difference could be rejected with the present data?
Power
Example
Clinical trial: Laparoscopic versus open appendectomy
Endpoint: Maximum post-operative pain intensity (VAS 0-100 points)
Patients: 30 cases per group
Results: lap.: 28 (SD 18); open: 32 (SD 17)
p = 0.38, not significant!
What is the power of the study ???
Power
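A normal-approximation power estimate for this trial can be sketched as follows (my own function; using a pooled SD of 17.5 as an assumption, roughly averaging the two reported SDs):

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_two_means(delta, sd, n_per_group, z_alpha=1.96):
    """Approximate power of a two-sample comparison of means (normal approx.):
    power = Phi( |delta| / (SD * sqrt(2/n)) - z_alpha )."""
    ncp = abs(delta) / (sd * math.sqrt(2 / n_per_group))
    return phi(ncp - z_alpha)

# Slide example: 28 vs 32 VAS points (delta = 4), SD ~17.5, n = 30 per group
print(round(power_two_means(delta=4, sd=17.5, n_per_group=30), 2))
```

Under these assumptions the power comes out at only about 14%, which is why the non-significant p = 0.38 says very little: the study had almost no chance to detect a 4-point difference.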
Sample size
Certainty: α error & power
Difference to be detected
Sample Size Calculation
Sample size
α = 0.05, β = 0.20
Difference to be detected
α error: risk to find a difference by chance
β error: risk to miss a real difference
Sample Size Calculation
Sample size
α = 0.05, β = 0.20
P_T & P_C, or Difference & SD
Event rates: Percentages in the treatment and the control group
Continuous measures: difference of means and standard deviation
Sample Size Calculation
SD unknown

If the variation (standard deviation) is not known, the expected advantage could be expressed as an „effect size“, which is the difference in units of the (unknown) SD.

Examples:
• pain values are at least 1 SD below the control group (effect size = 1.0)
• the difference will be at least half an SD (effect size = 0.5)
Continuous Endpoints
Sample Size Calculation
Test with non-parametric rank statistics
• non-normal distribution, or non-metric values
• Mann-Whitney U-test; Wilcoxon test
Use the t-test for sample size calculation and add 10% of cases
Sample Size Calculation
Continuous Endpoints
Guess …
How many patients are needed to show that a new intervention is able to reduce the complication rate from 20% to 14%? (α = 0.05; β = 0.20, i.e. 80% power)
Sample Size Calculation
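Before guessing, the standard normal-approximation formula for two proportions can be sketched (my own function, not the program cited on the next slide; two-sided alpha = 0.05, 80% power):

```python
import math

def n_per_group_rates(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per group for comparing two event rates
    (two-sided alpha = 0.05, 80% power), normal approximation."""
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Slide question: reduce the complication rate from 20% to 14%
print(n_per_group_rates(0.20, 0.14))  # roughly 600 per group
```

The surprisingly large answer (on the order of 600 patients per group) illustrates why event-rate endpoints with modest differences demand big trials; exact methods or continuity corrections shift the number by a few percent.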
Dupont WD, Plummer WD. Power and Sample Size Calculations: A Review and Computer Program. Controlled Clinical Trials (1990) 11:116-128
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Sample Size Calculation
Multiple Testing
• More than one experimental/treatment group
• Multiple outcome measures
• Multiple follow-up time points
• Interim analyses
• Subgroup analyses
Multiple testing increases the risk of arbitrary significant results
Overall statistical error in 8 tests at the 0.05 level:
α = 1 - 0.95^8 = 1 - 0.66 = 0.34
tests (each with 5% error)   all correct   at least 1 error
1                            95%           5%
2                            90.2%         9.8%
3                            85.7%         14.3%
4                            81.5%         18.5%
5                            77.4%         22.6%
…
Multiple Testing
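The table and the α = 0.34 for 8 tests both follow from the same one-liner: with k independent tests at the 5% level, the chance of at least one false-positive is 1 - 0.95^k.

```python
# Family-wise error rate for k independent tests at the 5% level
for k in (1, 2, 3, 4, 5, 8):
    fwer = 1 - 0.95 ** k  # chance of at least one false-positive finding
    print(k, f"{fwer:.3f}")
```

For k = 8 this prints 0.337, the 0.34 quoted on the earlier slide.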
Select ONE primary and multiple secondary questions

Combination of endpoints:
- multiple complications: „negative event“
- multiple time points: AUC, maximum value, time to normal
- multiple endpoints: sum score acc. to O'Brien

Adjustment of p-values, i.e. each endpoint is tested at a „stricter“ α level
e.g. Bonferroni: k tests at level α/k (5 tests at the 1% level, instead of 1 test at the 5% level)

A priori ordered hypotheses: predefine the order of tests (each at the 5% level)
What could you do?
Multiple Testing
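The Bonferroni correction mentioned above can be checked numerically (a minimal sketch; independence of the tests is an assumption, and Bonferroni holds even without it):

```python
alpha = 0.05
k = 5
adjusted = alpha / k                   # each of the 5 tests is run at the 1% level
fwer_bound = 1 - (1 - adjusted) ** k   # overall error for k independent tests
print(adjusted, round(fwer_bound, 3))  # the overall error stays below 5%
```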
• Fixed sample size: analysis at the end of the trial
• Sequential design: analysis after each case
• Group sequential design: analysis after each step
• Adaptive design: analysis after each step
Interim Analysis
from: TR Fleming, DP Harrington, PC O'Brien. Designs for group sequential tests. Controlled Clinical Trials (1984) 5:348-361
Interim Analysis