Sample Size Calculation PD Dr. Rolf Lefering IFOM - Institut für Forschung in der Operativen Medizin Universität Witten/Herdecke Campus Köln-Merheim
Mar 26, 2015
Sample Size Calculation
[Diagram: sample size as a trade-off between uncertainty and costs, effort & time]
Sample Size Calculation
Single study group
- continuous measurement
- count of events
Comparative trial (2 or more groups)
- continuous measurement
- count of events
Which true value is compatible with the observation?
Confidence interval
... the range in which the true value lies with high probability (usually 95%)
Confidence Interval
Example: 56 patients with open fractures, 9 developed an infection (16%)
[Diagram: a sample of n = 56 drawn from all patients with open fractures; observed infection rate 16%, true value unknown]
Formula for event rates

CI95 = p +/- 1.96 * sqrt( p * (100 - p) / n )

n = sample size, p = percentage (event rate)

Example: n = 56, p = 16%
CI95 = 16 +/- 1.96 * sqrt( (16 * 84) / 56 ) = 16 +/- 9.6, i.e. [6.4 - 25.6]
Confidence Interval
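This formula can be checked with a short Python sketch (the function name is mine; it uses the exact counts 9/56 rather than the rounded 16%, so the interval differs slightly from the slide):

```python
import math

def ci95_rate(events, n):
    """Approximate 95% CI for an event rate, in percent (normal approximation)."""
    p = 100.0 * events / n
    half_width = 1.96 * math.sqrt(p * (100.0 - p) / n)
    return p - half_width, p + half_width

# Slide example: 9 infections among 56 patients with open fractures
lo, hi = ci95_rate(9, 56)
print(f"{lo:.1f} - {hi:.1f}")  # close to the slide's [6.4 - 25.6]
```

Note that this normal approximation becomes unreliable for very small samples or rates near 0% or 100%; exact (Clopper-Pearson) intervals are then preferable.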
Confidence Interval
95% confidence interval around a 20% incidence rate
0
5
10
15
20
25
30
35
40
45
50
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150
sample size
inci
den
ce r
ate
(%)
Formula for continuous variables

CI95 = M +/- 1.96 * SE

Remember: SE = SD / sqrt(n)

M = mean, SE = standard error, SD = standard deviation, n = sample size

Factor: 1.65 for 90%, 1.96 for 95%, 2.58 for 99%
Confidence Interval
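The same calculation for a mean, as a small Python sketch (function name is mine; the example numbers are taken from the laparoscopic appendectomy trial discussed later in the talk):

```python
import math

def ci_mean(mean, sd, n, z=1.96):
    """CI for a mean: M +/- z * SE, with SE = SD / sqrt(n).
    z = 1.65 for 90%, 1.96 for 95%, 2.58 for 99%."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# Example: pain score 28 (SD 18) in n = 30 patients
lo, hi = ci_mean(28, 18, 30)
print(f"{lo:.1f} - {hi:.1f}")
```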
Sample Size Calculation
Comparative trials
„What is the sample size to show that early weight-bearing therapy is better?“
„Which key should I press here now?“

„What is the sample size to show that early weight-bearing therapy, as compared to standard therapy, is able to reduce the time until return to work from 10 weeks to 8 weeks, where time to return to work has an SD of 3 weeks?“

36 cases per group!
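The 36 cases per group can be reproduced with the standard normal-approximation formula for comparing two means (a sketch under stated assumptions: two-sided alpha = 0.05, 80% power; the function name is mine):

```python
import math

def n_per_group(delta, sd, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per group for comparing two means:
    n = 2 * (z_alpha + z_beta)^2 * (SD / delta)^2
    Defaults: two-sided alpha = 0.05, power = 80%."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

# Slide example: reduce time to return to work from 10 to 8 weeks, SD = 3 weeks
print(n_per_group(delta=2, sd=3))  # 36 cases per group
```

A t-test-based calculation would add one or two cases per group; the normal approximation already lands on the slide's 36.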
Outcome Measures
Wound infection · Well-being · Pain · Sepsis · Mobility · Independence, autonomy · Hospital stay · Organ failure · Recurrence rate · Blood pressure · Fear · Fatigue · Anxiety · Social status · Lab values · Survival · Complications · Depression
• Relevance: does this endpoint convince the patient / the scientific community?
• Reliability, measurability: can the outcome be measured easily, without much variation, also by different people?
• Sensitivity: does the intervention lead to a noticeable change in the outcome measure?
• Robustness: how much is the endpoint influenced by other factors?
Select Outcome Measure
• Primary endpoint
Main hypothesis or core question; aim of the study. Statistics: confirmative
• Secondary endpoints
Other interesting questions, additional endpoints. Statistics: explorative
(could be confirmative in case of a large difference)
Advantage: prospective selection in the study protocol
• Retrospectively selected endpoints
Selected when the trial is done, based on subgroup differences. Statistics: ONLY explorative!
Select Outcome Measure
Sample size
Certainty: α error & power
Difference to be detected
Sample Size Calculation
A statistical test
is a method (or tool) to decide whether an observed difference* is really present or just based on variation by chance
* this is true for a test for difference which is the most frequently applied one in medicine
Statistical Testing
Test for difference: „Intervention A is better than B“
Test for equivalence: „Intervention A and B have the same effect“
Test for non-inferiority: „Intervention A is not worse than B“
Statistical Testing
How a test procedure works
1. Want to show: there is a difference
2. Assume: there is NO difference between the groups („equal effects“, null hypothesis)
3. Try to disprove this assumption:
- perform the study / experiment
- measure the difference
4. Calculate the probability that such a difference could occur although the assumption („no difference“) was true
= p-value
Statistical Testing
Statistical test for difference:
The p-value is the probability that the observed difference occurred just by chance
Statistical Testing
Statistical test for difference:
p is the probability of „no difference“
Statistical Testing
„Germany and Spain are equally strong soccer teams!“
Game tonight: 6 : 0 for Germany
Null hypothesis; trial (n = 6); statistical test: p = 0.031
The p-value says: how big is the chance that one of two equally strong teams scores 6 goals, and the other one none?
Spain could still be equally strong as Germany, but the chance is small (3.1%)
Statistical Testing
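The p = 0.031 can be reproduced in two lines, assuming each goal is a 50:50 event between two equally strong teams:

```python
# Under the null hypothesis, each of the 6 goals goes to either team with p = 0.5
p_one_sided = 0.5 ** 6         # Germany scores all 6 goals
p_two_sided = 2 * p_one_sided  # either team scores all 6 goals
print(p_two_sided)  # 0.03125, the p = 0.031 on the slide
```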
                   small difference   large difference
large sample       p = 0.05           p < 0.001
small sample       p = 0.68           p = 0.05
Statistical Testing
The more cases are included, the better „equality“ can be disproved.

Example: drug A has a success rate of 80%, while drug B is better with a healing rate of 90%

sample size   drug A (80%)   drug B (90%)   p-value
20            8/10           9/10           0.53
40            16/20          18/20          0.38
100           40/50          45/50          0.16
200           80/100         90/100         0.048
400           160/200        180/200        0.005
1000          400/500        450/500        <0.001
Statistical Testing
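The pattern in the table can be reproduced with a 2x2 chi-square test, here implemented from scratch with only the standard library (an illustration, not the lecturer's software; a chi-square statistic with 1 df has p = erfc(sqrt(chi2/2))):

```python
import math

def chi2_p(a, b, c, d):
    """p-value of a 2x2 chi-square test (1 df, no continuity correction).
    a, b = successes/failures in group A; c, d = successes/failures in group B."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi2 with 1 df

# 80% vs 90% success rate at growing sample sizes, as in the table
for n_per_arm in (10, 50, 100, 500):
    sa = round(0.8 * n_per_arm)  # successes of drug A
    sb = round(0.9 * n_per_arm)  # successes of drug B
    p = chi2_p(sa, n_per_arm - sa, sb, n_per_arm - sb)
    print(n_per_arm, round(p, 3))
```

The printed p-values fall with the sample size and agree with the table to rounding (e.g. 0.048 at 100 per arm).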
A „significant“ p-value ...
does NOT prove the size of the difference,
but only excludes equality!
Statistical Testing
p-value large (> 0.05)
The observed difference is probably caused by chance only, or the sample size is not sufficient to exclude chance
null hypothesis is maintained:
“no difference”

p-value small (≤ 0.05)
Chance alone is not sufficient to explain this difference; there is a systematic difference
null hypothesis is rejected:
“significant difference“
p-value
Statistical Testing
The decision
- for a difference (significance, p ≤ 0.05)
- or against it („equality“, not significant, p > 0.05)
is not certain but only a probability (p-value). Therefore, errors are possible:

Type 1 error: decision for a difference although there is none => wrong finding
Type 2 error: decision for „equality“ although there is one => missed finding
Errors
Statistical Testing
Statistical Testing
Errors
Truth
Test says ... no difference difference
significant type 1 error
wrong finding
not significant missed finding
type 2 error
type 1 error type 2 error
“wrong finding“ „missed finding“
Fire detector wrong alarm no alarm in case of fire
Court conviction of set a an innocent criminal free
Clinical study difference difference was “significant” was missed
by chance
Statistical Testing
“What is the power of the study?”

Type 2 error (β): the probability to miss a difference
Power = 1 - β: the probability to detect a difference

Power depends on:
- the magnitude of the difference
- the sample size
- the variation of the outcome measure
- the significance level (α)

Power

“What is the power of the study?”

POWER is the probability to detect a certain difference X, with the given sample size n, as significant (at level α).

“Does the study have enough power to detect a difference of size X?”
Power
When to perform power calculations?
1. Planning phase (sample size calculation): if the assumed difference really exists, what risk would I take to miss this difference?
2. Final analysis (in case of a non-significant result): what size of difference could be rejected with the present data?
Power
Example
Clinical trial: Laparoscopic versus open appendectomy
Endpoint: Maximum post-operative pain intensity (VAS 0-100 points)
Patients: 30 cases per group
Results: lap.: 28 (SD 18); open: 32 (SD 17)
p = 0.38, not significant!
What is the power of the study ???
Power
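A normal-approximation power estimate for this trial can be sketched as follows (my own function; using a pooled SD of 17.5 as an assumption, roughly averaging the two reported SDs):

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def power_two_means(delta, sd, n_per_group, z_alpha=1.96):
    """Approximate power of a two-sample comparison of means (normal approx.):
    power = Phi( |delta| / (SD * sqrt(2/n)) - z_alpha )."""
    ncp = abs(delta) / (sd * math.sqrt(2 / n_per_group))
    return phi(ncp - z_alpha)

# Slide example: 28 vs 32 VAS points (delta = 4), SD ~17.5, n = 30 per group
print(round(power_two_means(delta=4, sd=17.5, n_per_group=30), 2))
```

Under these assumptions the power comes out at only about 14%, which is why the non-significant p = 0.38 says very little: the study had almost no chance to detect a 4-point difference.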
Sample size
Certainty: α error & power
Difference to be detected
Sample Size Calculation
Sample size
α = 0.05, β = 0.20
Difference to be detected
α error: risk to find a difference by chance
β error: risk to miss a real difference
Sample Size Calculation
Sample size
α = 0.05, β = 0.20
P_T & P_C, or Difference & SD
Event rates: Percentages in the treatment and the control group
Continuous measures: difference of means and standard deviation
Sample Size Calculation
SD unknown

If the variation (standard deviation) is not known, the expected advantage could be expressed as an „effect size“, which is the difference in units of the (unknown) SD.

Examples:
• pain values are at least 1 SD below the control group (effect size = 1.0)
• the difference will be at least half an SD (effect size = 0.5)
Continuous Endpoints
Sample Size Calculation
Test with non-parametric rank statistics
• non-normal distribution, or non-metric values
• Mann-Whitney U-test; Wilcoxon test
Use the t-test for sample size calculation and add 10% of cases
Sample Size Calculation
Continuous Endpoints
Guess …
How many patients are needed to show that a new intervention is able to reduce the complication rate from 20% to 14%? (α = 0.05; β = 0.20, i.e. 80% power)
Sample Size Calculation
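Before guessing, the standard normal-approximation formula for two proportions can be sketched (my own function, not the program cited on the next slide; two-sided alpha = 0.05, 80% power):

```python
import math

def n_per_group_rates(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per group for comparing two event rates
    (two-sided alpha = 0.05, 80% power), normal approximation."""
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Slide question: reduce the complication rate from 20% to 14%
print(n_per_group_rates(0.20, 0.14))  # roughly 600 per group
```

The surprisingly large answer (on the order of 600 patients per group) illustrates why event-rate endpoints with modest differences demand big trials; exact methods or continuity corrections shift the number by a few percent.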
Dupont WD, Plummer WD. Power and Sample Size Calculations: A Review and Computer Program. Controlled Clinical Trials (1990) 11:116-128
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
Sample Size Calculation
Multiple Testing
• More than one experimental/treatment group
• Multiple outcome measures
• Multiple follow-up time points
• Interim analyses
• Subgroup analyses
Multiple testing increases the risk of arbitrary significant results
Overall statistical error in 8 tests at the 0.05 level:
α = 1 - 0.95^8 = 1 - 0.66 = 0.34
tests (each with 5% error)   all correct   at least 1 error
1                            95%           5%
2                            90.2%         9.8%
3                            85.7%         14.3%
4                            81.5%         18.5%
5                            77.4%         22.6%
…
Multiple Testing
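The table and the α = 0.34 for 8 tests both follow from the same one-liner: with k independent tests at the 5% level, the chance of at least one false-positive is 1 - 0.95^k.

```python
# Family-wise error rate for k independent tests at the 5% level
for k in (1, 2, 3, 4, 5, 8):
    fwer = 1 - 0.95 ** k  # chance of at least one false-positive finding
    print(k, f"{fwer:.3f}")
```

For k = 8 this prints 0.337, the 0.34 quoted on the earlier slide.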
Select ONE primary and multiple secondary questions

Combination of endpoints:
- multiple complications: „negative event“
- multiple time points: AUC, maximum value, time to normal
- multiple endpoints: sum score acc. to O'Brien

Adjustment of p-values, i.e. each endpoint is tested at a „stricter“ α level
e.g. Bonferroni: k tests at level α/k (5 tests at the 1% level, instead of 1 test at the 5% level)

A priori ordered hypotheses: predefine the order of tests (each at the 5% level)
What could you do?
Multiple Testing
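The Bonferroni correction mentioned above can be checked numerically (a minimal sketch; independence of the tests is an assumption, and Bonferroni holds even without it):

```python
alpha = 0.05
k = 5
adjusted = alpha / k                   # each of the 5 tests is run at the 1% level
fwer_bound = 1 - (1 - adjusted) ** k   # overall error for k independent tests
print(adjusted, round(fwer_bound, 3))  # the overall error stays below 5%
```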
• Fixed sample size: analysis at the end of the trial
• Sequential design: analysis after each case
• Group sequential design: analysis after each step
• Adaptive design: analysis after each step
Interim Analysis
from: TR Fleming, DP Harrington, PC O'Brien. Designs for group sequential tests. Controlled Clinical Trials (1984) 5:348-361
Interim Analysis