Randomization and Causal Inference in Clinical Studies Martin Schumacher, Claudia Schmoor, Jan Beyersmann Institute for Medical Biometry and Statistics and Clinical Trials Unit Medical Center and Faculty of Medicine – University of Freiburg, Germany Institute for Statistics, University of Ulm, Germany IQWIG im Dialog 2019 Köln, 21.06.2019
Randomization and Causal Inference in
Clinical Studies
Martin Schumacher, Claudia Schmoor, Jan Beyersmann
Institute for Medical Biometry and Statistics and Clinical Trials Unit
Medical Center and Faculty of Medicine – University of Freiburg, Germany
Institute for Statistics, University of Ulm, Germany
IQWIG im Dialog 2019
Köln, 21.06.2019
ms, cs, jb IQWIG im Dialog 2019 – Köln 21.06.2019
Contents
• The potential outcome scenario: Randomization and
propensity score revisited
• A comprehensive cohort study
• Systematic reviews on comparisons of randomized
and observational studies
• The EMPA-REG OUTCOME trial: From standard
statistical analysis to mediation analyses
• Discussion and conclusions
Conflict of interest statement
The Institute for Medical Biometry and Statistics and the
Clinical Trials Unit of the Medical Center and Faculty of
Medicine – University of Freiburg, Germany obtained an
institutional research grant from Boehringer Ingelheim for
an independent analysis of the EMPA-REG OUTCOME
trial and subsequent specific analyses requested by the
Steering Committee of the study.
The potential outcome scenario (1)
• (Y(0), Y(1)) potential outcome vector for a patient
with Y(0) : outcome if the control (or no) treatment
is given
Y(1) : outcome if the new treatment is given
• Interest is in E (Y(1) - Y(0)),
called the average causal effect, or any suitable
functional of the joint distribution F01 of (Y(0), Y(1))
• Usually (except for a perfect cross-over study), only
Y = X Y(1) + (1-X) Y(0)
is observed with X = 1 {new treatment is given}.
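The relation between the potential outcomes and the observed outcome can be illustrated with a small simulation. The following is a minimal Python sketch and not part of the talk: the normal distributions, the average effect of 1.0, and the names `simulate_patients`, `patients` and `true_ace` are assumptions chosen purely for illustration.

```python
import random
from statistics import mean


def simulate_patients(n, seed=1):
    """Draw potential outcomes (Y(0), Y(1)) and the observed outcome Y for n patients."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)       # outcome if the control treatment is given
        y1 = y0 + rng.gauss(1.0, 0.5)  # outcome if the new treatment is given (assumed model)
        x = rng.randint(0, 1)          # X = 1 {new treatment is given}
        y = x * y1 + (1 - x) * y0      # only this outcome is observed
        patients.append({"y0": y0, "y1": y1, "x": x, "y": y})
    return patients


patients = simulate_patients(1000)

# In a simulation both potential outcomes are known, so the average causal
# effect can be computed directly; in a real study it cannot.
true_ace = mean(p["y1"] - p["y0"] for p in patients)
print(f"average causal effect in the simulated sample: {true_ace:.3f}")
```

In real data only `x` and `y` would be available for each patient, which is why the joint distribution F01 cannot be observed directly.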
The potential outcome scenario (2)
• With randomized treatment allocation, we can identify
the marginal distributions F0 of Y(0) and F1 of Y(1) and
estimate them in an unbiased way.
• We can therefore identify and estimate the average
causal effect
E (Y(1) - Y(0)) = E (Y(1)) - E (Y(0))
or any suitable functional of the marginal distributions
F0 and F1 in a randomized clinical trial
• Randomization ensures balance of all known and
unknown potential confounders (except for random
imbalances)
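A toy simulation can make the role of randomization concrete. The sketch below is illustrative only and rests on assumptions not taken from the talk: a single binary prognostic factor C, a constant treatment effect of 1.0, and allocation probabilities of 0.8 and 0.2 in the non-randomized scenario. All function and variable names are hypothetical.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # assumed constant causal effect Y(1) - Y(0)


def simulate(n, randomized, seed):
    """Return a list of (c, x, y) with observed outcome y = x*Y(1) + (1-x)*Y(0)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = rng.randint(0, 1)                # binary prognostic factor
        y0 = 2.0 * c + rng.gauss(0.0, 1.0)   # C influences the outcome
        y1 = y0 + TRUE_EFFECT
        if randomized:
            p_treat = 0.5                    # allocation ignores C
        else:
            p_treat = 0.8 if c == 1 else 0.2  # allocation depends on C
        x = 1 if rng.random() < p_treat else 0
        data.append((c, x, x * y1 + (1 - x) * y0))
    return data


def difference_in_means(data):
    """Unadjusted comparison of the observed outcomes between the two arms."""
    treated = [y for _, x, y in data if x == 1]
    control = [y for _, x, y in data if x == 0]
    return mean(treated) - mean(control)


def covariate_imbalance(data):
    """Difference in the mean of C between treated and control patients."""
    treated = [c for c, x, _ in data if x == 1]
    control = [c for c, x, _ in data if x == 0]
    return mean(treated) - mean(control)


rct = simulate(20000, randomized=True, seed=1)
obs = simulate(20000, randomized=False, seed=2)

effect_rct = difference_in_means(rct)
effect_obs = difference_in_means(obs)
imbalance_rct = covariate_imbalance(rct)
imbalance_obs = covariate_imbalance(obs)

print(f"randomized:     effect {effect_rct:.2f}, imbalance in C {imbalance_rct:.2f}")
print(f"non-randomized: effect {effect_obs:.2f}, imbalance in C {imbalance_obs:.2f}")
```

Under these assumptions the randomized comparison should land close to the true effect with C balanced up to chance, while the non-randomized comparison mixes the treatment effect with the effect of C.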
The potential outcome scenario (3)
• The propensity score (PS) is defined as P (X=1|C)
where C is a vector of covariates
• We can identify the average causal effect under the
assumption
(Y(0), Y(1)) independent of X | C
("No unmeasured confounders")
• The assumption of "No unmeasured confounders"
implies
(Y(0), Y(1)) independent of X | P(X=1|C)
• Additional assumption: 0< P (X=1|C) < 1,
i.e. every patient can receive either treatment
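How the two assumptions lead to identification can be written out in a few lines. The derivation below is a standard argument added for illustration, not taken from the slides; the shorthand e(C) = P(X=1|C) for the propensity score is an assumed notation.

```latex
\begin{aligned}
E\bigl(Y(1)\bigr)
  &= E\bigl[\,E\bigl(Y(1)\mid C\bigr)\bigr] \\
  &= E\bigl[\,E\bigl(Y(1)\mid X=1,\,C\bigr)\bigr]
     && \text{(no unmeasured confounders)} \\
  &= E\bigl[\,E\bigl(Y\mid X=1,\,C\bigr)\bigr]
     && \text{(since } Y = Y(1) \text{ when } X=1\text{)} \\[4pt]
E\!\left[\frac{X\,Y}{e(C)}\right]
  &= E\!\left[\frac{E\bigl(X\mid C\bigr)\,E\bigl(Y(1)\mid C\bigr)}{e(C)}\right]
   = E\bigl(Y(1)\bigr),
  \qquad e(C) = P(X=1\mid C).
\end{aligned}
```

The condition 0 < e(C) < 1 is what makes the conditional expectation given X=1 and the division by e(C) well defined. The same steps with X=0 and 1 - e(C) give E(Y(0)), and the difference of the two is the average causal effect.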
Propensity score in practice
• There are various ways of using the propensity score
(Matching, Weighting, Stratification, Covariate in
outcome regression model)
• The propensity score has to be estimated: how to
model?
• Commonly used: logistic regression model
• Which covariates to include?
• Sparse or high-dimensional model?
• Penalized regression (e.g. lasso-type)?
• Penalized spline imputation method?
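As an illustration of one of these options, the following sketch estimates the propensity score with a logistic regression model and uses it for weighting. It is a toy example under assumptions that do not come from the talk: a single continuous covariate C, a true propensity score of expit(C), a constant treatment effect of 1.0, and normalized inverse probability weights. The logistic model is fitted with a hand-written Newton-Raphson loop so that only the Python standard library is needed; all names are hypothetical.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Non-randomized data: covariate C drives both treatment and outcome."""
    rng = random.Random(seed)
    c, x, y = [], [], []
    for _ in range(n):
        ci = rng.gauss(0.0, 1.0)
        xi = 1 if rng.random() < expit(ci) else 0  # true P(X=1|C) = expit(C)
        y0 = ci + rng.gauss(0.0, 1.0)
        c.append(ci)
        x.append(xi)
        y.append(y0 + 1.0 if xi == 1 else y0)      # assumed causal effect 1.0
    return c, x, y


def fit_logistic(c, x, iterations=25):
    """Newton-Raphson for the model logit P(X=1|C=c) = a + b*c."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ci, xi in zip(c, x):
            p = expit(a + b * ci)
            w = p * (1.0 - p)
            g0 += xi - p            # score for the intercept
            g1 += (xi - p) * ci     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * ci
            h11 += w * ci * ci
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def naive_effect(x, y):
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(x, y, ps):
    """Weighted difference in means with normalized inverse probability weights."""
    w1 = [xi / p for xi, p in zip(x, ps)]
    w0 = [(1 - xi) / (1.0 - p) for xi, p in zip(x, ps)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


c, x, y = simulate(10000)
a, b = fit_logistic(c, x)
ps = [expit(a + b * ci) for ci in c]  # estimated propensity scores

naive = naive_effect(x, y)
ipw = ipw_effect(x, y, ps)
print(f"unadjusted: {naive:.2f}, propensity score weighted: {ipw:.2f}")
```

The sketch works here because the propensity model is correctly specified and C is the only confounder. The questions on this slide (which covariates, how flexible a model) are exactly about what happens when that cannot be taken for granted.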
A comprehensive cohort study (1)
• Study conducted by the German Breast Cancer Study
Group to compare three cycles of chemotherapy (3
CMF) with six cycles of chemotherapy (6 CMF) in
patients with non-metastatic node-positive breast
cancer
• Randomized patients as well as patients not consenting
to randomization were enrolled and followed according to
a standard protocol
• Primary endpoint: event-free survival
[Figure: Percent nonrandomized patients receiving 3xCMF (Schmoor et al., Am J Epidemiol 2008)]
[Figure: Randomized patients, panels labeled unadjusted and adjusted (Schmoor et al., Am J Epidemiol 2008)]
[Figure: Nonrandomized patients, panels labeled unadjusted and adjusted (Schmoor et al., Am J Epidemiol 2008)]
[Two further figures: Schmoor et al., Am J Epidemiol 2008]
A comprehensive cohort study (2)
• In this particular study, propensity score as well as
regression adjustment led to results very similar to
those of the randomized part
• Comprehensive cohort studies have been carried out
very rarely. When only the results of an observational
study are available (analyzed based on a propensity
score), how reliable are the results?
• Systematic comparisons of treatment effects in
randomized vs. non-randomized studies?
• What are they about and what can we learn from
them?
Specific Comparisons
Reference | Medical field | Included study sample | Number of studies | Methodology used in observational studies | Direction of bias
Kuss et al. 2011 [4] | Cardiac surgery | Randomized and non-randomized studies comparing off- and on-pump surgery | 28 non-randomized studies and 51 randomized trials | Propensity score based analyses | Similar effects
Lonjon et al. 2014 [5] | Surgical procedures | Randomized and non-randomized studies on surgical procedures | 70 non-randomized studies and 94 randomized trials | Propensity score based analyses | Similar effects
Zhang et al. 2014 [8] | Intensive Care Medicine | Randomized and non-randomized studies on treatment of patients with sepsis | 14 non-randomized studies, 3 systematic reviews and 7 randomized trials | Propensity score based analyses | Overestimation of effects
Ankarfeldt et al. 2017 [2] | Diabetes | Randomized and non-randomized studies on treatment with glucose-lowering drugs | 2 comparisons with 11/16 randomized studies and 7/4 non-randomized studies, published 2000-2015 | Diverse | No efficacy-effectiveness gap observed
General Comparisons
Reference Medical field Included study sample Number of studies Methodology used
• Starting with bivariable Cox regression models of the effect of
treatment and the potential mediators M on outcome Y, one at a
time separately for all potential mediators
• Multivariable Cox regression model with one representative of
the different mechanistic categories
• The variable most promising with regard to its potential as
mediator was chosen as representative
• Only variables were chosen that showed an effect on outcome Y
and that led to a reduced treatment effect estimate (hazard
ratio, HR, shifted towards one) in the bivariable models
• For ranking of the strength of mediators:
Multivariable model building with a step-up procedure that adds,
in each step, the variable with the largest mediating effect
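The step-up ranking can be sketched as a simple greedy loop. The code below is an illustration under stated assumptions, not the analysis code of the study: "most mediating effect" is read as "moves the adjusted treatment HR closest to one", the function `adjusted_hr` stands in for fitting a Cox model adjusted for the given mediators, and the mediator names A, B, C and all numbers in `TOY_SHIFT` are invented toy values, not trial results.

```python
import math


def step_up_ranking(candidates, adjusted_hr):
    """Greedy step-up: in each step add the mediator whose inclusion moves
    the adjusted treatment hazard ratio closest to one.

    adjusted_hr(mediators) is a hypothetical callback returning the treatment
    HR from a model adjusted for the listed mediators.
    """
    selected = []
    remaining = list(candidates)
    path = []
    while remaining:
        best = min(
            remaining,
            key=lambda m: abs(math.log(adjusted_hr(selected + [m]))),
        )
        selected.append(best)
        remaining.remove(best)
        path.append((best, adjusted_hr(selected)))
    return path


# Toy stand-in for model fitting: each mediator shifts log(HR) by a fixed
# amount. These numbers are invented for illustration only.
TOY_SHIFT = {"A": 0.30, "B": 0.10, "C": 0.05}


def toy_adjusted_hr(mediators):
    return math.exp(math.log(0.60) + sum(TOY_SHIFT[m] for m in mediators))


ranking = step_up_ranking(["C", "B", "A"], toy_adjusted_hr)
for name, hr in ranking:
    print(f"added {name}: adjusted treatment HR {hr:.3f}")
```

In a real analysis the callback would refit the multivariable Cox regression model at every step, and the screening rule from the bivariable models would be applied before the ranking starts.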
[Table: Inzucchi et al., Diabetes Care 2018. Unadjusted: 0.615 (0.491, 0.770)]
Bradford Hill's criteria (1965)
1. Strength of Association
2. Consistency
3. Specificity
4. Temporality
5. Biological Gradient (dose response)
6. Plausibility
7. Coherence
8. Experimental Evidence
9. Analogy
Hill AB, Proc Royal Soc Med 1965
Sir Austin Bradford Hill
• Presentation of "Principles of
Medical Statistics" (The Lancet,
1937)
• Randomized trial on streptomycin
in patients with pulmonary
tuberculosis (BMJ, 1948)
• Demonstration of connection
between cigarette smoking and
lung cancer (with Richard Doll,
BMJ, 1954)
Discussion and Conclusions (1)
• Methods of causal inference, e.g. propensity score
analyses, rely on assumptions that cannot be verified
with the data usually available.
• Most critical is the assumption of
"no unmeasured confounders"
and the inclusion of confounders into a propensity
score model.
• This assumption is automatically fulfilled when
randomization is employed
("Design trumps analysis").
Discussion and Conclusions (2)
• Empirical comparisons of treatment effects in
randomized trials and observational studies do not
paint a clear picture. Some are themselves susceptible
to bias (“A bias in the evaluation of bias…”, Franklin et al.,
Epidemiol Methods 2017).
• Improvement of methodology for such comparisons is
urgently needed in order to not compare “apples and
oranges” but “apples and apples” (Lodi et al., Am J
Epidemiol 2019).
• Treatment effects based on observational studies are
often susceptible to other sources of bias, e.g.
time-related biases, besides confounding. Thus, all sources
of bias have to be considered!
Discussion and Conclusions (3)
• Methods for causal inference are best suited to
situations where randomization is not feasible, in order
to obtain the best possible evidence.
• They are also useful in randomized trials in order to
address specific complications, e.g. non-compliance,
treatment cross-over etc.
• As shown for the EMPA-REG OUTCOME trial, they can
help answer additional questions on mechanisms of
treatment.
• Instead of a traditional mediation analysis, more
refined methods can be used (e.g. Aalen et al., Biom J
2019).
Take-Home Message
Randomize if you can,
Model if you must!
Modified according to J. Hanley, 2019
References – Propensity Scores
Graf E. The propensity score in the analysis of therapeutic studies. Biom J 1997, 39(3):297-307.
Karim S, Booth CM. Effectiveness in the absence of efficacy: Cautionary tales from real-world evidence. J Clin Oncol 2019;
37(13):1047-50.
Kuss O, Blettner M, Börgermann J. Propensity score: an alternative method of analyzing treatment effects. Dtsch Arztebl Int
2016, 113:597-603. doi 10.3238/arztebl.2016.0597.
Schmoor C, Olschewski M, Schumacher M. Randomized and non-randomized patients in clinical trials: experiences with
comprehensive cohort studies. Statistics in Medicine 1996, 15: 263-271.
Schmoor C, Caputo A, Schumacher M. Evidence from nonrandomized studies: a case study on the estimation of causal
effects. Am J Epidemiol 2008, 167(9):1120-9. doi 10.1093/aje/kwn010.
Schmoor C, Gall C, Stampf S, Graf E. Correction of confounding bias in non-randomized studies by appropriate weighting.
Biom J 2011, 53(2): 369-87. doi 10.1002/bimj.201000154.
Tian Y, Schuemie MJ, Suchard MA. Evaluating large-scale propensity score performance through real-world and synthetic
data experiments. Int J Epidemiol 2018, 2005-2014. doi 10.1093/ije/dyy120.
Williamson E, Morley R, Lucas A, Carpenter J. Propensity scores: From naïve enthusiasm to intuitive understanding. Stat
Methods Med Res 2011, 21(3):273-293. doi 10.1177/0962280210394483.
Zhou T, Elliott MR, Little RJA. Penalized spline of propensity methods for treatment comparison. J Am Stat Assoc 2019,
114(525):1-38. doi 10.1080/01621459.2018.1518234.
References – Empirical Comparisons
[1] Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those
assessed in randomized trials. Cochrane Database of Systematic Reviews 2014, Issue 4.
[2] Ankarfeldt MZ, Adalsteinsson E, Groenwold RHH, Ali MS, Klungel OH. A systematic literature review on the efficacy-
effectiveness gap: comparison of randomized controlled trials and observational studies of glucose-lowering drugs. Clin
Epidemiol 2017; 9: 41-51.
[3] Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised
clinical trials. BMJ 1998; 317:1185-90.
[4] Kuss O, Legler T, Börgermann J. Treatment effects from randomized trials and propensity score analyses were similar in
similar populations in an example from cardiac surgery. J Clin Epidemiol 2011; 64:1076-84.
[5] Lonjon G, Boutron I, Trinquart L, Ahmad N, Aim F, Nizard R, Ravaud P. Comparison of treatment effect estimates from
prospective nonrandomized studies with propensity score analysis and randomized controlled trials of surgical
procedures. Ann Surg 2014; 259: 18-25.
[6] Odgaard-Jensen J, Vist GE, Timmer A, Kunz R, Akl EA, Schünemann H, Briel M, Nordmann AJ, Pregno S, Oxman AD.
Randomisation to protect against selection bias in healthcare trials. Cochrane Database of Systematic Reviews 2011,
Issue 4.
[7] Soni PD, Hartman HE, Dess RT, Abugharib A, Allen SG, Feng FY, Zietman AL, Jagsi R, Schipper MJ, Spratt DE.
Comparison of population-based observational studies with randomized trials in oncology. J Clin Oncol 2019; 37:1209-
1216.
[8] Zhang Z, Ni H, Xu X. Do the observational studies using propensity score analysis agree with randomized controlled
trials in the area of sepsis? J Crit Care 2014; 29:886e9-886e15.
References – EMPAREG Outcome Study
Birkeland KI, Jørgensen ME, Carstensen B, Persson F, Gulseth HL, Thuresson M, Fenici P, Nathanson D, Nyström T,
Eriksson JW, Bodegård J, Norhammar A. Cardiovascular mortality and morbidity in patients with type 2 diabetes following
initiation of sodium-glucose co-transporter-2 inhibitors versus other glucose-lowering drugs (CVD-REAL Nordic): a