Risks of predictive modelling in survival analysis
Thomas Alexander Gerds
Department of Biostatistics, University of Copenhagen
March 16, 2015
Outline
Risk prediction models
Time & Performance
How to summarize (and how not to)
Absolute risk of cardiovascular disease
Absolute risk of cardiovascular disease is the probability that an individual with given risk factors and a given age will be diagnosed with cardiovascular disease within a given time period.
- the longer the time period → the higher the risk
- the higher the risk of non-cardiovascular death in the same time period → the lower the risk
The absolute risk has a direct interpretation for the single patient.
Hazard and odds do not have an intuitive interpretation for prediction.
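The two monotonicity statements above can be checked in a toy setting. The sketch below assumes constant cause-specific hazards for the event of interest and for the competing event; this assumption, the function name `absolute_risk` and the hazard values are illustrative and not taken from the slides.

```python
import math

def absolute_risk(t, hazard_event, hazard_competing):
    """Probability of the event of interest by time t.

    Assumes constant cause-specific hazards (illustration only):
    F1(t) = h1 / (h1 + h2) * (1 - exp(-(h1 + h2) * t)).
    """
    total = hazard_event + hazard_competing
    if total == 0:
        return 0.0
    return hazard_event / total * (1.0 - math.exp(-total * t))

# Longer time period -> higher risk
risk_5y = absolute_risk(5, hazard_event=0.02, hazard_competing=0.01)
risk_10y = absolute_risk(10, hazard_event=0.02, hazard_competing=0.01)

# Higher risk of non-cardiovascular death -> lower risk of the event
risk_10y_frail = absolute_risk(10, hazard_event=0.02, hazard_competing=0.05)

print(risk_5y, risk_10y, risk_10y_frail)
```

Setting `hazard_competing=0` gives the risk in a world without competing events, which is never smaller than the absolute risk.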
The purpose of risk prediction
Developing statistical models that estimate the probability of developing cardiovascular disease and mortality will help
- clinicians identify individuals at higher risk
- allow for earlier or more frequent screening
- counsel behavioral changes to decrease risk
These types of models are also useful for the design of intervention trials, for personalized medicine, and in the analysis of observational data (propensity score).
Regression vs prediction
A regression model describes how the distribution of outcome depends on covariates.
- Regression parameters describe the magnitude of association.
- The parameters are estimated based on a data set.
A prediction model describes how the distribution of outcome depends on covariates.
- Predictions are expected outcomes for given covariate values.
- The model is the result of a statistical modelling strategy applied to a learning data set.
- The predictions are evaluated in an independent validation data set.
[Diagram: a data base provides the learning data set and the validation data. Learning (regression, smoothing, tuning, boosting, penalization, model averaging) turns the learning data set into a risk prediction model. The risk prediction model takes a time horizon t and patient characteristics and returns the predicted risk of an event until time t.]
The making of a statistical risk prediction model
1. Specify model and modelling strategy including data dependent steps such as systematic screening of potential risk factors X (dimension reduction)
2. Apply modelling strategy to learning data set and obtain model:
π(t, X_i) ≈ P(T ≤ t | X_i)
3. Validate model(ing strategy) internally via cross-validation
4. Validate model externally using independent validation data
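Step 3 can be sketched as follows. The point is that the whole modelling strategy, including every data dependent step, is re-run on each learning fold. The sketch assumes that the event status at the horizon is known for everyone (no censoring) and uses the mean squared difference between predicted risk and outcome as performance measure; `cross_validate` and `prevalence_strategy` are hypothetical names.

```python
import random

def brier(predicted, observed):
    """Mean squared difference between predicted risk and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)

def cross_validate(strategy, data, k=5, seed=1):
    """k-fold cross-validation of a modelling strategy.

    data: list of (x, y), y = 1 if the event occurred by the horizon t.
    strategy: function that maps a learning set to a function x -> risk.
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        learn = [data[i] for i in idx if i not in held_out]
        test = [data[i] for i in fold]
        model = strategy(learn)  # all data dependent steps happen in here
        predicted = [model(x) for x, _ in test]
        scores.append(brier(predicted, [y for _, y in test]))
    return sum(scores) / k

def prevalence_strategy(learn):
    """Toy strategy: ignore x and predict the event frequency."""
    p = sum(y for _, y in learn) / len(learn)
    return lambda x: p

toy_data = [(i, 1 if i % 4 == 0 else 0) for i in range(40)]
cv_score = cross_validate(prevalence_strategy, toy_data, k=5)
print(cv_score)
```

Any strategy with the same interface (learning set in, prediction function out) can be plugged in, which keeps variable screening and tuning inside the cross-validation loop.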
Risk dynamics
Dynamic = changing, able to change and to adapt
For the patient
- Environment
- Treatment
- Disease
For the modeller
- Prediction time-point
- Prediction horizon
- Event status, measurements of biomarkers, treatment, questionnaire results, etc.
The role of time
[Prediction model timeline: from the origin (time 0, baseline), the time point at which the patient is provided with the prediction, through follow-up to the horizon (time t), the time point attached to the prediction.]
Lost to follow-up, or (right) censored, means that the patient was not followed until horizon time t.
Until time t, three things can happen:
- the patient is event-free
- the event of interest has occurred
- a competing event has occurred
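The three outcomes at the horizon, plus the censored case in which none of them can be determined, can be written as a small classification function. The coding 0 = censored, 1 = event of interest, 2 = competing event and the name `status_at_horizon` are assumptions made for this sketch.

```python
def status_at_horizon(time, event, horizon):
    """Classify one subject at the prediction horizon.

    time: observed follow-up time
    event: 0 = censored, 1 = event of interest, 2 = competing event
           (hypothetical coding)
    """
    if event == 1 and time <= horizon:
        return "event of interest"
    if event == 2 and time <= horizon:
        return "competing event"
    if time >= horizon:
        # Followed at least until the horizon without any event before it.
        return "event-free"
    # Censored before the horizon: the status at time t is unknown.
    return "censored"

for subject in [(2.0, 1), (3.0, 2), (7.0, 1), (4.0, 0)]:
    print(subject, status_at_horizon(*subject, horizon=5.0))
```

Only the subjects in the last category need special treatment when performance is estimated; the other three have a known status at time t.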
Learning curve
[Figure: true performance against learning sample size (0 to 900). Labelled reference levels: overfitting model, useless model, data generating model, and "Marty McFly". Two kinds of performance are shown: average (across learning sets) and conditional (single learning set).]
Model performance parameters
Calibration
The model is calibrated if we can expect that x out of 100 experience the event among all patients that receive a predicted risk of x%.
Discrimination
A model has high discriminative power if the spread of predicted risks is wide and patients at low risk receive low predicted risks (and patients at high risk receive high predicted risks).
Accuracy
A model has high accuracy if the distance between the predicted risk and the outcome is small.
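A minimal sketch of two of these parameters, assuming that the event status at the horizon is known for every patient (no censoring): calibration is checked by comparing the mean predicted risk with the observed event frequency within groups of similar predicted risk, and accuracy is measured by the mean squared distance between predicted risk and outcome (the Brier score). The function names and the toy numbers are illustrative.

```python
def brier_score(predicted, observed):
    """Mean squared distance between predicted risk and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)

def calibration_table(predicted, observed, n_groups=2):
    """Rows of (mean predicted risk, observed event frequency) per risk group."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    bounds = [round(g * n / n_groups) for g in range(n_groups + 1)]
    rows = []
    for lo, hi in zip(bounds, bounds[1:]):
        chunk = pairs[lo:hi]
        if chunk:
            mean_risk = sum(p for p, _ in chunk) / len(chunk)
            frequency = sum(y for _, y in chunk) / len(chunk)
            rows.append((mean_risk, frequency))
    return rows

predicted = [0.1, 0.1, 0.2, 0.2, 0.8, 0.8, 0.9, 0.9]
observed = [0, 0, 0, 1, 1, 1, 1, 0]

print(brier_score(predicted, observed))
print(calibration_table(predicted, observed, n_groups=2))
```

In a calibrated model the two numbers in each row of the table are close to each other.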
In summary, no single statistical measure assesses all the pertinent characteristics of a novel risk marker. We therefore recommend that several measures be reported in papers evaluating novel risk markers, as summarized in Table 1.
Scientific statement, American Heart Association
Recommendations
Table 1: Report the discrimination of the new marker¹:
a. C-index and its confidence limits for model with established risk markers
b. C-index and its confidence limits for model including novel marker and established risk markers
c. Integrated discrimination index, discrimination slope, or binary R² for the model with and without the novel risk marker
d. Graphic or tabular display of predicted risk in cases and noncases
Which C-index? Censored data? Competing risks?
¹ Hlatky et al., Circulation, 2009, 2408-2416
C-index for survival endpoint, no competing risks
General idea: use t-year predicted risk π(t, X_i) as a continuous marker.
Area under the ROC curve at time t:
AUC(t) = P(π(t, X_i) > π(t, X_j) | T_i ≤ t, T_j > t)
Concordance index truncated at time t:
C(t) = P(π(t, X_i) > π(t, X_j) | T_i < T_j, T_i ≤ t)
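The difference between the two definitions lies in which pairs of subjects are compared. The sketch below counts pairs directly and assumes that all event times are observed (no censoring, no competing risks); ties in predicted risk count as not concordant, following the strict inequality in the definitions. Function names and toy data are illustrative.

```python
def auc_t(risk, time, t):
    """Empirical AUC(t): cases have T_i <= t, controls have T_j > t."""
    concordant = pairs = 0
    for r_i, t_i in zip(risk, time):
        if t_i > t:
            continue
        for r_j, t_j in zip(risk, time):
            if t_j > t:
                pairs += 1
                concordant += r_i > r_j
    return concordant / pairs

def c_index_t(risk, time, t):
    """Empirical C(t): all pairs with T_i < T_j and T_i <= t."""
    concordant = pairs = 0
    for r_i, t_i in zip(risk, time):
        if t_i > t:
            continue
        for r_j, t_j in zip(risk, time):
            if t_i < t_j:
                pairs += 1
                concordant += r_i > r_j
    return concordant / pairs

time = [1, 2, 3, 4, 5]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]  # predicted risks at horizon t = 3

print(auc_t(risk, time, 3))      # 6 of 6 case-control pairs are concordant
print(c_index_t(risk, time, 3))  # 8 of 9 pairs are concordant
```

C(t) also compares two subjects who both have the event before t, which AUC(t) never does. In the toy data the only discordant pair is of this kind.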
Risk of using concordance index
Within the class of proportional hazard models (Cox regression) there is a one-to-one correspondence between event time and predicted risk: the ordering of subjects by predicted risk is the same at every time horizon.
∀ s, t: {π(s, X_i) < π(s, X_j)} = {π(t, X_i) < π(t, X_j)}
Within the class of models that satisfy this property both AUC and C-index have an interpretation (not the same).
However, without this property of the prediction model only AUC works and C-index is not proper.
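Why a Cox model has this ordering property can be seen in one line. The baseline cumulative hazard \Lambda_0 and the regression coefficients \beta are symbols introduced here for illustration, and the setting is a survival endpoint without competing risks.

```latex
\pi(t, X) = 1 - \exp\{-\Lambda_0(t)\, e^{\beta^\top X}\}
\quad\Longrightarrow\quad
\pi(s, X_i) < \pi(s, X_j)
\iff \beta^\top X_i < \beta^\top X_j
\iff \pi(t, X_i) < \pi(t, X_j)
\qquad \text{for all } s, t \text{ with } \Lambda_0(s), \Lambda_0(t) > 0 .
```

For a fixed horizon the predicted risk is a strictly increasing function of the linear predictor, so the ranking of subjects cannot change with the horizon. Models whose predicted survival curves cross do not have this property.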
[Figure: survival probability (0 % to 100 %) against time (0.0 to 1.0), one curve for each marker value 1 to 10.]
C-index(0.5) = 85 %, C-index(1) = 60 %
AUC(0.5) = 86 %, AUC(1) = 50 %
Estimating C(t) with right-censored data
Model performance is a parameter of the joint distribution of predicted risk and outcome in validation data:
C(t) = P(π(t, X_i) > π(t, X_j) | T_i < T_j, T_i ≤ t)
This parameter can be estimated using the same technique as Kaplan-Meier and Aalen-Johansen.
$$\hat{C}(t) = \frac{\sum_{i,j} \frac{\Delta_i}{W_{i,j}(t)}\, 1\{T_i < T_j,\ T_i \le t,\ \pi(t, X_i) > \pi(t, X_j)\}}{\sum_{i,j} \frac{\Delta_i}{W_{i,j}(t)}\, 1\{T_i < T_j,\ T_i \le t\}}$$
- Δ_i = 1 if the event time of subject i is observed (uncensored), 0 otherwise
- IPCW: W_{i,j} is an estimate of the probability to observe the pair (uncensored)
- Harrell's C (very popular): W_{i,j} = 1
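A minimal sketch of the weighted estimator. The slides do not spell out the weights, so the sketch assumes W_{i,j} = G(T_i-) · G(T_i), where G is a marginal Kaplan-Meier estimate of the censoring survival function; ties between event and censoring times are not treated specially. With `use_weights=False` all weights are 1, which gives Harrell's C truncated at t. All function names are hypothetical.

```python
def km_censoring(time, status):
    """Kaplan-Meier estimate of the censoring survival function G.

    status: 1 = event observed, 0 = censored (censoring is the 'event' here).
    Returns a function G(s, left_limit=False).
    """
    steps = []  # (time point u, value of G just after u)
    g = 1.0
    for u in sorted(set(time)):
        at_risk = sum(1 for x in time if x >= u)
        censored = sum(1 for x, d in zip(time, status) if x == u and d == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(s, left_limit=False):
        value = 1.0
        for u, g_u in steps:
            if u < s or (u == s and not left_limit):
                value = g_u
            else:
                break
        return value

    return G

def c_index(risk, time, status, t, use_weights=True):
    """IPCW estimate of C(t); use_weights=False sets every weight to 1."""
    G = km_censoring(time, status)
    num = den = 0.0
    n = len(time)
    for i in range(n):
        if status[i] != 1 or time[i] > t:
            continue  # Delta_i = 0 or T_i > t: no contribution
        w = G(time[i], left_limit=True) * G(time[i]) if use_weights else 1.0
        if w == 0.0:
            continue
        for j in range(n):
            if time[i] < time[j]:
                den += 1.0 / w
                if risk[i] > risk[j]:
                    num += 1.0 / w
    return num / den

time = [1, 2, 3, 4, 5]
status = [1, 0, 1, 1, 0]
risk = [0.2, 0.4, 0.6, 0.3, 0.1]

print(c_index(risk, time, status, t=3, use_weights=False))  # unweighted
print(c_index(risk, time, status, t=3, use_weights=True))   # IPCW
```

In the toy data the two estimates differ because the pairs formed after the first censored observation receive a weight larger than 1. Without censoring both versions coincide.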
Risk of ignoring censored data
[Figure: three panels at t = 36 months, each showing results for the categories True, Simple, Kaplan-Meier, Cox, Cox omit RD and Cox omit PSA. (A) Concordant pairs (%), the numerator. (B) Pairs (%), the denominator. (C) Concordance index (%), the ratio.]
Dealing with competing risks
[Diagram: competing risk model with states 0 = baseline (event-free), 1 = stroke, 2 = death without stroke.]
AUC_1(t) is the probability that a random subject who experienced a stroke has received a higher predicted risk of stroke at baseline than another random subject who did not experience a stroke within t years, i.e., is either alive and event free or died without stroke before time t.
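The definition can be turned into a pair count. The sketch assumes that every subject is followed until an event or beyond t (no censoring before t) and uses the coding 1 = stroke, 2 = death without stroke, 0 = no event observed; the function name and toy data are illustrative.

```python
def auc_competing(risk, time, cause, t):
    """Empirical AUC_1(t) under competing risks, without censoring before t.

    Cases: stroke before or at time t.
    Controls: everyone else, i.e. event-free at t or died without stroke.
    """
    cases = [r for r, x, c in zip(risk, time, cause) if x <= t and c == 1]
    controls = [r for r, x, c in zip(risk, time, cause) if x > t or c == 2]
    concordant = sum(r_case > r_control
                     for r_case in cases for r_control in controls)
    return concordant / (len(cases) * len(controls))

risk = [0.8, 0.6, 0.7, 0.2, 0.1]   # predicted risk of stroke within t years
time = [1.0, 2.0, 1.5, 6.0, 7.0]
cause = [1, 1, 2, 0, 0]

print(auc_competing(risk, time, cause, t=5))  # 5 of 6 pairs are concordant
```

The third subject died without stroke and stays among the controls. A high predicted stroke risk for such a subject lowers AUC_1(t).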
When you analyze risks (of stroke) in a hypothetical world in which one cannot die, then age is a crazy good predictor.
Paul Blanche, 2015
The risk of ignoring competing risks
[Figure: two panels of cumulative incidence (0 % to 100 %) against time (0.00 to 0.30), with one curve for each age 50, 60, 70, 80 and 90. Panels: event of interest; death due to other causes.]
Treat competing risks as censored: AUC(t) = 83.8 % (hypothetical world)
Account for competing risks: AUC_1(t) = 71.5 % (real world)
Recommendations (part II)
Table 1: Report the discrimination of the new marker¹:
a. C-index and its confidence limits for model with established risk markers
b. C-index and its confidence limits for model including novel marker and established risk markers
c. Integrated discrimination index, discrimination slope, or binary R² for the model with and without the novel risk marker
d. Graphic or tabular display of predicted risk in cases and noncases
It is, in fact, a familiar observation that additional markers typically offer (disappointingly) little added prognostic power.
Might perhaps other methods produce less pessimistic, still dependable, results?
We examine two simple, frequently mentioned, ad hoc statistics, the IDI and NRI, but find that they can be subverted by misspecified, poorly calibrated models or, for that matter, by outright cheating.
Jørgen Hilden
The risks of using NRI
Simulation study:
Data generating model: logit(P(Y = 1 | X)) = α0 + α1 X
- Simulations were repeated for varying values of exp(α1).
- In each run we generate a training sample (n=50) and an independent validation sample (n=10,000).
Logistic regression models are fitted to each training sample:
Model X: logit(P(Y = 1 | X)) = α0 + α1 X
Model X + Z: logit(P(Y = 1 | X, Z)) = β0 + β1 X + β2 Z
Simulation results
Simulation scenario and training sample results (n=50):
exp(α1)   Model X: exp(α1)   Model X+Z: exp(β1)   AUC (model X)
1         1                  1                    50.0
2         2.3                2.4                  67.5
3         3.7                3.9                  75.6
4         5.6                6.1                  79.8
5         7.6                15.7                 82.5
Performance increment (%) in validation sample (n=10,000), one row per simulation scenario:
NRI     ∆ AUC   ∆ Brier
-0.1    0       -0.5
2.3     -1.7    -0.5
4.2     -1.3    -0.5
6.6     -1.0    -0.5
8.4     -0.9    -0.5
Averages of 1000 simulation runs
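For readers who have not met the NRI, here is a minimal sketch of one common version. The slides do not say which version was used in the simulation, so the category-free (continuous) form is an assumption: the net proportion of events whose predicted risk goes up with the new model plus the net proportion of non-events whose predicted risk goes down. The function name and toy numbers are illustrative.

```python
def continuous_nri(risk_old, risk_new, outcome):
    """Category-free net reclassification improvement (assumed definition).

    NRI = [P(up | event) - P(down | event)]
          + [P(down | non-event) - P(up | non-event)]
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, y in zip(risk_old, risk_new, outcome):
        if y == 1:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n

risk_old = [0.3, 0.3, 0.3, 0.3]
outcome = [1, 1, 0, 0]

# New risks move at random relative to the outcome: no net improvement.
print(continuous_nri(risk_old, [0.4, 0.2, 0.2, 0.4], outcome))
# Both events move up, the non-events cancel out.
print(continuous_nri(risk_old, [0.4, 0.4, 0.2, 0.4], outcome))
```

The statistic only uses the direction of each change in predicted risk, not whether the new predicted risks are calibrated.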
Conclusions & references
- For any given data set and research question, different statistical modelling strategies and prediction performance measures should lead to the same medical/clinical conclusions and actions.
- Censored data and competing risks cannot be ignored.
- There is a risk that guidelines lead astray.
References
Blanche et al.: Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine, 2013, 32(30), 5381-5397
Hilden et al.: Do not rely on IDI and NRI. Statistics in Medicine, 33, 3405-3414
Hilden et al.: Good looking statistics with nothing underneath. Epidemiology, 2014, 25, 265-267
R packages pec, timeROC: prediction performance with survival outcome and competing risks