Source: staff.pubhealth.ku.dk/~tag/download/ShowTime-Gerds-IBS2015-Dortmund.pdf

Risks of predictive modelling in survival analysis

Thomas Alexander Gerds

Department of Biostatistics, University of Copenhagen

March 16, 2015

1 / 29

Outline

Risk prediction models

Time & Performance

How to summarize (and how not to)

2 / 29

Absolute risk of cardiovascular disease

Absolute risk of cardiovascular disease is the probability that an individual with given risk factors and a given age will be diagnosed with cardiovascular disease within a given time period.

- the longer the time period → the higher the risk

- the higher the risk of non-cardiovascular death in the same time period → the lower the risk

The absolute risk has a direct interpretation for the single patient.

Hazard and odds do not have an intuitive interpretation for prediction.
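The two monotonicity statements about absolute risk can be checked with a small numerical sketch. It assumes constant cause-specific hazards, which is an illustrative simplification and not something the slides claim; the function name `absolute_risk` and all hazard values are hypothetical.

```python
import math

def absolute_risk(t, hazard_cvd, hazard_other_death):
    """Probability of a cardiovascular diagnosis within t years when death
    from other causes is a competing risk.

    Assumption (illustration only): both cause-specific hazards are constant.
    """
    total = hazard_cvd + hazard_other_death
    if total == 0:
        return 0.0
    return hazard_cvd / total * (1.0 - math.exp(-total * t))

# Hypothetical hazards per year
risk_5y = absolute_risk(5, 0.02, 0.01)
risk_10y = absolute_risk(10, 0.02, 0.01)
# Same cardiovascular hazard, higher hazard of non-cardiovascular death
risk_10y_frail = absolute_risk(10, 0.02, 0.05)

print(f"5-year risk:  {risk_5y:.3f}")
print(f"10-year risk: {risk_10y:.3f}")
print(f"10-year risk with higher competing mortality: {risk_10y_frail:.3f}")
```

Under these assumptions the 10-year risk exceeds the 5-year risk, and raising the hazard of the competing event lowers the absolute risk even though the cardiovascular hazard is unchanged.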

3 / 29

The purpose of risk prediction

Developing statistical models that estimate the probability of developing cardiovascular disease and mortality will help clinicians to

- identify individuals at higher risk

- allow for earlier or more frequent screening

- counsel behavioral changes to decrease risk

These types of models are also useful for the design of intervention trials, for personalized medicine, and in the analysis of observational data (propensity score).

4 / 29

Regression vs prediction

A regression model describes how the distribution of outcome depends on covariates.

- Regression parameters describe the magnitude of association.

- The parameters are estimated based on a data set.

A prediction model describes how the distribution of outcome depends on covariates.

- Predictions are expected outcomes for given covariate values.

- The model is the result of a statistical modelling strategy applied to a learning data set.

- The predictions are evaluated in an independent validation data set.

5 / 29


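[Figure: from data base to predicted risk. Learning (regression, smoothing, tuning, boosting, penalization, model averaging) is applied to a learning data set to obtain a risk prediction model, which is then validated. The risk prediction model takes patient characteristics and a time horizon t and returns the predicted risk of event until time t.]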

6 / 29

The making of a statistical risk prediction model

1. Specify model and modelling strategy including data dependent steps such as systematic screening of potential risk factors X (dimension reduction)

2. Apply modelling strategy to learning data set and obtain model:

$\pi(t, X_i) \approx P(T \le t \mid X_i)$

3. Validate model(ling strategy) internally via cross-validation

4. Validate model externally using independent validation data

7 / 29

Risk dynamics

Dynamic = changing, able to change and to adapt

For the patient

- Environment

- Treatment

- Disease

For the modeller

- Prediction time-point

- Prediction horizon

- Event status, measurements of biomarkers, treatment, questionnaire results, etc.

8 / 29

The role of time

[Figure: prediction model timeline from baseline through follow-up. Origin (time 0): time point at which the patient is provided with the prediction. Horizon (time t): time point attached to the prediction.]

Lost to follow-up, or (right) censored, means that the patient was not followed until horizon time t.

Until time t, three things can happen:

- the patient is event-free

- the event of interest has occurred

- a competing event has occurred

9 / 29

Learning curve

[Figure: learning curve. x-axis: learning sample size (0 to 900). y-axis: true performance. Reference levels: overfitting model, useless model, data generating model, "Marty McFly". Two curves: average performance (across learning sets) and conditional performance (single learning set).]

10 / 29

Model performance parameters

Calibration

The model is calibrated if we can expect that x out of 100 experience the event among all patients that receive a predicted risk of x%.

Discrimination

A model has high discriminative power if the spread of predicted risks is wide and patients at low risk receive low predicted risks (and patients at high risk receive high predicted risks).

Accuracy

A model has high accuracy if the distance between the predicted risk and the outcome is small.
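The three performance parameters can be made concrete for a binary outcome ("event before time t: yes/no"). The sketch below assumes no censoring, measures accuracy by the Brier score (mean squared distance between predicted risk and outcome) and discrimination by the AUC; the function names and the toy data are hypothetical.

```python
def calibration_table(pred, outcome):
    """Observed event frequency among subjects sharing the same predicted risk."""
    groups = {}
    for p, y in zip(pred, outcome):
        groups.setdefault(p, []).append(y)
    return {p: sum(ys) / len(ys) for p, ys in sorted(groups.items())}

def auc(pred, outcome):
    """Probability that a random case has a higher predicted risk than a
    random non-case (ties in predicted risk count as 1/2)."""
    cases = [p for p, y in zip(pred, outcome) if y == 1]
    controls = [p for p, y in zip(pred, outcome) if y == 0]
    wins = sum((pc > pn) + 0.5 * (pc == pn) for pc in cases for pn in controls)
    return wins / (len(cases) * len(controls))

def brier_score(pred, outcome):
    """Mean squared distance between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(pred, outcome)) / len(pred)

# Toy validation data: five patients predicted at 20 %, five at 80 %
pred = [0.2] * 5 + [0.8] * 5
outcome = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]

print(calibration_table(pred, outcome))
print(auc(pred, outcome))
print(brier_score(pred, outcome))
```

In this toy data set the model is calibrated (1 of 5 and 4 of 5 patients experience the event), yet the AUC and the Brier score are far from perfect, which illustrates that the three parameters measure different things.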

11 / 29

In summary, no single statistical measure assesses all the pertinent characteristics of a novel risk marker. We therefore recommend that several measures be reported in papers evaluating novel risk markers, as summarized in Table 1.

Scientific statement, American Heart Association

12 / 29


Recommendations

Table 1: Report the discrimination of the new marker [1]:

a. C-index and its confidence limits for model with established risk markers

b. C-index and its confidence limits for model including novel marker and established risk markers

c. Integrated discrimination index, discrimination slope, or binary R² for the model with and without the novel risk marker

d. Graphic or tabular display of predicted risk in cases and noncases

Which C-index? Censored data? Competing risks?

[1] Hlatky et al., Circulation, 2009, 2408-2416

14 / 29


C-index for survival endpoint, no competing risks

General idea: use t-year predicted risk $\pi(t, X_i)$ as a continuous marker.

Area under the ROC curve at time t:

$\mathrm{AUC}(t) = P(\pi(t, X_i) > \pi(t, X_j) \mid T_i \le t, T_j > t)$

Concordance index truncated at time t:

$C(t) = P(\pi(t, X_i) > \pi(t, X_j) \mid T_i < T_j, T_i \le t)$
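The two definitions differ only in which pairs of subjects are compared. A brute-force sketch for uncensored data makes the difference visible; it assumes that ties in predicted risk count as 1/2 (a convention not stated on the slide), and the function names and toy data are hypothetical.

```python
from itertools import permutations

def auc_t(risk, time, t):
    """AUC(t): compare subjects with an event by time t (cases)
    to subjects still event-free at time t (controls)."""
    num = den = 0.0
    for i, j in permutations(range(len(time)), 2):
        if time[i] <= t and time[j] > t:
            den += 1
            num += (risk[i] > risk[j]) + 0.5 * (risk[i] == risk[j])
    return num / den

def c_index_t(risk, time, t):
    """C(t): compare any two subjects where the earlier event occurs by time t,
    so pairs of two cases are also included."""
    num = den = 0.0
    for i, j in permutations(range(len(time)), 2):
        if time[i] < time[j] and time[i] <= t:
            den += 1
            num += (risk[i] > risk[j]) + 0.5 * (risk[i] == risk[j])
    return num / den

# Toy data without censoring: event times and t-year predicted risks
time = [1, 2, 3, 4, 5]
risk = [0.9, 0.6, 0.7, 0.3, 0.1]

print(auc_t(risk, time, t=2.5))
print(c_index_t(risk, time, t=2.5))
```

With these data AUC(2.5) uses 6 case-control pairs, while C(2.5) uses 7 pairs because it also compares the two subjects who both had the event before t = 2.5.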

15 / 29

Risk of using concordance index

Within the class of proportional hazard models (Cox regression) there is a one-to-one correspondence between event time and predicted risk, so that the ordering of predicted risks is the same at every time horizon:

$\forall s, t:\ \{\pi(s, X_i) < \pi(s, X_j)\} = \{\pi(t, X_i) < \pi(t, X_j)\}$

Within the class of models that satisfy this property both AUC and C-index have an interpretation (not the same).

However, without this property of the prediction model only AUC works and C-index is not proper.
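Whether a model preserves the ordering of predicted risks across horizons can be checked directly. The sketch below compares a proportional hazards model with a hypothetical model whose risk curves cross; the baseline hazard, the coefficient and both function names are assumptions chosen for illustration.

```python
import math

def risk_ph(t, x, beta=0.7):
    """t-year risk under proportional hazards with cumulative baseline
    hazard H0(t) = t (illustrative choice)."""
    return 1.0 - math.exp(-t * math.exp(beta * x))

def risk_crossing(t, x):
    """Hypothetical non-proportional model: x = 1 has a high hazard that
    stops at time 0.25, x = 0 has a constant moderate hazard."""
    if x == 1:
        return 1.0 - math.exp(-2.0 * min(t, 0.25))
    return 1.0 - math.exp(-1.0 * t)

def same_order(risk, s, t, x_i, x_j):
    """True if the two subjects are ranked the same way at horizons s and t."""
    return (risk(s, x_i) < risk(s, x_j)) == (risk(t, x_i) < risk(t, x_j))

print(same_order(risk_ph, 0.2, 1.0, 0, 1))        # ordering preserved
print(same_order(risk_crossing, 0.2, 1.0, 0, 1))  # ordering flips
```

For the crossing model the subject with x = 1 has the higher predicted risk at horizon 0.2 but the lower predicted risk at horizon 1.0, so a concordance measure that mixes event times up to t with risks predicted for horizon t can reward the wrong ranking.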

16 / 29

[Figure: predicted survival probability (0 % to 100 %) over time (0.0 to 1.0), one curve for each marker value 1 to 10.]

17 / 29

[Figure: predicted survival probability (0 % to 100 %) over time (0.0 to 1.0), one curve for each marker value 1 to 10.]

C-index(0.5) = 85 %, C-index(1) = 60 %

AUC(0.5) = 86 %, AUC(1) = 50 %

18 / 29

Estimating C(t) with right-censored data

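Model performance is a parameter of the joint distribution of predicted risk and outcome in validation data:

$C(t) = P(\pi(t, X_i) > \pi(t, X_j) \mid T_i < T_j, T_i \le t)$

This parameter can be estimated using the same technique as Kaplan-Meier and Aalen-Johansen:

$\hat{C}(t) = \frac{\sum_{i,j} \frac{\Delta_i}{W_{i,j}(t)} 1\{T_i < T_j, T_i \le t, \pi(t, X_i) > \pi(t, X_j)\}}{\sum_{i,j} \frac{\Delta_i}{W_{i,j}(t)} 1\{T_i < T_j, T_i \le t\}}$

where $\Delta_i = 1$ if the event time of subject $i$ was observed (uncensored) and $\Delta_i = 0$ otherwise.

- IPCW: $W_{i,j}$ is an estimate of the probability to observe (uncensored)

- Harrell's C (very popular): $W_{i,j} = 1$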
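The weighted estimator can be sketched in a few lines. The slide does not spell out the weights, so the code assumes one common choice, $W_{i,j} = \hat{G}(T_i-)\,\hat{G}(T_i)$ with $\hat{G}$ the Kaplan-Meier estimate of the censoring distribution; this choice, the function names and the toy data are assumptions, and ties between event and censoring times are not treated specially.

```python
def km_censoring(time, status):
    """Kaplan-Meier estimate of the censoring survival function G.
    status: 1 = event observed, 0 = censored.
    Returns G(u, left_limit=False); left_limit=True gives G(u-)."""
    cens_times = sorted({tm for tm, d in zip(time, status) if d == 0})
    steps = []
    for c in cens_times:
        at_risk = sum(1 for tm in time if tm >= c)
        n_cens = sum(1 for tm, d in zip(time, status) if tm == c and d == 0)
        steps.append((c, 1.0 - n_cens / at_risk))

    def G(u, left_limit=False):
        g = 1.0
        for c, factor in steps:
            if c < u or (c == u and not left_limit):
                g *= factor
        return g

    return G

def c_index(risk, time, status, t, ipcw=True):
    """Estimate C(t). ipcw=False sets all weights to 1 (Harrell's C)."""
    if ipcw:
        G = km_censoring(time, status)
    else:
        G = lambda u, left_limit=False: 1.0
    num = den = 0.0
    for i in range(len(time)):
        if status[i] != 1 or time[i] > t:
            continue  # only uncensored events up to time t contribute
        w = G(time[i], left_limit=True) * G(time[i])  # assumed form of W_ij
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1.0 / w
                num += ((risk[i] > risk[j]) + 0.5 * (risk[i] == risk[j])) / w
    return num / den

# Toy validation data with two censored subjects (status 0)
time = [1, 2, 3, 4, 5]
status = [1, 0, 1, 1, 0]
risk = [0.5, 0.6, 0.7, 0.3, 0.1]

print(c_index(risk, time, status, t=10, ipcw=False))  # Harrell's C
print(c_index(risk, time, status, t=10, ipcw=True))   # IPCW estimate
```

Without censoring all weights equal 1 and both versions coincide; with censoring, the IPCW version gives more weight to pairs whose earlier event occurs late, because such pairs are less likely to be observed.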

19 / 29

Risk of ignoring censored data

[Figure: three panels of simulation results at t = 36 months. x-axis categories in each panel: True, Simple, Kaplan-Meier, Cox, Cox omit RD, Cox omit PSA. Panel (A): concordant pairs (%), axis 10 to 40, the numerator. Panel (B): pairs (%), axis 20 to 50, the denominator. Panel (C): concordance index (%), axis 60 to 90, the ratio (C-index).]

20 / 29

Dealing with competing risks

[Figure: competing risk model with three states: 0 = baseline (event-free), 1 = stroke, 2 = death without stroke.]

AUC_1(t) is the probability that a random subject who experienced a stroke has received a higher predicted risk of stroke at baseline than another random subject who did not experience a stroke within t years, i.e., is either alive and event free or died without stroke before time t.
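The definition of AUC_1(t) can be written down directly for uncensored data: subjects who die without stroke before time t stay in the comparison group. This is a minimal sketch under that no-censoring assumption, with ties counted as 1/2; the function name, the cause coding and the toy data are hypothetical.

```python
def auc_competing(risk, time, cause, t, event=1):
    """AUC_1(t) for uncensored data with competing risks.
    Cases: event of interest by time t.
    Controls: everyone else, i.e. event-free at t or competing event before t."""
    is_case = [tm <= t and c == event for tm, c in zip(time, cause)]
    cases = [r for r, flag in zip(risk, is_case) if flag]
    controls = [r for r, flag in zip(risk, is_case) if not flag]
    wins = sum((rc > rn) + 0.5 * (rc == rn) for rc in cases for rn in controls)
    return wins / (len(cases) * len(controls))

# Toy data: cause 1 = stroke, cause 2 = death without stroke
risk = [0.8, 0.7, 0.6, 0.2, 0.4]   # predicted risk of stroke at baseline
time = [1, 2, 3, 6, 7]
cause = [1, 2, 1, 1, 2]

print(auc_competing(risk, time, cause, t=5))
```

In the toy data the subject who died without stroke at time 2 received a high predicted stroke risk (0.7) and counts as a control, which lowers AUC_1(5) compared with an analysis that leaves this subject out.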

21 / 29

When you analyze risks (of stroke) in a hypothetical world in which one cannot die, then age is a crazy good predictor.

Paul Blanche, 2015

22 / 29

The risk of ignoring competing risks

[Figure: two panels showing cumulative incidence (0 % to 100 %) over time (0.00 to 0.30), one curve for each age 50, 60, 70, 80, 90. Panels: event of interest; death due to other causes.]

23 / 29

The risk of ignoring competing risks

[Figure: two panels showing cumulative incidence (0 % to 100 %) over time (0.00 to 0.30), one curve for each age 50, 60, 70, 80, 90. Panels: event of interest; death due to other causes.]

Treat competing risks as censored: AUC(t) = 83.8 % (hypothetical world)

Account for competing risks: AUC_1(t) = 71.5 % (real world)

24 / 29

Recommendations (part II)


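a. C-index and its confidence limits for model with established risk markers

b. C-index and its confidence limits for model including novel marker and established risk markers

c. Integrated discrimination index, discrimination slope, or binary R² for the model with and without the novel risk marker

d. Graphic or tabular display of predicted risk in cases and noncases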

25 / 29

It is, in fact, a familiar observation that additional markers typically offer (disappointingly) little added prognostic power.

Might perhaps other methods produce less pessimistic, still dependable, results?

We examine two simple, frequently mentioned, ad hoc statistics, the IDI and NRI, but find that they can be subverted by misspecified, poorly calibrated models or, for that matter, by outright cheating.

Jørgen Hilden

26 / 29

The risks of using NRI

Simulation study:

Data generating model: $\mathrm{logit}(P(Y = 1 \mid X)) = \alpha_0 + \alpha_1 X$

- Simulations were repeated for varying values of $\exp(\alpha_1)$.

- In each run we generate a training sample (n=50) and an independent validation sample (n=10,000).

Logistic regression models are fitted to each training sample:

Model X: $\mathrm{logit}(P(Y = 1 \mid X)) = \alpha_0 + \alpha_1 X$

Model X + Z: $\mathrm{logit}(P(Y = 1 \mid X, Z)) = \beta_0 + \beta_1 X + \beta_2 Z$
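For readers who want to see how an NRI is computed from the predictions of two such models, here is a minimal sketch. The slides do not say which NRI variant was used in the simulation, so the category-free (continuous) version is assumed here; the function name and the toy predictions are hypothetical and are not taken from the simulation study.

```python
def continuous_nri(p_old, p_new, y):
    """Category-free net reclassification index (assumed variant):
    [P(up | event) - P(down | event)] + [P(down | no event) - P(up | no event)],
    where up/down means the new model predicts a higher/lower risk."""
    def net_up(idx):
        up = sum(p_new[k] > p_old[k] for k in idx)
        down = sum(p_new[k] < p_old[k] for k in idx)
        return (up - down) / len(idx)

    events = [k for k, yk in enumerate(y) if yk == 1]
    nonevents = [k for k, yk in enumerate(y) if yk == 0]
    return net_up(events) - net_up(nonevents)

# Toy validation data: predictions of Model X and Model X + Z
y = [1, 1, 1, 0, 0, 0]
p_model_x = [0.6, 0.6, 0.6, 0.4, 0.4, 0.4]
p_model_xz = [0.7, 0.7, 0.5, 0.3, 0.3, 0.5]

print(continuous_nri(p_model_x, p_model_xz, y))
```

Note that this statistic only counts the direction of each change in predicted risk and ignores its size, which is one reason why it can react to changes that do not improve the Brier score or the AUC.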

27 / 29

Simulation results

Simulation scenario and training sample results (n=50):

exp(α1)   Model X: exp(α1)   Model X+Z: exp(β1)   AUC (model X)
1         1                  1                    50.0
2         2.3                2.4                  67.5
3         3.7                3.9                  75.6
4         5.6                6.1                  79.8
5         7.6                15.7                 82.5

Performance increment (%) in validation sample (n=10,000):

NRI    ∆ AUC   ∆ Brier
-0.1   0       -0.5
2.3    -1.7    -0.5
4.2    -1.3    -0.5
6.6    -1.0    -0.5
8.4    -0.9    -0.5

Averages of 1000 simulation runs

28 / 29

Conclusions & references

- For any given data set and research question, different statistical modelling strategies and prediction performance measures should lead to the same medical/clinical conclusions and actions.

- Censored data and competing risks cannot be ignored.

- There is a risk that guidelines lead astray.

References

Blanche et al.: Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat Med. 2013 Dec 30;32(30):5381-97

Hilden et al.:
Do not rely on IDI and NRI, Statistics in Medicine, 33, 3405-3414
Good looking statistics with nothing underneath, Epidemiology, 2014, 25, 265-267

R packages pec, timeROC: prediction performance with survival outcome and competing risks

29 / 29
