Cox regression

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES

Cox’s regression analysis
Time dependent explanatory variables

Henrik Ravn
Bandim Health Project, Statens Serum Institut

4 November 2011
Outline

Survival Data
Example: Malignant Melanoma Data
The Cox Model
Cox in SAS
Choice of Time-Scale
Example: Guinea-Bissau Data
Delayed entries
Time dependent explanatory variables

$$\prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$

Survival Data

Time to death or other event of interest.

One time-scale with a well-defined starting time (the time-origin):

Time from start of randomized clinical trial to death.

Time from first employment to pension.

Time from filling of a tooth to filling falls out.

What is special about survival data?

Right-skewed. No problem.

CENSORING: For some individuals we will only know a lower bound of the lifetime.

Simple data

[Figure: follow-up for individuals 1 to 12; x-axis: Times (months), 0 to 25; y-axis: Individual]

Survival and hazard function

Let T be the TIME to event of interest:

S(t) = P(T > t)

= probability of survival to time t after entry at time 0

λ(t) = incidence, rate, or hazard

Relationship:

$$S(t) = \exp\left(-\int_0^t \lambda(s)\,ds\right) = \exp(-\Lambda(t))$$

Λ(t) is called the integrated hazard function.
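The relationship between hazard, integrated hazard and survival can be checked numerically. The sketch below is illustrative only: it is written in Python rather than SAS, the function names are hypothetical, and the constant hazard of 0.001 per time unit is an assumed value, not one taken from the course data.

```python
import math

def integrated_hazard(hazard, t, steps=10_000):
    """Approximate Lambda(t), the integral of hazard(s) from 0 to t (trapezoidal rule)."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return total * h

def survival(hazard, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-integrated_hazard(hazard, t))

# Hypothetical constant hazard (illustration only).
lam = 0.001
constant_hazard = lambda s: lam

s_100 = survival(constant_hazard, 100.0)
# For a constant hazard the closed form is S(t) = exp(-lam * t).
closed_form = math.exp(-lam * 100.0)
```

With a constant hazard the numerical result should agree with the closed form exp(−λt); any other hazard function of one argument can be passed in the same way.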

[Figure: three panels for a constant hazard, each plotted against Time (t) from 0 to 150: hazard rate λ(t) = λ; survival function S(t) = e^{−λt}; integrated hazard Λ(t) = λt]

Kaplan-Meier – estimate of survival function

Death times $t_1, \ldots, t_d$ (ordered). $Y(t_i)$ = number alive just before $t_i$.

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{1}{Y(t_i)}\right)$$
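The product formula can be sketched in a few lines. This is an illustrative Python version, not the SAS procedure used in the course; the function name and the toy data are hypothetical. It allows several deaths at the same time, in which case the factor becomes 1 − (deaths)/Y(t), which reduces to 1 − 1/Y(t) when there are no ties.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  : follow-up time for each individual
    events : 1 if the individual died at that time, 0 if censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    curve = []
    for t in death_times:
        # Y(t): number still under observation just before t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data (not the data behind the figures in the slides).
times = [2, 3, 3, 5, 8, 9]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored individuals never produce a step in the curve; they only shrink the number at risk at later death times.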

Risk sets

[Figure: follow-up for individuals 1 to 12; x-axis: Times (months), 0 to 25; y-axis: Individual]

[Figure: Kaplan-Meier survival estimate; x-axis: Time (months), 0 to 25; y-axis: Survival probability, 0.00 to 1.00. Number at risk: 12, 10, 9, 6, 5, 4, 1]

Malignant Melanoma Data

In the period 1962-77 a total of 205 patients had their tumor removed and were followed until 1977. At the end of 1977:

57 had died of malignant melanoma (status=1)

134 were still alive (status=2)

14 had died of causes unrelated to malignant melanoma (status=3), a competing risk

Purpose: Study the effect on survival of sex, age, thickness of tumor, ulceration, etc.

[Figure: calendar-time axis from 1962 to 1977 with dots marked along it]

Malignant melanoma

  N   time  status  sex  age  year  thickness  ulcer
  1     10       3    1   76  1972       6.76      1
  2     30       3    1   56  1968       0.65      0
  3     35       2    1   41  1977       1.34      0
  4     99       3    0   71  1968       2.90      0
  5    185       1    1   52  1965      12.08      1
  6    204       1    1   28  1971       4.84      1
  7    210       1    1   77  1972       5.16      1
  8    232       3    0   60  1974       3.22      1
  9    232       1    1   49  1968      12.88      1
 10    279       1    0   68  1971       7.41      1
  .      .       .    .    .     .          .      .
  .      .       .    .    .     .          .      .
203   4688       2    0   42  1965       0.48      0
204   4926       2    0   50  1964       2.26      0
205   5565       2    0   41  1962       2.90      0

The Cox Model

The Cox model assumes that the rate for the ith individual is

$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip})$$

where $\beta_1, \beta_2, \ldots, \beta_p$ are regression parameters, $X_{i1}$ is the value of covariate 1 for individual $i$, etc. Finally, $\lambda_0(t)$ is the baseline hazard.

Time $t$ is the time-scale of choice, e.g. age, time since randomization, or time since operation. As formulated here, the only quantity on the right-hand side of the equal sign that depends on time is the baseline hazard $\lambda_0(t)$. If all covariates ($X$'s) are zero we get

$$\lambda_i(t) = \lambda_0(t).$$

The baseline hazard is thus interpreted as the hazard of an individual who has all covariates equal to zero.

The Cox model

$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip})$$

can also be written on the log-scale (natural log)

$$\log(\lambda_i(t)) = \log(\lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip}))$$

$$= \log(\lambda_0(t)) + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip}.$$

The Cox model assumes that

the effects of covariates are additive and linear on the log rate scale, just as in Poisson regression.

the CORNER, i.e. the baseline hazard, is non-parametric and depends on time, and time is thus adjusted for.

We now turn to the interpretation of the regression parameters $\beta_1, \beta_2, \ldots, \beta_p$.

One binary covariate

To keep things simple we study the effect of a single binary covariate, e.g. sex, on the risk of dying

$$X_i = \begin{cases} 0 & \text{if individual } i \text{ is a female} \\ 1 & \text{if individual } i \text{ is a male} \end{cases}$$

The Cox model is

$$\lambda_i(t) = \lambda_0(t) \exp(\beta X_i).$$

With $X_i$ defined as above we get

$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$

Mortality Rate Ratio – Hazard Ratio

If

$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$

then the RATE RATIO (RR) between males and females is

$$RR = \frac{\lambda_0(t) \exp(\beta)}{\lambda_0(t)} = \exp(\beta).$$

Importantly, the ratio is independent of time, i.e. we have PROPORTIONAL HAZARDS over time.

The Cox model is also called the proportional hazards model.

How to estimate $\beta$? And what about the baseline hazard $\lambda_0(t)$?

Likelihood Function

The baseline hazard is regarded as a nuisance parameter and is in general not estimated, although it is possible to do so.

Let $t_1, \ldots, t_d$ be the ordered death times.

It can be shown that all we need is to find the $\beta$ that maximizes the following function, called Cox’s partial likelihood function

$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$

where $R(t_i)$ is the RISK SET at death time $t_i$, i.e. the set of individuals at risk of dying (under observation) just before time $t_i$.

The resulting estimate $\hat{\beta}$ is called the MAXIMUM LIKELIHOOD ESTIMATE of $\beta$.
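To make the partial likelihood concrete, here is a minimal Python sketch (not SAS, and not how proc phreg is implemented). It assumes a single covariate and no tied death times; the function name, the toy data and the crude grid search are all hypothetical choices made for illustration.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log of Cox's partial likelihood for one covariate, assuming no tied death times."""
    loglik = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored individuals contribute only through the risk sets
        # Risk set R(t_i): everyone still under observation just before t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        loglik += beta * x[i] - math.log(denom)
    return loglik

# Hypothetical toy data: x = 1 for males, 0 for females.
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0]
x = [1, 0, 1, 1, 0]

# Crude grid search for the maximising beta (real software uses Newton-Raphson).
grid = [b / 100 for b in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_loglik(b, times, events, x))
```

Note that the baseline hazard never appears in the code: only the ordering of the death times and the composition of the risk sets matter.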


Likelihood Function – a closer look

Death times $t_1, \ldots, t_d$, numbering individuals with deaths first:

$$i = 1, 2, \ldots, d, d+1, \ldots, n$$

with times

$$t_1, t_2, \ldots, t_d, t_{d+1}, \ldots, t_n$$

and covariates

$$X_1, X_2, \ldots, X_d, X_{d+1}, \ldots, X_n.$$

At each death time we have the RISK SET: individuals alive and at risk of dying just before the death time:

$$R(t_1), R(t_2), \ldots, R(t_d)$$

Risk sets

[Figure: follow-up for individuals 1 to 12; x-axis: Times (months), 0 to 25; y-axis: Individual]

For the Cox model

$$\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$$

we use the Cox likelihood function to estimate $\beta$:

$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)} = \frac{\exp(\beta X_1)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \cdot \frac{\exp(\beta X_2)}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \cdots \frac{\exp(\beta X_d)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}$$

We index individuals in the risk sets using the letter $j$. Writing $\sum_{j \in R(t_1)} \exp(\beta X_j)$ means summing over the individuals in the risk set for death time $t_1$. If we assume here that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is

$$\exp(\beta X_1) + \exp(\beta X_2) + \cdots + \exp(\beta X_n).$$

For example for the Cox model

$$\lambda_i(t) = \lambda_0(t) \exp(\beta \cdot \text{sex})$$

Sex: 1 = male, 0 = female. Likelihood function:

$$\frac{\exp(\beta)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \cdot \frac{1}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \cdots \frac{\exp(\beta)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}.$$

If we again assume that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is

$$\exp(\beta) + 1 + \cdots + \exp(\beta) = N_M \cdot \exp(\beta) + N_F,$$

where $N_M$ and $N_F$ are the numbers of males and females, respectively, in $R(t_1)$.

The risk sets also play a crucial role in nested case-control studies; more on this later in the course.

So far the following assumptions have been made for the Cox model

The baseline hazard is assumed non-parametric, i.e. assumed to vary freely.

The effects of covariates are additive and linear on the log rate scale.

The ratio of the hazard rates for two subjects is constant over time. In other words, there is no interaction between the covariates and the time variable.

Let us look at the Melanoma data using SAS.

[Figure: Kaplan-Meier survival estimates, by sex (female, male); x-axis: Time (years), 0 to 15; y-axis: 0.00 to 1.00]

What is the estimate of the RR between males and females?


Cox in SAS 9.1.3

In SAS, proc phreg and proc tphreg can be used for estimation in the Cox model. We will use proc tphreg, as this procedure handles categorical variables much more easily than proc phreg. Using proc tphreg we define the variable sex to be categorical using the class statement. For the variable sex, 1 is males and 0 is females.

proc tphreg data=melanom;
   class sex;                       /* treat sex as categorical */
   model time*status(2,3) = sex;    /* status values 2 and 3 are censoring codes */
run;

Please note that we have two censoring codes, namely 2 and 3.

NB: In SAS 9.2 proc phreg now handles class variables and proc tphreg is obsolete.

Part of output from proc tphreg:

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio
sex      0     1    -0.66214   0.26513      6.2370      0.0125   0.516

The column Parameter Estimate is $\hat{\beta}$. For a class variable SAS will automatically choose the highest value (here 1) as the reference. Thus, the rate ratio or Hazard Ratio compares females to males.

There is no estimate statement in proc (t)phreg, but a similar so-called contrast statement exists. To change the reference level we can instead use the ref option in the class statement. Note also the option risklimits in the model statement, which calculates the confidence interval for the hazard ratio.

proc tphreg data=melanom;
   class sex(ref="0");                          /* females (0) as reference */
   model time*status(2,3) = sex / risklimits;   /* risklimits adds the confidence interval */
run;

...

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
sex      1     1     0.66214   0.26513      6.2370      0.0125   1.939    1.153    3.260
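The hazard ratio and its confidence limits follow directly from the parameter estimate and its standard error. The Python sketch below reproduces that arithmetic for illustration; the function name is hypothetical, and the multiplier 1.96 is the usual normal quantile for a 95% Wald interval, which is assumed here to be what the risklimits option reports.

```python
import math

def hazard_ratio_summary(estimate, std_error, z=1.96):
    """Hazard ratio, Wald confidence limits and Wald chi-square from a log hazard ratio."""
    hr = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    wald_chi2 = (estimate / std_error) ** 2
    return hr, lower, upper, wald_chi2

# Estimate and standard error copied from the output (sex, males vs. females).
hr, lower, upper, wald = hazard_ratio_summary(0.66214, 0.26513)
```

Up to rounding of the inputs, the four values agree with the Hazard Ratio, Confidence Limits and Chi-Square columns of the output. Changing the sign of the estimate gives the reciprocal hazard ratio, 0.516, for females compared to males.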


Melanoma data, thickness of tumor given by variable gtyk

$$\text{gtyk} = \begin{cases} 1 & \text{if } < 2 \text{ mm} \\ 2 & \text{if } 2\text{-}5 \text{ mm} \\ 3 & \text{if } > 5 \text{ mm} \end{cases}$$

proc tphreg data=melanom;
   class gtyk;                                   /* highest level (3) is the reference */
   model time*status(2,3) = gtyk / risklimits;
run;

            Type 3 Tests

                     Wald
Effect    DF   Chi-Square   Pr > ChiSq
gtyk       2      25.6749       <.0001

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
gtyk     1     1    -1.67324   0.38572     18.8176      <.0001   0.188    0.088    0.400
gtyk     2     1    -0.11055   0.32391      0.1165      0.7329   0.895    0.475    1.689

Melanoma data, + age in years

proc tphreg data=melanom;
   class gtyk sex;
   model time*status(2,3) = gtyk sex age / risklimits;   /* age enters as a continuous covariate */
run;

            Type 3 Tests

                     Wald
Effect    DF   Chi-Square   Pr > ChiSq
sex        1       2.3660       0.1240
gtyk       2      21.3752       <.0001
age        1       1.5241       0.2170

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
sex      0     1    -0.41608   0.27050      2.3660      0.1240   0.660    0.388    1.121
gtyk     1     1    -1.53827   0.39232     15.3738      <.0001   0.215    0.100    0.463
gtyk     2     1    -0.08180   0.32849      0.0620      0.8033   0.921    0.484    1.754
age            1     0.01052   0.00852      1.5241      0.2170   1.011    0.994    1.028

Likelihood Ratio Test.

proc tphreg data=melanom;
   class gtyk sex;
   model time*status(2,3) = gtyk sex;    /* larger model */
run;

        Model Fit Statistics

                Without         With
Criterion    Covariates   Covariates
-2 LOG L        566.398      532.244
AIC             566.398      538.244
SBC             566.398      544.373

-------------------------------------------

proc tphreg data=melanom;
   class sex;
   model time*status(2,3) = sex;         /* smaller model, without gtyk */
run;

        Model Fit Statistics

                Without         With
Criterion    Covariates   Covariates
-2 LOG L        566.398      560.248
AIC             566.398      562.248
SBC             566.398      564.291

LR = 560.248 - 532.244 = 28.0 $\sim \chi^2_2$ (2 degrees of freedom)


SAS: p-value from chi-square test

data temp;
   chisquare=28;
   df=2;
   p=1-probchi(chisquare,df);   /* upper-tail probability of the chi-square distribution */
run;

proc print data=temp;
run;

Obs    chisquare    df    p
  1           28     2    .000000832
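The same p-value can be reproduced by hand, because the upper tail of a chi-square distribution with 2 degrees of freedom has the closed form exp(−x/2). The Python sketch below is only an illustration of that arithmetic; the function name is hypothetical and the closed form holds for 2 degrees of freedom only.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For 2 degrees of freedom the survival function is exp(-x / 2).
    """
    return math.exp(-x / 2.0)

# -2 log L values copied from the two model fits.
lr = 560.248 - 532.244          # likelihood ratio statistic, about 28.0
p_exact = chi2_sf_df2(lr)
p_rounded = chi2_sf_df2(28.0)   # the rounded statistic used in the SAS data step
```

Here p_rounded corresponds to the value printed by proc print, and p_exact uses the unrounded statistic 28.004.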


Choice of Time-Scale

A study may be conducted over calendar time even though the natural time-scale is time since treatment (Melanoma study).

Cohort studies are often conducted by recruiting a random sample of the population at the start of the study; these subjects are then followed for a number of years (Framingham).

A natural time-scale may be age rather than time in study, which most often is an artificial time-scale constructed by the investigators.

What would the time-origin be if age were chosen as the time-scale?

Vaccinations in Guinea-Bissau 1990-96

Rural Guinea-Bissau: 5274 children under 7 months of age were visited twice at home, with an interval of six months. Information about vaccination (BCG, DTP, measles vaccine) was collected at each visit, and at the second visit deaths during follow-up were registered. Some children moved away during follow-up and were censored; children who survived until the next visit were also censored. Below are some of the variable names from the bissau data.

fuptime   Follow-up time in days
dead      0 = censored, 1 = dead
bcg       1 = Yes, 2 = No
agem      Age at first visit in months

Is the risk of dying associated with vaccination?

                               Outcome
Exposure               Died          Survived   Total
BCG vaccinated         125 (3.8%)    3176       3301
not BCG vaccinated      97 (4.9%)    1876       1973
Total                  222 (4.2%)    5052       5274
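The percentages in the table, and a crude risk ratio, can be reproduced from the counts. This small Python sketch is illustrative only; the function name is hypothetical, and the risk ratio ignores differences in follow-up time and age, which is exactly what the Cox analyses that follow take into account.

```python
def risk(died, total):
    """Cumulative risk of death: deaths divided by number of children."""
    return died / total

# Counts copied from the table.
risk_bcg = risk(125, 3301)
risk_no_bcg = risk(97, 1973)
risk_all = risk(222, 5274)
risk_ratio = risk_bcg / risk_no_bcg   # crude comparison, BCG vaccinated vs. not vaccinated
```

The crude risk ratio is about 0.77, close to the hazard ratio of 0.754 estimated by the first Cox model on the follow-up time scale.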


proc tphreg data=bissau;
   class bcg;                           /* highest level (2 = No) is the reference */
   model fuptime*dead(0)=bcg / rl ;     /* dead = 0 is the censoring code */
run;

   Testing Global Null Hypothesis: BETA=0

Test                Chi-Square   DF   Pr > ChiSq
Likelihood Ratio        4.2824    1       0.0385
Score                   4.3761    1       0.0364
Wald                    4.3474    1       0.0371

            Type 3 Tests

                     Wald
Effect    DF   Chi-Square   Pr > ChiSq
bcg        1       4.3474       0.0371

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
bcg      1     1    -0.28214   0.13532      4.3474      0.0371   0.754    0.578    0.983

proc tphreg data=bissau;
   class bcg agem;                         /* age in months as a categorical variable */
   model fuptime*dead(0)=bcg agem / rl ;
run;

            Type 3 Tests

                     Wald
Effect    DF   Chi-Square   Pr > ChiSq
bcg        1       5.6510       0.0174
agem       6       7.7246       0.2590

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
bcg      1     1    -0.34720   0.14605      5.6510      0.0174   0.707    0.531    0.941
agem     0     1     0.01053   0.35339      0.0009      0.9762   1.011    0.506    2.020
agem     1     1     0.12553   0.34494      0.1324      0.7159   1.134    0.577    2.229
agem     2     1    -0.24631   0.35903      0.4707      0.4927   0.782    0.387    1.580
agem     3     1     0.20946   0.34502      0.3686      0.5438   1.233    0.627    2.425
agem     4     1     0.34300   0.34265      1.0020      0.3168   1.409    0.720    2.758
agem     5     1     0.34118   0.34699      0.9668      0.3255   1.407    0.713    2.777

Delayed entries

[Figure: two panels showing follow-up for individuals 1 to 12. First panel, "Time in study": x-axis Times (months), 0 to 25. Second panel, "Age as time": x-axis Age (months), 0 to 25. y-axis in both panels: Individual]

Subjects are only at risk from their age of entry and onwards. They are not at risk in our "World of analysis" before the age of entry!

Delayed entries are "easily" handled by careful control of the RISK SET $R(t_i)$ at death time $t_i$ in the likelihood function:

$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$

Only individuals at risk and under observation are included in the risk set $R(t_i)$ at time $t_i$.
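A risk set with delayed entry can be sketched as a simple filter on entry and exit ages. The Python below is illustrative only: the function name and the toy ages are hypothetical, and the convention entry < t ≤ exit (an individual enters just after the entry age and is still at risk at the exit age) is an assumption about how the interval ends are treated.

```python
def risk_set(death_time, entry, exit_):
    """Indices of individuals under observation just before death_time.

    entry[j] : age at entry (start of observation) for individual j
    exit_[j] : age at death or censoring for individual j
    """
    return [j for j in range(len(entry)) if entry[j] < death_time <= exit_[j]]

# Hypothetical ages in months for four children.
entry = [0, 2, 5, 6]
exit_ = [4, 10, 9, 12]

at_3 = risk_set(3, entry, exit_)   # children 2 and 3 have not entered yet
at_9 = risk_set(9, entry, exit_)   # child 0 has already left
```

Without delayed entry every entry age is 0 and the filter reduces to the usual condition that the exit time is at least the death time.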


Delayed entries in SAS

data bissau2;
   set bissau;
   outage=age+fuptime;                      /* age at exit = age at entry + follow-up time */
run;

proc tphreg data=bissau2;
   class bcg;
   model (age,outage)*dead(0)= bcg / rl;    /* (entry, exit) gives delayed entry on the age scale */
run;

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
bcg      1     1    -0.35542   0.14065      6.3854      0.0115   0.701    0.532    0.923

Time dependent explanatory variables

The Cox model can be expanded to include time-varying covariates

$$\lambda_i(t) = \lambda_0(t) \exp(\beta X_i(t)).$$

The likelihood function for death times $t_1, \ldots, t_d$ becomes

$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i(t_i))}{\sum_{j \in R(t_i)} \exp(\beta X_j(t_i))}.$$

From this we can see that we "just" need to know the values of the covariates at the death times:

$$X_i(t_1), X_i(t_2), \ldots, X_i(t_d).$$

The covariate values at any time other than a death time are not used in the likelihood function.

The simplest time-varying covariate is a binary variable that is allowed to change once during follow-up, e.g. new BCG vaccinations registered between visits in the Bissau data:

$$X_i(t) = \begin{cases} 0 & \text{if no BCG before time } t \\ 1 & \text{if BCG-time} \le t \end{cases}$$

A child being BCG-vaccinated after 3 months of follow-up.

[Figure: the covariate BCG (0 or 1) plotted against Follow-up (months), 0 to 6]

The time-varying covariate is 0 in the time interval 0 to 3 months and 1 for the rest of follow-up. For a child who was BCG vaccinated before the first visit the time-varying covariate is 1 during all of follow-up.

Multi-state Model

[Diagram: three states, 0 = Unexposed, 1 = Exposed and 2 = Dead, with transitions 0 → 1 at rate λ01(t), 0 → 2 at rate λ02(t) and 1 → 2 at rate λ12(t)]

We want to compare $\lambda_{02}(t)$ and $\lambda_{12}(t)$. The transition $\lambda_{01}(t)$ is not modeled here.

Instead of time of follow-up we will use age as time-scale to illustrate the use of BCG as a time-varying covariate in the Bissau data. At visit 2 the vaccination cards were seen for the children at home and an age of BCG vaccination (bcgage) was calculated:

 id  fuptime  dead  age  bcg  bcgage  outage
...
486      159     0  199    1     107     358
487      183     0   97    1      20     280
488      183     0   43    2     174     226
489      137     1  140    1      40     277
490      183     0  165    1      46     348
...
499      157     0  186    1      64     343
500       25     1  191    2       .     216
501      157     0  183    1      61     340
...

Binary time-varying covariate in SAS (I)

proc tphreg data=bcg;
   /* bcg_t is re-evaluated at each death time */
   if .<bcgage<outage then bcg_t=1;
   else bcg_t=0;
   model (age,outage)*dead(0)=bcg_t / rl ;
run;

                 Analysis of Maximum Likelihood Estimates

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
bcg_t          1    -1.08278   0.14046     59.4286      <.0001   0.339    0.257    0.446

The if-statement

if .<bcgage<outage then bcg_t=1;
else bcg_t=0;

is recalculated at each death time. The outage in the model statement refers to the death time currently being evaluated (i.e. $t_i$ in the likelihood). For the first death time, which is $t_1 = 23$ days of age, the if-statement becomes

if .<bcgage<23 then bcg_t=1;
else bcg_t=0;

and is calculated for all children at risk at age 23 days (in $R(t_1 = 23)$) with their individual bcgage values.

This is a recalculation of the time-varying covariate at each death time, cf. the likelihood function.
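The recalculation at each death time can be mimicked outside SAS. The Python sketch below is illustrative only and is not what proc tphreg does internally: the function names are hypothetical, the three children are taken from the listing of the bissau data, and the death ages 150 and 200 days are assumed values chosen to show the covariate changing.

```python
def bcg_at(death_age, bcgage):
    """Value of the time-varying covariate at a given death age.

    bcgage is the age at BCG vaccination, or None when no vaccination age
    is recorded (the SAS missing value '.').
    """
    return 1 if bcgage is not None and bcgage < death_age else 0

# (id, entry age, exit age, bcgage) in days, from the listing of the bissau data.
children = [(487, 97, 280, 20), (488, 43, 226, 174), (500, 191, 216, None)]

def covariates_at(death_age):
    """bcg_t for every child in the risk set at death_age."""
    return {cid: bcg_at(death_age, bcgage)
            for cid, entry, exit_, bcgage in children
            if entry < death_age <= exit_}

at_150 = covariates_at(150)   # hypothetical death age
at_200 = covariates_at(200)   # hypothetical death age
```

Child 488 contributes with bcg_t = 0 at age 150 and with bcg_t = 1 at age 200, while child 500 only enters the risk set after age 191 and is never vaccinated.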


Binary time-varying covariate in SAS (II)

Split persons whose time-varying covariate changes into two records:

[Diagram: age axis with the points age, bcgage and outage. First record, from age to bcgage: bcgvacc=0, status=0. Second record, from bcgage to outage: bcgvacc=1, status=dead]

and use delayed entries.

Thus, we need to generate a new data set.

data splitbcg;
   set bcg;
   /* never vaccinated during follow-up: one unexposed record */
   if bcgage=. or bcgage>outage then do;
      bcgvacc=0; entryage=age; exitage=outage; status=dead; output;
   end;
   /* vaccinated before entry: one exposed record */
   if .<bcgage<=age then do;
      bcgvacc=1; entryage=age; exitage=outage; status=dead; output;
   end;
   /* vaccinated during follow-up: unexposed record censored at bcgage, then exposed record */
   if age<bcgage<=outage then do;
      bcgvacc=0; entryage=age   ; exitage=bcgage; status=   0; output;
      bcgvacc=1; entryage=bcgage; exitage=outage; status=dead; output;
   end;
run;

 id  fuptime  dead  age  bcg  bcgage  outage  bcgvacc  entryage  exitage  status
488      183     0   43    2     174     226        0        43      174       0
488      183     0   43    2     174     226        1       174      226       0
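The same three-way split can be written as a small function, which makes the logic easy to check on single records. This Python sketch mirrors the data step for illustration; the function name and the tuple layout are hypothetical, and None stands in for the SAS missing value.

```python
def split_record(age, outage, dead, bcgage):
    """Split one child's follow-up at the age of BCG vaccination.

    Returns a list of (bcgvacc, entryage, exitage, status) tuples.
    bcgage is None when no vaccination age is recorded.
    """
    if bcgage is None or bcgage > outage:
        return [(0, age, outage, dead)]            # never vaccinated during follow-up
    if bcgage <= age:
        return [(1, age, outage, dead)]            # vaccinated before entry
    # Vaccinated during follow-up: the unexposed record is censored at bcgage.
    return [(0, age, bcgage, 0), (1, bcgage, outage, dead)]

rows_488 = split_record(age=43, outage=226, dead=0, bcgage=174)
rows_487 = split_record(age=97, outage=280, dead=0, bcgage=20)
rows_500 = split_record(age=191, outage=216, dead=1, bcgage=None)
```

For child 488 the function returns the same two records as the SAS listing; children 487 and 500 each keep a single record.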


proc tphreg data=splitbcg;
   class bcgvacc(ref="0");                               /* unvaccinated time as reference */
   model (entryage,exitage)*status(0)=bcgvacc / rl ;
run;

                   Parameter  Standard                          Hazard  95% Hazard Ratio
Parameter     DF    Estimate     Error  Chi-Square  Pr > ChiSq   Ratio  Confidence Limits
bcgvacc  1     1    -1.08278   0.14046     59.4286      <.0001   0.339    0.257    0.446

Other time-varying covariates

Effect of binary X (0,1) changes at $t_0$:

$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 X_i I(t \ge t_0)),$$

where

$$I(t \ge t_0) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}$$

Can be handled by methods I and II.

Effect of binary X (0,1) decreases or increases with time:

$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 (X_i \cdot t)).$$

Can be handled by method I, by "splitting at each failure", or by special options.
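As a worked step, the hazard ratio comparing $X = 1$ with $X = 0$ can be written out for the two models. This follows directly from dividing the two hazards, so that the baseline hazard cancels:

```latex
% Model with a change in effect at t_0:
HR(t) = \frac{\lambda_0(t)\exp(\beta_1 + \beta_2 I(t \ge t_0))}{\lambda_0(t)}
      = \begin{cases}
          \exp(\beta_1)           & \text{if } t < t_0 \\
          \exp(\beta_1 + \beta_2) & \text{if } t \ge t_0
        \end{cases}

% Model with an effect that changes linearly with time:
HR(t) = \frac{\lambda_0(t)\exp(\beta_1 + \beta_2 t)}{\lambda_0(t)}
      = \exp(\beta_1 + \beta_2 t)
```

In both models $\beta_2 = 0$ gives back a hazard ratio that is constant over time, so a test of $\beta_2 = 0$ can be read as a check of the proportional hazards assumption for $X$.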


Stanford Heart Transplant Data (p. 235)

In a report (Crowley and Hu, J Amer. Statist Assoc. 1977) on the Stanford Heart Transplantation Study, patients identified as being eligible (N=103) for a heart transplant were followed until death or censoring. In total 65 received a transplant during follow-up, whereas 38 did not. Assess whether transplanted patients survive better. On the next slide you will find the variables in the transplant data set. Here we will discuss how to analyse the data, and in the exercises we will do some of the analyses.

Stanford Heart Transplant Data variables

age        age (in years) at entry into the study.
cens       0 = Censoring
           1 = Dead
days       number of days from entry to death/censoring.
trans      1 = if the person had a heart transplantation
           0 = otherwise.
wait       number of days from entry to transplantation
           NB: if trans = 0 then wait = -1
mismatch   1 = mismatch between HLA type in donor and patient
           0 = no mismatch
           NB: if trans = 0 then mismatch = -1.

Obs  age  cens  days  trans  wait  mismatch
 52   56     1    90      1    27         1
 53   53     1    96      1    67         0
 54   48     1   100      1    46         0
 55   41     1   102      0    -1        -1
 56   28     0   109      1    96         1
 57   46     1   110      1    60         0
 58   23     0   131      1    21         1
 59   41     1   149      0    -1        -1
 60   47     1   153      1    26         0
 61   43     1   165      1     4         0
 62   26     0   180      1    13         0
 63   52     1   186      1   160         1
 64   47     1   188      1    41         0
 65   51     1   207      1   139         1
 66   51     1   219      1    83         1
 67    8     1   263      0    -1        -1
 68   47     0   265      1    28         0
 69   48     1   285      1    32         1
 70   19     1   285      1    57         0
 71   49     1   308      1    28         0

Piecewise Constant Hazard Rate = Poisson regression

Divide the time scale into K pieces and assume piecewise constant but different hazard rates in each of the intervals. This may provide a sensible summary of many phenomena and is often used in epidemiology.

[Diagram: age axis with cut points c_0 = 0, c_1, c_2, c_3, ..., c_{K−1}, c_K and hazard rates λ_1, λ_2, λ_3, ..., λ_K on the successive intervals]

Thus

$$\lambda(t) = \lambda_k \quad \text{for } t \in (c_{k-1}, c_k], \; k = 1, \ldots, K$$

The intervals do not need to be of the same length. We only need to keep a record of the total number of deaths and the exposure time in each group.
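The bookkeeping of deaths and exposure time per interval can be sketched as follows. This Python example is illustrative only: the function name, the cut points and the toy ages are hypothetical, and the rate in each interval is simply taken as deaths divided by person-time, the usual occurrence/exposure rate.

```python
def piecewise_rates(cuts, entry, exit_, died):
    """Deaths, person-time and rate in each interval (cuts[k-1], cuts[k]].

    cuts  : increasing cut points c_0, c_1, ..., c_K
    entry : age at entry for each individual
    exit_ : age at death or censoring for each individual
    died  : 1 if the individual died at exit_, 0 if censored
    """
    rows = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        # Person-time each individual spends inside (lo, hi].
        exposure = sum(max(0.0, min(x, hi) - max(e, lo)) for e, x in zip(entry, exit_))
        deaths = sum(d for x, d in zip(exit_, died) if d == 1 and lo < x <= hi)
        rate = deaths / exposure if exposure > 0 else float("nan")
        rows.append((lo, hi, deaths, exposure, rate))
    return rows

# Hypothetical toy data: ages in years for four individuals.
cuts = [0, 5, 10]
entry = [0, 0, 2, 6]
exit_ = [4, 8, 10, 9]
died = [1, 0, 1, 1]
rows = piecewise_rates(cuts, entry, exit_, died)
```

Each row holds the interval limits, the number of deaths, the person-time and the resulting rate; a table of this form is what a Poisson regression with log person-time as offset would take as input.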


We can further divide each interval into categories of covariates, e.g. sex (F=females, M=males):

[Diagram: age axis with cut points c_0 = 0, c_1, c_2, c_3, ..., c_{K−1}, c_K and sex-specific hazard rates λ_{1M}, λ_{1F}, λ_{2M}, λ_{2F}, λ_{3M}, λ_{3F}, ..., λ_{KM}, λ_{KF} on the successive intervals]

It is not straightforward to split the time-scale in SAS, but so-called user-written SAS macros exist. See for example:

http://staff.pubhealth.ku.dk/~bxc/Lexis/Lexis.sas

Stata – use stsplit command.

R – packages exist (e.g. Epi Package)

SPSS – ?
