EPI 5344: Survival Analysis in Epidemiology
Cox regression: Introduction
March 17, 2015
Dr. N. Birkett, School of Epidemiology, Public Health & Preventive Medicine, University of Ottawa
Jan 29, 2016
Objectives
• Review proportional hazards
• Introduce Cox model and methods of estimation
• Tied data
Exponential model (1R)
• Exponential model
– Most common parametric model in epidemiology
– Assumes a constant hazard, h(t) = λ
– How did we create the likelihood function?
• Subjects can have two types of ‘ends’
– Death
– Censored
• Each contributes to the likelihood function, but in different ways
Exponential Model (2R)
• Likelihood contribution of a death at time $t_i$:
$f(t_i) = h(t_i)\,S(t_i) = \lambda e^{-\lambda t_i}$
• Likelihood contribution if censored at time $t_i$:
– Actual time of ‘failure’ is unknown.
– Must survive until at least time $t_i$:
$S(t_i) = e^{-\lambda t_i}$
• Multiply these across all deaths and all censored observations to get the full likelihood
Exponential Model (3R)
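$L(\lambda) = \prod_{\text{deaths}} \lambda e^{-\lambda t_i} \times \prod_{\text{censored}} e^{-\lambda t_i} = \lambda^{N} e^{-\lambda \, PT}$
Where:
N = # events
PT = Person-time of follow-up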
Exponential Model (4R)
• How do we find the MLE for λ?
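One way to answer this, sketched under the constant-hazard assumption of the review slides: the log-likelihood is N ln(λ) − λ·PT, and setting its derivative to zero gives λ = N / PT (events divided by person-time). The short example below checks that closed form against a crude grid search. The follow-up times and event indicators are invented for illustration and are not course data.

```python
import math

def exp_log_likelihood(lam, n_events, person_time):
    """Log-likelihood of the constant-hazard model: N*ln(lam) - lam*PT."""
    return n_events * math.log(lam) - lam * person_time

def exp_mle(n_events, person_time):
    """Closed-form MLE: number of events divided by person-time."""
    return n_events / person_time

# Hypothetical data: follow-up time in years, event flag (1 = death, 0 = censored)
times = [2.0, 3.5, 1.0, 4.0, 2.5]
events = [1, 0, 1, 0, 1]

n_events = sum(events)      # N
person_time = sum(times)    # PT
lam_hat = exp_mle(n_events, person_time)

# Crude check: the best value on a grid of candidate rates should sit next to N/PT
grid = [0.01 * k for k in range(1, 200)]
grid_best = max(grid, key=lambda lam: exp_log_likelihood(lam, n_events, person_time))

print(f"N = {n_events}, PT = {person_time}, MLE = {lam_hat:.4f}, grid best = {grid_best:.2f}")
```

Deaths and censored subjects both add to PT, but only deaths add to N, which is the ‘different ways’ the two kinds of subject enter the likelihood.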
Exponential Model (5R)
• What if we want to examine predictors of the outcome?
– λ is allowed to vary by sex, age, cholesterol, etc.
• Use the same approach but now, instead of ‘λ’, we have the following in the likelihood function:
$\lambda_i = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})$
End of review
Proportional hazard models (1)
• Now, use this approach BUT do not pre-specify form for h(t)
• We start with proportional hazards
• Hazard (h(t)) = rate of change in survival conditional on having survived to that point in time.
Hazard models (2)
• Suppose we want to compare two treatment groups
– Different survival is expected → they have different hazards
– How can we summarize this?
– Use the hazard ratio, HR(t): the ratio of the two groups’ hazards, h1(t) and h2(t), at follow-up time t
In general, HR(t) will be different at different follow-up times
[Figure: hazard curves h1(t) and h2(t) plotted against follow-up time]
This can be hard to describe and interpret
• Effect of the treatment varies with length of follow-up
[Figure: hazard curves h1(t) and h2(t) plotted against follow-up time]
• HR could switch from below to above 1.0
Hazard models (3)
• SUPPOSE that HR(t) were constant at all follow-up times.
– Effect of the treatment is the same at all times
PROPORTIONAL HAZARDS model (PH)
• This does not require that h(t) be constant;
• It can vary in an unconstrained manner.
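A minimal numerical sketch of this point, assuming a made-up baseline hazard shape and an HR of 0.6 (neither comes from the lecture): the hazard changes over follow-up in both groups, yet the ratio between them stays fixed.

```python
import math

def baseline_hazard(t):
    """Hypothetical reference-group hazard: rises early, then declines."""
    return 0.05 + 0.10 * t * math.exp(-0.5 * t)

HR = 0.6  # assumed constant hazard ratio (illustrative value only)

def treated_hazard(t):
    """Hazard in the comparison group under proportional hazards."""
    return HR * baseline_hazard(t)

follow_up_times = [0.5, 1.0, 2.0, 4.0, 8.0]
baseline_values = [baseline_hazard(t) for t in follow_up_times]
ratios = [treated_hazard(t) / baseline_hazard(t) for t in follow_up_times]

for t, h0, r in zip(follow_up_times, baseline_values, ratios):
    print(f"t = {t:>4}: h0(t) = {h0:.4f}   HR(t) = {r:.2f}")
```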
[Figure: hazard curves h1(t) and h2(t) plotted against follow-up time]
[Figure: hazard curves h1(t) and h2(t) plotted against follow-up time, with the HR marked]
Cox models (1)
• For most of the rest of this course, we will assume a Proportional hazards model:
h1(t) = h0(t) * HR
• h0(t) is the ‘baseline’ or reference hazard.
– Contains all of the time variability of the hazard.
• HR is assumed to be constant over all follow-up time.
Cox models (2)
• HR can still be affected by predictor variables
– Race
– Exposure (low/mid/high)
– Sex
– Caloric intake
• For now, we will assume that these are
– measured at baseline (time ‘0’)
– remain fixed during follow-up
Cox models (3)
• In general, HR is a function of the predictors x1, ..., xk.
• Most common model assumes that ln(HR) is a linear function of the predictors:
$\ln(HR) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
– This is similar to the model for logistic regression and linear regression.
• NOTE: there is no intercept!
– This is ‘subsumed’ into the baseline hazard term h0(t)
Cox models (4)
• HR model can be written:
$HR = \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)$
• How does this fit into our ‘hazard’ model? Our base model is:
$h(t) = h_0(t) \times HR$
Cox models (5)
• This implies:
$h(t) = h_0(t) \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)$
• But, so what? How do we estimate the Betas?
– As with exponential model, it appears we need to know the shape of h0(t)
Cox models (6)
• COX (1972) SHOWED THAT THIS IS WRONG!
– Can estimate the Betas without needing to model h0(t)
– Semi-parametric model
– Based on:
  • Risk sets
  • Partial likelihoods
• We will skip a lot of math
– Use an intuitive approach
– Method relates to approach used with exponential model
Cox models (7)
• Start off trying to build a likelihood for the data based on the whole model (with baseline hazard included)
• Concentrate on the times when events happened
– Similar to the Kaplan-Meier method
  • S(t) only changes when an event happens
  • can ignore losses between events
• Action happens within Risk Set at the event times.
Cox models (8)
• Action happens within Risk Set at the event times.
• The theory assumes that only one event happens at any point in time
– This is not the ‘real world’
– In theory, time is continuous.
  • So no two events happen at the same time
– We’ll deal with ‘ties’ later on
Cox models (9)
• Consider the risk set at time ‘ti’ when an event happens
– Each subject in risk set has a probability of being the one having the event
  • Higher hazard → higher probability
• ‘likelihood’ contribution from person ‘j’ in risk set is:
$P(\text{person } j \text{ has the event at } t_i \mid \text{one event occurs in the risk set at } t_i)$
Cox models (10)
• Using the definition of conditional probability, this is:
$\dfrac{P(\text{person } j \text{ has the event at } t_i)}{P(\text{one event occurs in the risk set at } t_i)}$
• How do we get the numerator and denominator?
• The hazard is a measure of how likely an event is to occur for a person
– Higher hazard → an event is more likely
Cox models (11)
• We can get:
$\dfrac{h_j(t_i)}{\sum_{k \in R(t_i)} h_k(t_i)}$
where $R(t_i)$ is the risk set at time $t_i$.
Cox models (12)
Now, because the hazards are proportional, we have:
$h_k(t_i) = h_0(t_i) \times HR_k$ for every subject k in the risk set
Cox models (13)
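• The likelihood contribution from this event (risk set) can be written:
$\dfrac{h_0(t_i)\, HR_j}{\sum_{k \in R(t_i)} h_0(t_i)\, HR_k}$ (sum over the subjects k in the risk set $R(t_i)$)
Cancel out the h0(t)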
Cox models (14)
• The final likelihood contribution from this risk set is:
$\dfrac{HR_j}{\sum_{k \in R(t_i)} HR_k}$ (sum over the subjects k in the risk set $R(t_i)$)
• Which does not depend on h0(t)
Cox models (15)
• Now, multiply all of the contributions from each risk set (defined when an event occurs)
• Produces a Partial Likelihood
• Estimate the Betas using MLE.
Cox models (16)
• Censoring times add no terms of their own to the partial likelihood, since we are not estimating the actual hazard
– Censored subjects still count in each risk set up to the time they are censored
• Betas depend only on the ranking of events, not on the actual event times
– Implies that Cox does not give the same estimates as person-time epidemiology analyses
– Standard Cox models do not estimate survival itself, only relative hazards (the HRs)
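The ranking point can be checked with a small sketch. The helper `build_risk_sets` and the five follow-up times below are hypothetical and assume no tied event times. Any transformation of the time scale that preserves order leaves every risk set, and therefore the partial likelihood, unchanged.

```python
def build_risk_sets(times, events):
    """Return [(event_subject, subjects_at_risk)] in order of event time.

    times:  dict subject -> follow-up time
    events: dict subject -> 1 if the follow-up ended in an event, 0 if censored
    Assumes no tied event times.
    """
    risk_sets = []
    for subject in sorted(times, key=times.get):
        if events[subject] == 1:
            # At risk: everyone still under follow-up at this event time
            at_risk = sorted(k for k in times if times[k] >= times[subject])
            risk_sets.append((subject, at_risk))
    return risk_sets

# Hypothetical data: subjects 3, 1 and 4 have events; subjects 5 and 2 are censored
times = {1: 4.0, 2: 5.0, 3: 2.0, 4: 7.0, 5: 3.0}
events = {1: 1, 2: 0, 3: 1, 4: 1, 5: 0}

original = build_risk_sets(times, events)

# Cubing the times changes every value but keeps the ordering
stretched = build_risk_sets({k: t ** 3 for k, t in times.items()}, events)

print(original)
print(original == stretched)
```

A person-time analysis would change under the same transformation, because PT itself changes.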
Let’s consider a simple example.
[Figure: follow-up lines for 5 subjects; three end in a death (D) at times t1, t2 and t3, and two are censored (C)]
• Three events → three risk sets to consider
For subject ‘m’, the hazard function is: $h_m(t)$
1st event.
risk set: 1/2/3/4/5
Subject with event: 3
Likelihood contribution:
$\dfrac{h_3(t_1)}{h_1(t_1) + h_2(t_1) + h_3(t_1) + h_4(t_1) + h_5(t_1)}$
But, we have:
$h_m(t) = h_0(t) \times HR_m$
So, likelihood contribution from risk set #1 is:
$\dfrac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5}$
Extending this to the other risk sets:
2nd event.
risk set: 1/2/4
Subject with event: 1
Likelihood contribution:
$\dfrac{HR_1}{HR_1 + HR_2 + HR_4}$
3rd event.
risk set: 4
Subject with event: 4
Likelihood contribution:
$\dfrac{HR_4}{HR_4} = 1$
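Overall Partial Likelihood is:
$PL = \dfrac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5} \times \dfrac{HR_1}{HR_1 + HR_2 + HR_4} \times \dfrac{HR_4}{HR_4}$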
This can easily be extended to very large data sets.
• Writing out the entire partial likelihood function would be ‘crazy’
• But, this is what our computer has to do
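What the computer does can be sketched in a few lines. The risk sets are the three from the example (events in subjects 3, 1 and 4). The HR values and the function name `partial_likelihood` are invented for illustration.

```python
def partial_likelihood(hr, risk_sets):
    """Multiply HR_event / (sum of HRs in the risk set) over all risk sets."""
    pl = 1.0
    for event_subject, at_risk in risk_sets:
        pl *= hr[event_subject] / sum(hr[k] for k in at_risk)
    return pl

# Risk sets from the example: (subject with the event, subjects at risk)
risk_sets = [
    (3, [1, 2, 3, 4, 5]),
    (1, [1, 2, 4]),
    (4, [4]),
]

# Hypothetical hazard ratios for the five subjects
hr = {1: 1.0, 2: 1.0, 3: 2.0, 4: 1.0, 5: 2.0}
pl = partial_likelihood(hr, risk_sets)

# Multiplying every hazard by a common factor, as the baseline hazard h0(t) does
# at a given time, cancels in each ratio and leaves the partial likelihood unchanged
scaled = {k: 3.7 * v for k, v in hr.items()}
pl_scaled = partial_likelihood(scaled, risk_sets)

print(pl, pl_scaled)
```

With these HRs the three contributions are 2/7, 1/3 and 1, so the product is 2/21.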
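Suppose that we are using the Cox model. Let’s also limit to one predictor. Then, we have:
$HR_m = e^{\beta x_m}$
• Partial Likelihood form is now:
$PL(\beta) = \dfrac{e^{\beta x_3}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_3} + e^{\beta x_4} + e^{\beta x_5}} \times \dfrac{e^{\beta x_1}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_4}} \times \dfrac{e^{\beta x_4}}{e^{\beta x_4}}$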
• We will see this layout again
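A hedged sketch of the estimation step for this one-predictor case: the log partial likelihood is maximised where its derivative (the score) is zero. The predictor values in `x` are invented, and simple bisection stands in for the Newton-type iterations that statistical packages typically use.

```python
import math

def log_partial_likelihood(beta, x, risk_sets):
    """Sum over risk sets of beta*x_event - ln(sum of exp(beta*x_k) over the risk set)."""
    total = 0.0
    for event_subject, at_risk in risk_sets:
        total += beta * x[event_subject]
        total -= math.log(sum(math.exp(beta * x[k]) for k in at_risk))
    return total

def score(beta, x, risk_sets):
    """Derivative of the log partial likelihood with respect to beta."""
    total = 0.0
    for event_subject, at_risk in risk_sets:
        weights = [math.exp(beta * x[k]) for k in at_risk]
        weighted_mean_x = sum(w * x[k] for w, k in zip(weights, at_risk)) / sum(weights)
        total += x[event_subject] - weighted_mean_x
    return total

def maximise_beta(x, risk_sets, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection on the score; assumes score(lo) > 0 > score(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, x, risk_sets) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Risk sets from the example; x holds hypothetical values of the single predictor
risk_sets = [(3, [1, 2, 3, 4, 5]), (1, [1, 2, 4]), (4, [4])]
x = {1: 0.5, 2: 1.0, 3: 2.0, 4: 0.0, 5: 1.5}

beta_hat = maximise_beta(x, risk_sets)
print(f"beta_hat = {beta_hat:.4f}, HR per unit of x = {math.exp(beta_hat):.4f}")
```

The third risk set contributes ln(1) = 0 for every β, so only the first two risk sets carry information about β.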
‘Ties’ (1)
• Above assumed that only one event happened at any given time
– True ‘in theory’ because time is a continuous variable.
– Not true in reality because time is measured ‘coarsely’.
  • For example
    – Only get measurement data every year
    – Time of event measured to the day, not hour/min/second
‘Ties’ (2)
• More than one event at the same time is called a ‘tied’ event.
• How do we modify the method to handle tied event times?
‘Ties’ (3)
• Two main approaches to ‘ties’
– Discrete models
  • Change the basic theory underlying the model
  • Assumes that event times are discrete points
  • Relates to logistic regression
  • Useful when event time can only occur at fixed points
    – graduation from high school
– Exact method
  • Often implemented using an approximation.
‘Ties’ (4)
• Exact method
– Suppose we have two events (s1 & s2) which occur at the same time due to imprecise measurement of the event time.
– IF we had been able to measure the event time with enough precision, we would know if s1 occurred first or second
  • Birth of twins
– We don’t know, so we assume that the two possibilities are equally likely.
‘Ties’ (5)
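• Suppose s1 occurred before s2.
– Likelihood contribution would be:
$\dfrac{HR_{s1}}{\sum_{k \in R} HR_k} \times \dfrac{HR_{s2}}{\sum_{k \in R} HR_k - HR_{s1}}$
• Suppose s2 occurred before s1.
– Likelihood contribution would be:
$\dfrac{HR_{s2}}{\sum_{k \in R} HR_k} \times \dfrac{HR_{s1}}{\sum_{k \in R} HR_k - HR_{s2}}$
where R is the risk set just before the tied event time (it includes both s1 and s2).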
‘Ties’ (6)
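• Don’t know order. Each is equally likely.
• Overall likelihood contribution is the sum of the contributions from the two possible orders:
$\dfrac{HR_{s1}}{S} \times \dfrac{HR_{s2}}{S - HR_{s1}} + \dfrac{HR_{s2}}{S} \times \dfrac{HR_{s1}}{S - HR_{s2}}$
where $S = \sum_{k \in R} HR_k$ is the sum of the HRs over the risk set R.
– Weighting each order by 1/2 only rescales the likelihood; it does not change the estimated Betas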
‘Ties’ (7)
• A bit messy but not too bad.
• However, consider the recidivism data.
– 5 arrests occurred in week 8
– We don’t know which order they occurred in
– 120 potential orders (= 5!)
– Each order contributes a likelihood product with 5 terms
– Need to add up 120 of these products to give ONE contribution.
• Can rapidly get even worse!
‘Ties’ (8)
• Computationally demanding
– Not that big a task for modern computers
• Two approximate methods have been developed
– Breslow
– Efron
• Both are ‘OK’ as long as number of ties is not too big
– Efron is better.
• With modern computers, using the exact approach is likely fine.
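To make the comparison concrete, here is a minimal sketch of the three calculations for a single risk set with tied events. The Breslow and Efron formulas are the standard textbook definitions, which the slides do not write out. The HR values are invented, and the exact version averages over the orderings (the sum over orderings divided by d!) so that it is on the same scale as the approximations.

```python
from itertools import permutations
from math import factorial, prod

def exact_contribution(hr, tied, at_risk):
    """Average, over every ordering of the tied events, of the sequential product."""
    total_hr = sum(hr[k] for k in at_risk)
    total = 0.0
    for order in permutations(tied):
        remaining = total_hr
        term = 1.0
        for subject in order:
            term *= hr[subject] / remaining
            remaining -= hr[subject]  # subject leaves the risk set after its event
        total += term
    return total / factorial(len(tied))

def breslow_contribution(hr, tied, at_risk):
    """Breslow: every tied event uses the full risk-set sum as its denominator."""
    total_hr = sum(hr[k] for k in at_risk)
    return prod(hr[s] for s in tied) / total_hr ** len(tied)

def efron_contribution(hr, tied, at_risk):
    """Efron: the j-th denominator removes a fraction j/d of the tied subjects' HRs."""
    total_hr = sum(hr[k] for k in at_risk)
    tied_hr = sum(hr[s] for s in tied)
    d = len(tied)
    denominator = prod(total_hr - (j / d) * tied_hr for j in range(d))
    return prod(hr[s] for s in tied) / denominator

# Hypothetical risk set: s1 and s2 are tied; a and b are also at risk
hr = {"s1": 2.0, "s2": 1.0, "a": 1.0, "b": 1.0}
tied = ["s1", "s2"]
at_risk = ["s1", "s2", "a", "b"]

exact = exact_contribution(hr, tied, at_risk)
breslow = breslow_contribution(hr, tied, at_risk)
efron = efron_contribution(hr, tied, at_risk)
print(f"exact = {exact:.4f}, Efron = {efron:.4f}, Breslow = {breslow:.4f}")
```

In this toy risk set the Efron value is closer to the exact one than the Breslow value is. The loop over `permutations` is also where the 5! = 120 orderings for five tied events would come from.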
‘Ties’ (9): Summary
Situation          | Comment
No ties            | All methods give the same results
A few ties (<2%)   | All methods give similar results
Many ties          | Approximations are all biased towards ‘0’. Prefer Efron to Breslow. Exact methods are best, but be careful about computational demands.

SAS default method is Breslow