1
Introduction to medical survival analysis
John Pearson
Biostatistics consultant
University of Otago Canterbury
7 October 2008
2
Objectives
• Describe survival data
• Define survival analysis terms
• Compare survival of groups
• Describe study design
Acknowledgement: Thanks to Colm Fahy for providing the example data.
3
Omissions
• Not covered:
– most methodology issues
– mathematical justification
• See:
– Collett: Modelling Survival Data in Medical Research
– Hosmer & Lemeshow: Applied Survival Analysis
– Many other good texts.
4
Example: Metastatic Parotid SCC
• Disease risk factors:
– Age over 50 years
– Male
– Exposure to sun
– Caucasian ancestry
• 61 patients operated on since 1990
• Audit done 1/6/2008
• 14 patients died from SCCMP, 20 died from other causes, 1 couldn’t be found
5
Example: Patient data
OpDate      Died       Status  Preserved  RadioTx  ICOMP
7/05/2002              ALIVE   PARTIAL    YES      N
15/11/2007             ALIVE   NO         YES      N
12/10/2007  1/03/2008  DOC     YES        YES      N
17/04/1992  1/08/1993  DOD     YES        YES      Y
7/10/1996   1/04/1997  DOC     NO         YES      N
1/05/1991              LOST    YES        YES      N
12/03/2003  1/05/2005  DOC     YES        YES      Y
Only 7 patients shown. Dates have been confidentialized.
6
Example: Patient data
[Figure: Parotidectomy patient medical records. x-axis: calendar year, 1990 to audit (6/2008); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD, ? Lost to follow up.]
7
Example: Patient data
[Figure: Parotidectomy patient medical records. x-axis: calendar year, 1990 to audit (6/2008); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD, ? Lost to follow up; two "?" markers added.]
8
Example: Survival Data
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0 to 15); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD; one patient marked "?".]
9
Example: Survival Data
Date formats and manipulation can cause headaches. Check what happens when your software subtracts dates to get survival time.
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0 to 15); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD; one patient marked "?".]
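The date check suggested on this slide can be sketched in Python. This is only an illustration: the function names `parse_dmy` and `survival_months` are hypothetical, the dates are read as day/month/year, and a month is taken as 365.25 / 12 days, which is a convention rather than anything stated in the slides.

```python
from datetime import datetime

def parse_dmy(text):
    """Parse a day/month/year string such as '17/04/1992'."""
    return datetime.strptime(text, "%d/%m/%Y").date()

def survival_months(start, end):
    """Months between two dates, using an average month of 365.25 / 12 days."""
    return (end - start).days / (365.25 / 12)

# Patient from the example table: operated 17/04/1992, died 1/08/1993.
op_date = parse_dmy("17/04/1992")
died = parse_dmy("1/08/1993")
months = survival_months(op_date, died)
print((died - op_date).days, round(months, 2))

# The same text read as month/day/year gives a different date, which is
# the kind of silent error worth checking for.
as_dmy = datetime.strptime("7/05/2002", "%d/%m/%Y").date()  # 7 May 2002
as_mdy = datetime.strptime("7/05/2002", "%m/%d/%Y").date()  # 5 July 2002
print((as_mdy - as_dmy).days)
```

Comparing a few hand-calculated survival times against the software's output is a cheap way to catch a wrong date format before any analysis is run.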
10
Example: Survival Data
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0 to 15); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD; annotations: "censored", "Missing data".]
11
Example: Survival Data
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0 to 15); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD; annotations: "censored", "Missing data".]
12
Example: Survival Data
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0 to 15); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD; annotations: "censored", "Missing data".]
Survival analysis explicitly addresses censored data; simple linear regression is not recommended. Software options:
1. SPSS
2. SAS
3. R
4. Other software
13
Example: Survival Data
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0 to 15); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD; annotations: "censored", "Missing data".]
Missing data can have a large effect on results and requires careful management. Options:
1. Omit
2. Impute
3. Model
14
What is survival analysis
• Time to event data
– Continuous
– Right skewed, ≥0, not normal
– Censored
– Analyse risk (hazard function)
• Examples
– Time to death
– Time to onset/relapse of disease
– Length of stay in hospital
15
What is survival analysis
[Figure: Post operative survival. x-axis: Years (0 to 10); y-axis: Patients (0 to 15).]
16
Censoring
• Right censoring
• Left censoring
• Interval censoring
Censoring is also categorised by:
1. Fixed study length
2. Fixed number of events
3. Random entry to study
17
Censoring
• Right censoring
– Observed survival time is less than actual
– Study ends before event
• Left censoring
• Interval censoring
[Figure: Parotidectomy patient medical records. x-axis: calendar year, 1990 to audit (6/2008); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD, ? Lost to follow up; two "?" markers.]
18
Censoring
• Right censoring
• Left censoring
– Time to relapse
– Time to event is less than observed: t < 3
• Interval censoring
[Diagram: timeline in t from Surgery (0) to the 3 month exam, where Recurrence is found.]
19
Censoring
• Right censoring
• Left censoring
• Interval censoring
– Time to relapse
– 3 < t < 6
[Diagram: timeline in t from Surgery (0); Free of disease at the 3 month exam; Recurrence found at the 6 month exam.]
20
Censoring
Independent censoring
Survival time is independent of the censoring process. A censored patient is representative of those at risk at the censoring time. The methods described here assume independent censoring.
21
Censoring
Independent censoring
Survival time is independent of the censoring process.
Informative censoring
Patients are removed from the study if their condition deteriorates.
22
Censoring example
How are the SCCMP patients censored?
23
Censoring example
How are the SCCMP patients censored?
• Enter study on surgery date
• Last known status is at audit
Random right censoring.
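The censoring rule described on this slide can be sketched as a small function that turns a patient record into a survival time and an event indicator. The function name `survival_record` is hypothetical. The sketch assumes the audit date of 1/6/2008, reads DOD as death from the disease and DOC as death from other causes (matching the "Dead OD" and "Dead OC" chart legend), and treats a DOC death as censored for disease-specific survival, as the `Status == "DOD"` coding later in the slides suggests.

```python
from datetime import date

AUDIT_DATE = date(2008, 6, 1)  # audit done 1/6/2008 (assumed day/month/year)

def survival_record(op_date, died, status, event_status="DOD"):
    """Return (years, event) for one patient, or None if lost to follow up."""
    if status == "LOST":
        return None  # no last known date: handled as missing data
    # Patients without a death date are right censored at the audit.
    end = died if died is not None else AUDIT_DATE
    years = (end - op_date).days / 365.25
    return years, status == event_status

# Four of the seven example patients.
dod = survival_record(date(1992, 4, 17), date(1993, 8, 1), "DOD")
alive = survival_record(date(2002, 5, 7), None, "ALIVE")
doc = survival_record(date(2007, 10, 12), date(2008, 3, 1), "DOC")
lost = survival_record(date(1991, 5, 1), None, "LOST")
for record in (dod, alive, doc, lost):
    print(record)
```

Because patients enter on their own surgery date and follow up stops at a common audit date, each patient has a different potential follow up time, which is what makes this random right censoring.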
24
Survival function The survival function S(t) is the probability of surviving longer than time t.
S(t) = P(T>t)
Where T is the survival time.
$S(t) = \frac{\text{Number of patients surviving longer than } t}{\text{total number of patients}}$
25
Hazard function The hazard function λ(t) is the probability of dying “at” time t.
Also called the instantaneous failure rate and force of mortality.
$\lambda(t) = \frac{f(t)}{S(t)}$
Usually plotted is the cumulative hazard function $\Lambda(t)$, that is, the accumulated hazard until time t:
$\Lambda(t) = -\log S(t)$
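The link between the hazard and the cumulative hazard can be written out in a few lines. This is a standard derivation rather than something shown on the slide; it assumes $f(t)$ is the probability density of the survival time $T$ and that $S(0) = 1$.

```latex
\begin{align*}
f(t) &= -\frac{d}{dt} S(t)
  && \text{since } S(t) = P(T > t) \\
\lambda(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t) \\
\Lambda(t) &= \int_0^t \lambda(u)\, du = -\log S(t)
  && \text{using } S(0) = 1 \\
S(t) &= e^{-\Lambda(t)}
\end{align*}
```

The last line shows why an estimate of the survival function and an estimate of the cumulative hazard carry the same information.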
26
Survival function For censored data the survival function can only be estimated.
[Figure: Parotidectomy patient survival data. x-axis: Years post operation (0.0 to 3.0); y-axis: Patient (1 to 7); legend: Alive, Dead OC, Dead OD.]
27
Survival function
Life table estimates (WHO, StatsNZ)
[Figure: All causes mortality. x-axis: Age (0 to 100); y-axis: Percent surviving (0 to 100); series: NZ, Australia, Chad.]
28
Survival function Kaplan Meier estimates
     Months   n   d   (n-d)/n   S(t)
 1    2.2    57   1   0.982     0.982
 2    6.12   51   1   0.980     0.963
 3   10.32   46   1   0.978     0.942
 4   10.78   45   1   0.978     0.921
 5   10.88   44   1   0.977     0.9
 6   13.08   41   1   0.976     0.878
 7   13.35   39   1   0.974     0.856
 8   16.11   37   1   0.973     0.833
 9   26.2    34   1   0.971     0.808
10   29.42   31   1   0.968     0.782
11   37.48   26   1   0.962     0.752
12   45.86   23   1   0.957     0.719
13   59.08   19   1   0.947     0.682
14   65.33   14   1   0.929     0.633
29
Survival function Kaplan Meier estimates
1. Order data by time to event (death).
2. Number at risk of event (n) is number surviving less number censored.
3. Estimate the probability of surviving to the next event: (n-d)/n.
4. Multiply the probabilities to estimate survival S(t).
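The four steps above can be sketched in Python. This is a minimal illustration, not the routine any particular package uses: the function name `kaplan_meier` and the five-patient data set are hypothetical, and patients censored at an event time are counted as still at risk at that time, which is the usual convention. The second half multiplies the (n-d)/n factors using the numbers at risk from the SCCMP table.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  -- follow up time for each patient
    events -- True if the event was observed, False if censored
    Returns a list of (time, n_at_risk, deaths, S(t)), one row per event time.
    """
    data = sorted(zip(times, events))           # step 1: order by time
    n_at_risk = len(data)
    survival = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= (n_at_risk - deaths) / n_at_risk   # steps 3 and 4
            rows.append((t, n_at_risk, deaths, survival))
        n_at_risk -= leaving                    # step 2: drop deaths and censored
    return rows

# Hypothetical data: five patients, two of them censored.
rows = kaplan_meier([2, 3, 3, 5, 8], [True, False, True, True, False])
for row in rows:
    print(row)

# Numbers at risk from the SCCMP table, one death at each event time.
AT_RISK = [57, 51, 46, 45, 44, 41, 39, 37, 34, 31, 26, 23, 19, 14]
s = 1.0
for n in AT_RISK:
    s *= (n - 1) / n
print(round(s, 3))  # 0.633, the last S(t) in the table
```

Censored patients never reduce the estimate directly; they only shrink the number at risk for later event times.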
30
Kaplan Meier plot
[Figure: Kaplan Meier estimate. x-axis: Months (0 to 120); y-axis: Estimated survivor function (0.0 to 1.0).]
31
Kaplan Meier plot SCCMP
[Figure: Kaplan Meier estimate. x-axis: 0 to 120; y-axis: Estimated survivor function (0.0 to 1.0).]
Standard errors and 95% CIs are calculated by most software (SPSS, R, SAS).
The formula is usually Greenwood’s or Tsiatis’, depending on the software.
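As an illustration of what the software is doing, Greenwood's formula estimates the variance of the Kaplan Meier estimate as S(t)^2 times the sum of d/(n(n-d)) over the event times up to t. The sketch below applies it to the numbers at risk from the SCCMP table. The function name `greenwood_se` is hypothetical, and the plain S(t) ± 1.96 SE interval is an assumption: packages often build the interval on a transformed scale, so their limits may differ.

```python
import math

def greenwood_se(at_risk, deaths):
    """Return a list of (S(t), standard error), one pair per event time.

    Assumes deaths < at_risk at every event time, otherwise the
    Greenwood term d / (n * (n - d)) is undefined.
    """
    survival = 1.0
    running_sum = 0.0
    out = []
    for n, d in zip(at_risk, deaths):
        survival *= (n - d) / n
        running_sum += d / (n * (n - d))
        out.append((survival, survival * math.sqrt(running_sum)))
    return out

AT_RISK = [57, 51, 46, 45, 44, 41, 39, 37, 34, 31, 26, 23, 19, 14]
DEATHS = [1] * len(AT_RISK)

results = greenwood_se(AT_RISK, DEATHS)
s_last, se_last = results[-1]
lower = max(0.0, s_last - 1.96 * se_last)
upper = min(1.0, s_last + 1.96 * se_last)
print(round(s_last, 3), round(se_last, 3), round(lower, 3), round(upper, 3))
```

At the first event time the formula reduces to the familiar binomial standard error, which is a quick sanity check on any implementation.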
32
Cumulative Hazard SCCMP
[Figure: Cumulative Hazard Function. x-axis: Months (0 to 120); y-axis: Cumulative hazard (0.0 to 0.4).]
33
Summary statistics
1. Median survival: time when S(t) = 0.5
• Must have enough data
2. Mean survival: area under the survival curve
3. 5 year survival: survival rate at 5 years
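The three summaries can be read off the Kaplan Meier step function, as in this sketch. The function names are hypothetical, the median is taken as the first event time at which S(t) falls to 0.5 or below, and the mean is restricted to a chosen upper limit because the area under the curve is otherwise undefined when the last observation is censored. The SCCMP steps are the (Months, S(t)) pairs from the Kaplan Meier table.

```python
def median_survival(steps):
    """First event time with S(t) <= 0.5, or None if S(t) never gets there."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

def restricted_mean(steps, t_max):
    """Area under the step function S(t) from 0 to t_max."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in steps:
        if t >= t_max:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (t_max - prev_t)

def survival_at(steps, t):
    """S(t): the value of the last step at or before time t."""
    s_at_t = 1.0
    for time, s in steps:
        if time > t:
            break
        s_at_t = s
    return s_at_t

# Hypothetical small example: (time, S(t)) after each event.
example = [(2, 0.8), (3, 0.6), (5, 0.3)]
print(median_survival(example), restricted_mean(example, 8))

# (Months, S(t)) pairs from the SCCMP Kaplan Meier table.
SCCMP = [(2.2, 0.982), (6.12, 0.963), (10.32, 0.942), (10.78, 0.921),
         (10.88, 0.9), (13.08, 0.878), (13.35, 0.856), (16.11, 0.833),
         (26.2, 0.808), (29.42, 0.782), (37.48, 0.752), (45.86, 0.719),
         (59.08, 0.682), (65.33, 0.633)]
print(median_survival(SCCMP))   # None: S(t) never reaches 0.5
print(survival_at(SCCMP, 60))   # 5 year (60 month) survival
```

The SCCMP curve never drops to 0.5, so the median cannot be estimated from these data, which is the "must have enough data" caveat in practice.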
34
Kaplan Meier estimate
KM and life tables are non-parametric methods: no assumptions are made about the distribution of the survival times.
Parametric methods assume a distribution, typically exponential or Weibull. They are more powerful but can be sensitive to getting the distribution right.
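For reference, the two distributions named above can be written in terms of their hazard and survival functions. This uses one common parametrisation, with rate $\lambda > 0$ and shape $\gamma > 0$; other texts and software scale the parameters differently, so treat the symbols as assumptions.

```latex
\begin{align*}
\text{Exponential:}\quad & \lambda(t) = \lambda,
  & S(t) &= e^{-\lambda t} \\
\text{Weibull:}\quad & \lambda(t) = \lambda \gamma t^{\gamma - 1},
  & S(t) &= e^{-\lambda t^{\gamma}}
\end{align*}
```

The exponential model has a constant hazard and is the special case $\gamma = 1$ of the Weibull; $\gamma > 1$ gives a hazard that rises with time and $\gamma < 1$ one that falls.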
35
Disease specific survival
[Figure: SCCMP survival. x-axis: Months (0 to 120); y-axis: Estimated survivor function (0.0 to 1.0); series: Disease specific, All causes.]
36
Comparing 2 groups
Log rank test
• Computed in SPSS, SAS, R
• Most popular
– Bland & Altman, BMJ 2004;328:1073 (1 May)
• Limitations
– No estimate of the size of the difference
– Unlikely to detect a difference when the difference in risk between groups is not consistent over time
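To show what the test computes, here is a minimal two-group log rank sketch. At each event time it compares the deaths observed in group A with the number expected if both groups shared the same risk, then refers (O - E)^2 / V to a chi-square distribution with 1 degree of freedom. The function name `log_rank` and the six-patient data set are hypothetical, and the sketch assumes at least one event time at which both groups still have patients at risk (otherwise the variance is zero).

```python
import math

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log rank test. Returns (chi_square, p_value) on 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance; zero when only one patient is left
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df, via the normal distribution.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data: group A dies early, group B late, no censoring.
chi_square, p_value = log_rank([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
print(round(chi_square, 3), round(p_value, 3))
```

The result is a test statistic and a p value only, which is the first limitation listed above: it says nothing about how large the difference between the groups is.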
37
Immuno compromised
[Figure: SCCMP survival: Immuno Compromised. x-axis: Months (0 to 140); y-axis: Estimated survivor function (0.0 to 1.0); series: No, Yes.]
38
Immuno compromised
[Figure: SCCMP survival: Immuno Compromised. x-axis: Months (0 to 140); y-axis: Estimated survivor function (0.0 to 1.0); series: No, Yes.]
Case Processing Summary
ICOMP     Total N   N of Events   Censored N   Censored Percent
N         53        9             44           83.0%
Y         7         5             2            28.6%
Overall   60        14            46           76.7%
39
Immuno compromised
[Figure: SCCMP survival: Immuno Compromised. x-axis: Months (0 to 140); y-axis: Estimated survivor function (0.0 to 1.0); series: No, Yes.]
Means and Medians for Survival Time
          Mean(a)                 Median
ICOMP     Estimate   Std. Error   Estimate   Std. Error
N         101.048    7.616        .          .
Y         22.978     7.653        16.110     3.293
Overall   91.761     7.842        .          .
a. Estimation is limited to the largest survival time if it is censored.
40
Immuno compromised
[Figure: SCCMP survival: Immuno Compromised. x-axis: Months (0 to 140); y-axis: Estimated survivor function (0.0 to 1.0); series: No, Yes.]
Overall Comparisons
                        Chi-Square   df   Sig.
Log Rank (Mantel-Cox)   19.579       1    .000
Test of equality of survival distributions for the different levels of ICOMP.
41
Age group
[Figure: SCCMP survival: Age group. x-axis: Months (0 to 140); y-axis: Estimated survivor function (0.0 to 1.0); series: 75+, <75.]
Call:
survdiff(formula = Surv(mths, Status == "DOD") ~ Age75)

            N Observed Expected (O-E)^2/E (O-E)^2/V
Age75=<75  24        7     5.63     0.332     0.557
Age75=75+  36        7     8.37     0.224     0.557

 Chisq= 0.6  on 1 degrees of freedom, p= 0.455
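The columns of this output can be checked by hand. The sketch below recomputes (O-E)^2/E from the printed observed and expected counts and the p value from the printed chi-square of 0.557 on 1 degree of freedom. Because the printed expected counts are rounded, the first ratio comes out at about 0.333 rather than the 0.332 shown; the variable names are hypothetical.

```python
import math

# Observed and expected deaths as printed in the output above.
observed = {"<75": 7, "75+": 7}
expected = {"<75": 5.63, "75+": 8.37}

contributions = {
    group: (observed[group] - expected[group]) ** 2 / expected[group]
    for group in observed
}
for group, value in contributions.items():
    print(group, round(value, 3))

# p value for the reported test statistic, chi-square with 1 df.
chi_square = 0.557
p_value = math.erfc(math.sqrt(chi_square / 2))
print(round(p_value, 3))
```

Observed and expected deaths both sum to 14, so a surplus of deaths in one age group is always mirrored by a deficit in the other.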
42
Facial Nerve
[Figure: SCCMP survival: Facial Nerve Preserved. x-axis: Months (0 to 140); y-axis: Estimated survivor function (0.0 to 1.0); series: NO, PARTIAL, YES.]
Log rank p value: 0.09
43
Multiple independent variables
Cox proportional hazards model
• Most common model
• Linear model for the log of the hazard ratio
• Baseline hazard unspecified
$\frac{h_1(t)}{h_0(t)} = e^{B_1 Z_1 + B_2 Z_2}$
44
SCCMP example
CPH model: Survival ~ Preserved + Age + ICOMP
Preserved and ICOMP are categorical; Age is continuous.
Plot survival for patients with each of yes/no/partial nerve preservation, adjusted for age and immunocompromised status.
45
SCCMP example - SPSS
Analyze > Survival > Cox Regression

COXREG Months
  /STATUS=Status('DEAD')
  /PATTERN BY Preserved
  /CONTRAST (Preserved)=Indicator
  /CONTRAST (ICOMP)=Indicator(1)
  /METHOD=ENTER Preserved Age ICOMP
  /PLOT SURVIVAL
  /SAVE=PRESID XBETA
  /PRINT=CI(95) CORR SUMMARY BASELINE
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
46
SCCMP example - SPSS
Patients whose facial nerve was not preserved have 12.6 times the hazard of patients whose nerve was preserved (95% CI 2.3 to 69.6). Preserving the facial nerve significantly reduces patients’ risk (p = 0.004, CPH model).
Variables in the Equation
                                                       95.0% CI for Exp(B)
             B       SE      Wald     df   Sig.   Exp(B)   Lower    Upper
Preserved                    8.493    2    .014
  No         2.535   .871    8.470    1    .004   12.617   2.288    69.564
  Partial    2.091   1.110   3.549    1    .060   8.093    .919     71.279
ICOMP        3.588   .918    15.274   1    .000   36.166   5.981    218.676
Age          -.011   .028    .149     1    .700   .989     .936     1.046
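The Exp(B), confidence interval and Wald columns all follow from B and its standard error, as this sketch shows for the "No" row of the table. The function name `hazard_ratio` is hypothetical and the 1.96 multiplier assumes a normal approximation for the 95% interval; small differences from the printed values come from B and SE being rounded to three decimals.

```python
import math

def hazard_ratio(b, se, z=1.96):
    """Return (Exp(B), lower, upper, Wald) for one coefficient."""
    hr = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    wald = (b / se) ** 2
    return hr, lower, upper, wald

# "No" row: facial nerve not preserved, relative to preserved.
hr, lower, upper, wald = hazard_ratio(2.535, 0.871)
print(round(hr, 2), round(lower, 2), round(upper, 2), round(wald, 2))
```

The interval is symmetric on the log scale, which is why it looks so lopsided around 12.6 once exponentiated.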
47
SCCMP CPH model
[Figure: SCCMP survival: Facial nerve preserved. x-axis: Months (0 to 70); y-axis: Estimated survivor function (0.0 to 1.0); series: NO, PARTIAL, YES.]
Adjusted for age and immuno compromised patients
48
Next Steps:
• Check proportional hazards assumption
– Residual plots for groups
• Time dependent covariates
• More complex models
• Power calculations (not done here)
49
Summary
• Survival analysis accounts for censoring in time to event data
• Log rank test: difference in survival between 2 groups
• Cox proportional hazards model
• More complex/powerful models available
• SPSS, R, SAS, Stata