Introduction to survival analysis
Per Kragh Andersen, Section of Biostatistics, University of Copenhagen
DSBS Course: Survival Analysis in Clinical Trials, January 2018
Outline: Definitions and examples · Non-parametric estimation · Parametric models · Non-parametric testing · Delayed entry · Miscellaneous
Obviously, one can also consider the mean life time

E(T) = µ = ∫_0^∞ S(t) dt.

This, however, depends critically on the right-hand tail of the distribution of T, which we typically do not see because of censoring. Sometimes one studies the restricted mean life time

E(T ∧ τ) = µ(τ) = ∫_0^τ S(t) dt.

However, the interpretation is less nice (average time lived before time τ) and its value (obviously!) depends on the choice of τ.
We are used to considering our data as a sample from some (target) population, and the parameters refer to this population.
That is no different in survival analysis; however, it is important to realize that the target population is a complete population, i.e., without censoring.
Our ambition in survival analysis is therefore to draw inference on parameters like the survival function S(t) or the hazard function λ(t) for a potentially completely observed population based on incomplete (censored) data.
This is quite ambitious and requires certain assumptions.
Requirement 2 is the assumption of independent censoring (by some denoted non-informative censoring).
This means that individuals censored at any given time t should not be a biased sample of those who are at risk at time t.
Stated in other words: the hazard function λ(t) gives the event rate at time t, i.e. the failure rate given that the subject is still alive (T > t).
Independent censoring then means that the extra information that the subject is not only alive, but also uncensored, at time t does not change the failure rate.
Typically, independent censoring cannot be tested from the available data; it is a matter of discussion.
Censoring caused by being alive at the end of study can usually safely be taken to be “independent”. However, one should be more suspicious of other kinds of loss to follow-up before end of study.
It is strongly advisable always to keep track of subjects who are lost to follow-up and to note the reasons for loss to follow-up (e.g., drop-out of follow-up schedule or emigration).
The above discussion of independent censoring should be thought of as ‘for given covariates’. This means that censoring may depend on covariates as long as these covariates are accounted for in the hazard model (e.g., using the Cox regression model).
Multi-centre randomized trial in patients with primary biliary cirrhosis.
Patients (n = 349) recruited 1 Jan, 1983 - 1 Jan, 1987 from six European hospitals and randomized to CyA (176) or placebo (173).
Followed until death or liver transplantation (no longer than 31 Dec, 1989); CyA: 30 died, 14 were transplanted; placebo: 31 died, 15 were transplanted; 4 patients were lost to follow-up before 1989.
Primary outcome variable: time to death, incompletely observed (right-censoring) due to: liver transplantation, loss to follow-up, alive 31 Dec, 1989.
In some analyses, the outcome is defined as “time to failure of medical treatment”, i.e. the composite end-point of either death or liver transplantation.
Notation:
Distinct failure or censoring times: 0 < t1 < t2 < ...
Number of failures observed at those times: d(t1), d(t2), ... (NB: these are typically 0 or 1.)
Number of subjects at risk at (i.e., just before) those times: Y(t1), Y(t2), ...
For t > s: to survive beyond t one must first survive beyond s! That is, S(t) = S(s) · P(T > t | T > s). Applying this over the successive failure times gives the Kaplan-Meier estimator:

S(t) = ∏_{tj ≤ t} (1 − d(tj)/Y(tj)).
The standard error (SD) of the Kaplan-Meier estimator may be estimated by Greenwood’s formula:

SD(S(t)) = S(t) √( Σ_{tj ≤ t} d(tj) / (Y(tj)(Y(tj) − d(tj))) ).
To get an approximate 95% confidence interval for S(t), one may use simple linear limits S(t) ± 1.96 · SD(S(t)).
However, to eliminate problems with range restrictions when S(t) is close to 0 or 1, transformations (i.e., using the delta method) may be used, e.g. the log(−log) transformation, which leads to the interval

(S(t))^a ≤ S(t) ≤ (S(t))^b,

where b = 1/a and a = exp(1.96 · SD(S(t))/(−log(S(t)))).
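The estimator, Greenwood’s formula, and the log(−log) interval above can be sketched in a few lines of code. This is a minimal illustration on a small made-up data set; the data and all names are purely illustrative, not from the CyA trial.

```python
# Minimal sketch of the Kaplan-Meier estimator with Greenwood's formula.
# Data are (time, event) pairs: event = 1 for failure, 0 for censoring.
import math

def kaplan_meier(data):
    """Return [(t, S(t), SD(S(t)))] at each distinct failure time."""
    data = sorted(data)
    n = len(data)
    s, gw = 1.0, 0.0          # survival estimate and Greenwood sum
    out = []
    at_risk = n
    i = 0
    while i < n:
        t = data[i][0]
        d = c = 0             # failures / censorings at time t
        while i < n and data[i][0] == t:
            d += data[i][1]
            c += 1 - data[i][1]
            i += 1
        if d > 0:
            s *= 1 - d / at_risk
            gw += d / (at_risk * (at_risk - d))
            out.append((t, s, s * math.sqrt(gw)))
        at_risk -= d + c
    return out

obs = [(2, 1), (3, 0), (4, 1), (5, 1), (7, 0), (8, 0)]  # illustrative
km = kaplan_meier(obs)

# log(-log)-transformed 95% interval at the last failure time,
# using the formula from the slide:
t, s, sd = km[-1]
a = math.exp(1.96 * sd / (-math.log(s)))
ci = (s ** a, s ** (1 / a))
```

Note how the censored observation at t = 3 causes no jump but shrinks the risk set for later failure times, exactly as described above.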
This estimator of the integrated hazard function Λ(t) builds on the same idea as the Kaplan-Meier estimator: estimate λ(t)dt ≈ P(T ≤ t + dt | T > t) by d(tj)/Y(tj) when t = tj. That is,

Λ(t) = Σ_{tj ≤ t} d(tj)/Y(tj).

Further,

SD(Λ(t)) = √( Σ_{tj ≤ t} d(tj)/(Y(tj))² ).
Note how censored observations are used for both Kaplan-Meier and Nelson-Aalen: a subject censored at tj gives rise to no jump in the estimator but contributes to the size, Y(t), of the risk set for t ≤ tj.
Why not estimate Λ(t) by −log(S(t)), or S(t) by exp(−Λ(t))? This is because the relation S(t) = exp(−Λ(t)) holds for absolutely continuous distributions, and our estimators are discrete distributions.
For discrete distributions, the relationship between the cumulative hazard (measure) and the survival function is given by the product-integral:

S(t) = ∏_{u ≤ t} (1 − dΛ(u)),

and S(t) is, indeed, the product-integral of Λ(t). Properties of the Kaplan-Meier estimator follow from those of Nelson-Aalen via this relationship (the product-integral is a continuous and differentiable mapping). In practice, it makes little difference whether one uses S(t) or exp(−Λ(t)).
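A minimal sketch of the Nelson-Aalen estimator on small made-up (time, event) data; the final line compares exp(−Λ(t)) with the Kaplan-Meier value, illustrating that the two are close but not identical for discrete data. All numbers are illustrative.

```python
# Nelson-Aalen estimator of the cumulative hazard, with its SD.
import math

def nelson_aalen(data):
    """Return [(t, Lambda(t), SD(Lambda(t)))] at each failure time."""
    data = sorted(data)
    n = len(data)
    at_risk = n
    cum = var = 0.0
    out = []
    i = 0
    while i < n:
        t = data[i][0]
        d = c = 0
        while i < n and data[i][0] == t:
            d += data[i][1]
            c += 1 - data[i][1]
            i += 1
        if d > 0:
            cum += d / at_risk          # d(tj)/Y(tj)
            var += d / at_risk ** 2     # d(tj)/Y(tj)^2
            out.append((t, cum, math.sqrt(var)))
        at_risk -= d + c
    return out

obs = [(2, 1), (3, 0), (4, 1), (5, 1), (7, 0), (8, 0)]
na = nelson_aalen(obs)
# exp(-Lambda) is close to, but not equal to, the Kaplan-Meier estimate:
# here exp(-3/4) ≈ 0.472 versus the Kaplan-Meier value 5/12 ≈ 0.417.
s_approx = math.exp(-na[-1][1])
```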
If the option NELSON (or AALEN) is not specified, then the ‘−log(KM)’ estimator is obtained.
The option METHOD=CH (or BRESLOW) is the default and, hence, not needed.
Non-parametric inference (including the Cox model - more later) has become the standard method in survival analysis.
However, useful parametric models do exist:
The exponential distribution has constant hazard: λ(t) = λ for all t. This is a restrictive assumption which is often not justified; however, this is the model underlying the calculation of simple ‘occurrence/exposure’ rates.
Piecewise exponential models have piecewise constant hazards: λ(t) = λj when s_{j−1} ≤ t < s_j for pre-specified intervals 0 = s_0 < s_1 < ... < s_J = ∞. This leads to interval-specific occurrence/exposure rates and provides the basis for Poisson regression models.
Another simple extension of the exponential model is the Weibull model with λ(t) = λαt^{α−1}. It is mathematically simple and rather flexible (allowing increasing, constant, and decreasing hazard functions), but rarely used in practice.
Log-normal models also exist (with no simple hazard function).
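For the exponential model, the occurrence/exposure rate mentioned above is simply the number of failures divided by the total time at risk; its standard error follows from the observed information. A minimal sketch on illustrative data:

```python
# Occurrence/exposure rate: the MLE of the constant hazard lambda in the
# exponential model. Data are purely illustrative.
import math

times  = [2.0, 3.0, 4.0, 5.0, 7.0, 8.0]   # follow-up times T_i
events = [1, 0, 1, 1, 0, 0]               # D_i: 1 = failure, 0 = censored

d = sum(events)                # observed failures
exposure = sum(times)          # total time at risk
lam_hat = d / exposure         # occurrence/exposure rate
se = lam_hat / math.sqrt(d)    # SE from the observed information
```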
Likelihood function when the hazard function is λ_θ(t):

L(θ) = ∏_{i=1}^n λ_θ(Ti)^{Di} exp( −∫_0^{Ti} λ_θ(t) dt ).
Standard inference via score function, observed information, etc. Martingale-based proof of ‘standard’ asymptotic properties: the score D log L(θ0) is a martingale at the true parameter value θ0.
When the full distribution of T is parametrically specified via θ, parameters like the mean and median are also functions of θ. However, since the right-hand tail of the distribution is not observed because of censoring, one is reluctant to quote the mean.
The hazard function is λ(t) = λj when s_{j−1} ≤ t < s_j for pre-specified intervals 0 = s_0 < s_1 < ... < s_J = ∞.
The maximum likelihood estimator is most easily expressed in counting process notation:

N(t) = Σ_i I(Ti ≤ t, Di = 1),  Y(t) = Σ_i I(Ti ≥ t).

Then

λj = (N(s_j) − N(s_{j−1})) / ∫_{s_{j−1}}^{s_j} Y(t) dt,

i.e., the number of failures in interval j divided by the total time at risk in interval j. Further, from the observed information, SD(λj) ≈ λj/√(N(s_j) − N(s_{j−1})).
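These interval-specific occurrence/exposure rates can be sketched directly from the counting-process formula: count the failures in each interval and divide by the time at risk spent in that interval. Cut points and data below are illustrative assumptions.

```python
# Interval-specific occurrence/exposure rates for the piecewise
# exponential model (minimal sketch).
import math

def piecewise_rates(times, events, cuts):
    """cuts = [s_1, ..., s_{J-1}]; s_0 = 0 and s_J = infinity are implied."""
    bounds = [0.0] + list(cuts) + [math.inf]
    rates = []
    for j in range(len(bounds) - 1):
        lo, hi = bounds[j], bounds[j + 1]
        # failures observed inside [lo, hi)
        d = sum(e for t, e in zip(times, events) if lo <= t < hi)
        # exposure: time each subject spends at risk inside [lo, hi)
        expo = sum(max(0.0, min(t, hi) - lo) for t in times)
        rates.append(d / expo if expo > 0 else float("nan"))
    return rates

rates = piecewise_rates([2, 3, 4, 5, 7, 8], [1, 0, 1, 1, 0, 0], cuts=[5])
```

With the single cut point s_1 = 5, this gives 2 failures over 24 time units at risk in [0, 5) and 1 failure over 5 time units at risk in [5, ∞).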
We want to compare the hazard functions λ1(t) and λ2(t) in two groups.
Counting process notation: in group j we have Nj(t) = number of observed events in [0, t] and Yj(t) = number at risk just before time t. The Nelson-Aalen estimators for Λj(t) = ∫_0^t λj(u) du are:

Λj(t) = ∫_0^t ( I(Yj(u) > 0) / Yj(u) ) dNj(u),  j = 1, 2.
Idea in the general test statistic: look at K-weighted differences between the increments of the Nelson-Aalen estimators,

U(t) = ∫_0^t K(u) ( dN1(u)/Y1(u) − dN2(u)/Y2(u) ).
The logrank test (as we shall see later) has optimality propertiesagainst proportional hazards alternatives:
λ2(t) = θλ1(t).
Using instead the weights K(t) = Y1(t)Y2(t), a test statistic is obtained where values of ‘observed − expected’ at earlier time points are given larger weight. This test statistic, in fact, when there are no censored observations, is the two-sample Wilcoxon (Mann-Whitney) test.
For either choice of K(·), the statistic (U(∞))², properly normalized, is referred to the χ²-distribution with 1 degree of freedom.
The logrank test has developed into the test of choice, and anypaper using a different test will be looked upon with suspicion.
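On made-up data, the logrank statistic can be sketched in its familiar observed-minus-expected form, which (for the logrank weight K(t) = Y1(t)Y2(t)/(Y1(t)+Y2(t))) coincides with the K-weighted statistic above; the data and the function name are illustrative.

```python
# Two-sample logrank test: observed minus expected events in group 1
# at each failure time, with hypergeometric variance terms.
def logrank_chi2(times1, events1, times2, events2):
    data = ([(t, e, 1) for t, e in zip(times1, events1)]
            + [(t, e, 2) for t, e in zip(times2, events2)])
    u = v = 0.0
    for t in sorted({s for s, e, g in data if e == 1}):
        y1 = sum(1 for s, e, g in data if s >= t and g == 1)
        y2 = sum(1 for s, e, g in data if s >= t and g == 2)
        y = y1 + y2
        d = sum(1 for s, e, g in data if s == t and e == 1)
        d1 = sum(1 for s, e, g in data if s == t and e == 1 and g == 1)
        u += d1 - d * y1 / y             # observed - expected, group 1
        if y > 1:                        # hypergeometric variance term
            v += d * (y1 / y) * (y2 / y) * (y - d) / (y - 1)
    return u * u / v                     # refer to chi-square with 1 df

# Extreme illustrative data: all group-1 failures precede group 2's.
stat = logrank_chi2([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```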
Doob-Meyer decomposition for each j = 1, 2 (Mj is a martingale):

Nj(t) = ∫_0^t Yj(u) λj(u) du + Mj(t).
U(t) = ∫_0^t K(u) ( dN1(u)/Y1(u) − dN2(u)/Y2(u) ).

Using the decomposition, under H0: λ1(t) = λ2(t) we see that

U(t) = ∫_0^t K(u) ( dM1(u)/Y1(u) − dM2(u)/Y2(u) )

is a martingale, i.e. E(U(t)) = 0, and the asymptotic distribution (a normal distribution), together with the normalizing variance, can be found by a martingale CLT.
Comparison of two groups (say, Z = 1 and Z = 0) after adjustment for a categorical variable (X) can be performed using the stratified logrank test.
Here, observed and expected numbers of failures (e.g., for Z = 1) are first computed within strata given by values of X and, subsequently, added across strata.
In SAS, X should be the STRATA variable and the stratified test statistic is obtained using a TEST Z; command.
Sometimes, subjects are not observed from time 0 but only from a later entry time, Vi; that is, subject i is only observed conditionally on having survived until Vi.
This is denoted delayed entry or left truncation and is often present if age is the primary time variable.
A change of time variable causing delayed entry changes how risk sets are composed - see the graph for the small data set.
So far, we have focused on models for the hazard function and estimation of hazard ratios (which then implied models for S(t) = 1 − F(t)). Other parameters may be targeted:

The risk difference at τ: F1(τ) − F2(τ).

The τ-restricted mean life time: E(T ∧ τ) = ∫_0^τ S(t) dt.

Log-linear models for T: accelerated failure time models.
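The τ-restricted mean can be computed by integrating the Kaplan-Meier step function up to τ. A minimal sketch on illustrative data; ties among failure times are assumed away for simplicity.

```python
# tau-restricted mean E(T ∧ tau) = ∫_0^tau S(t) dt, by integrating a
# Kaplan-Meier step function (no ties among failures assumed).
def rmst(data, tau):
    """data: (time, event) pairs, event = 1 failure / 0 censored."""
    s, area, last = 1.0, 0.0, 0.0
    at_risk = len(data)
    for t, e in sorted(data):
        if t >= tau:
            break
        if e == 1:
            area += s * (t - last)   # rectangle up to this failure time
            s *= 1 - 1 / at_risk
            last = t
        at_risk -= 1
    return area + s * (tau - last)   # final rectangle up to tau

mu = rmst([(1, 1), (2, 1), (3, 1), (4, 1)], tau=2.5)
```

As a sanity check: without censoring this equals the sample mean of min(Ti, τ), here (1 + 2 + 2.5 + 2.5)/4 = 2.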
Without covariates, this can be estimated from the Kaplan-Meier estimator: S2(τ) − S1(τ).
From a Cox model with treatment variable Z and other covariates X, the risk difference at τ between treatment groups could be estimated by direct adjustment/standardization:

(1/n) ( Σ_i S(τ | Z = 1, Xi) − Σ_i S(τ | Z = 0, Xi) ).

This is also known as the g-formula in (modern) causal inference.
What if we want direct covariate effects on F(τ) (instead of indirectly via the hazard function)?
Let S be the Kaplan-Meier estimator and S^(−i) the same estimator applied to the data set (of size n − 1) obtained by eliminating subject i. Then the pseudo-observation for the (possibly incompletely observed) survival indicator I(Ti > τ) is:

Si(τ) = n · S(τ) − (n − 1) · S^(−i)(τ).

This may be used as response variable for a generalized linear model

g(S(τ | X)) = α0 + α1X1 + ... + αpXp,

and parameters may be estimated by solving the generalized estimating equations (GEE, with working (co-)variance Vi).
Without censoring, the pseudo-observation is then simply Si(τ) = I(Ti > τ).
With censoring (NB: censoring should be independent of covariates),

E(Si(τ) | X) ≈ E(I(Ti > τ) | X),

that is, the pseudo-observation has approximately the correct conditional expectation given covariates, and the GEE are unbiased.
The estimating equations may be solved using standard software(e.g., SAS PROC GENMOD), and there is a SAS MACRO available forcomputing the pseudo-observations.
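The leave-one-out construction of the pseudo-observations can be sketched directly from the definition Si(τ) = n·S(τ) − (n−1)·S^(−i)(τ). A minimal illustration on made-up, uncensored data (ties among failures assumed away), where the pseudo-observations should reduce to the indicators I(Ti > τ):

```python
# Pseudo-observations for the survival indicator I(T_i > tau).
def km_at(data, tau):
    """Kaplan-Meier estimate of S(tau) from (time, event) pairs."""
    s = 1.0
    at_risk = len(data)
    for t, e in sorted(data):
        if t > tau:
            break
        if e == 1:
            s *= 1 - 1 / at_risk   # assumes no ties among failures
        at_risk -= 1
    return s

def pseudo_obs(data, tau):
    n = len(data)
    s_full = km_at(data, tau)
    return [n * s_full - (n - 1) * km_at(data[:i] + data[i + 1:], tau)
            for i in range(n)]

# Without censoring the pseudo-observations reduce to the indicators:
po = pseudo_obs([(1, 1), (2, 1), (3, 1), (4, 1)], tau=2.5)
# → approximately [0, 0, 1, 1]
```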
The restricted mean life time is enjoying a revival in survival analysis, e.g. Royston and Parmar (BMC Med. Res. Meth., 2013).
How to do regression? As for the risk difference, one may use plug-in based on a Cox model or (as promoted by R&P) based on some flexible parametric model.
An alternative is to use pseudo-observations. Compute:

µi(τ) = n · µ(τ) − (n − 1) · µ^(−i)(τ)

and use them as responses in GEE (working (co-)variance Vi):

U(α) = Σ_i (∂/∂α)(g^{−1}(αᵀXi)) Vi^{−1} (µi(τ) − g^{−1}(αᵀXi)) = 0

when fitting a generalized linear model: g(µ(τ | X)) = α0 + α1X1 + ... + αpXp.
Survival analysis deals with a quantitative non-negative response variable Ti and, therefore, a regression model of choice could be

log(Ti) = α0 + α1X1 + ... + αpXp + σεi.

Such models do, indeed, exist, and mainly parametric models are used where the error term εi is assumed to follow a specified distribution.
The parameters have nice interpretations as acceleration factors: for a binary X, ‘time moves exp(α) faster’ for X = 1 compared to X = 0.
The model may be fitted using SAS PROC LIFEREG, which offers choices of normal, logistic, and extreme value distributions. The latter corresponds to a Weibull distribution for T and is the only model that is both an AFT model and a proportional hazards model (a Cox model with Weibull baseline).
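The acceleration-factor interpretation can be made concrete with a two-line calculation: holding the error term fixed in log T = α0 + α1·X + σ·ε, switching X from 0 to 1 multiplies the survival time by exp(α1). All parameter values below are purely illustrative.

```python
# Acceleration-factor interpretation of the AFT model (minimal sketch).
import math

a0, a1, sigma = 1.0, 0.5, 0.3          # illustrative parameters

def survival_time(x, eps):
    """T = exp(a0 + a1*x + sigma*eps) for covariate x and error eps."""
    return math.exp(a0 + a1 * x + sigma * eps)

eps = 0.7                              # same error term for both subjects
ratio = survival_time(1, eps) / survival_time(0, eps)
# ratio equals exp(a1), the acceleration factor for X = 1 vs X = 0
```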