Interval Censored Survival Data
As mentioned in the Introduction lecture, we might:
Observe (U, V ) where T ∈ (U, V )
Eg.1: Time to onset of dementia;
Eg.2: Time to undetectable viral load in AIDS studies;
Both are based on measurements or assessments taken at clinic visits, while the event happens in between two consecutive visits.
As before, we assume that U and V are independent of
T (conditional on covariates, if any), i.e. non-informative
censoring.
Note that when V = ∞, interval censoring becomes right
censoring where C = U .
R packages: interval, ICsurv.
Naive approaches
The naive approaches typically convert interval censored data
to right censored:
for Ui < Vi < ∞, take one of the following as Ti:
1. Midpoint: Ti = (Ui + Vi)/2;
2. a random draw from (Ui, Vi), e.g. uniform;
3. Interpolation: if the event is defined by some continuous score A crossing a threshold c, and the scores are taken at Ui and Vi, then linear interpolation is sometimes used between Ai(Ui) < c (say) and Ai(Vi) > c.
It is not hard to show that these naive approaches are not valid, but nonetheless they are sometimes used in practice.
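As a concrete illustration, the midpoint and uniform-draw conversions can be sketched as below (a minimal sketch; the data arrays and the `to_right_censored` helper are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interval-censored data: Ti is only known to lie in (Ui, Vi);
# Vi = inf means the observation is right-censored at Ui.
U = np.array([1.0, 2.0, 0.5, 3.0])
V = np.array([2.0, 4.0, 1.5, np.inf])

def to_right_censored(U, V, method="midpoint", rng=None):
    """Naively convert interval-censored (U, V) to right-censored (T, delta)."""
    finite = np.isfinite(V)
    T = U.copy()                   # right-censored subjects keep T = U
    delta = finite.astype(int)     # event indicator: 1 if the interval is finite
    if method == "midpoint":
        T[finite] = (U[finite] + V[finite]) / 2
    elif method == "uniform":      # random draw from Uniform(Ui, Vi)
        T[finite] = rng.uniform(U[finite], V[finite])
    return T, delta

T_mid, d = to_right_censored(U, V, "midpoint")
T_unif, _ = to_right_censored(U, V, "uniform", rng=rng)
```

The resulting (T, delta) pairs can then be fed to any right-censored method such as KM, which is exactly why the approach is tempting despite not being valid.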
Multiple imputation
The naive approach is a special case of single imputation.
Censored data is a type of missing data. For missing data
in general there is the multiple imputation (MI) approach.
Specifically, let M be a pre-specified integer, eg. M = 10.
1. Start with an initial estimate S^(0)(t), using e.g. the midpoint imputation from the naive approach;
2. At step l+1, impute those Ti ∈ (Ui, Vi) that are interval censored, according to the estimated distribution S^(l)(t), conditional on Ti ∈ (Ui, Vi) [how]; create M such imputed data sets.
How: notice that S^(l)(t) is likely a discrete distribution with point masses at the (imputed) ‘observed’ event times.
3. For each of the above M imputed data sets, obtain an estimated Sm(t) for m = 1, ..., M, using methods for right-censored data (e.g. KM). Combine these estimates to get the updated

   S^(l+1)(t) = (1/M) ∑_{m=1}^M Sm(t).
4. Repeat steps 2-3 until convergence.
5. At convergence, i.e. the last step, obtain the estimated variance of θ = S(t):

   (1/M) ∑_{m=1}^M Var(θm) + (1 + 1/M) ∑_{m=1}^M (θm − θ̄)² / (M − 1),

where Var(θm) is obtained using right-censored data methods for each imputed data set, e.g. Greenwood’s formula for KM, and the second part above is the sample variance of the imputed estimates θm’s.
The second part above accounts for the variation among
imputations, i.e. the uncertainty associated with missing
data.
Standard MI consists of steps 2, 3 and 5, without the iterations to convergence.
[See plots; Sun (2006), “The Statistical Analysis of Interval-censored Failure Time Data”, p.40.]
MI has also been proposed for the Cox regression model with
interval censored data (Pan, 2000), and is one of the better
approaches to use in practice.
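One iteration of steps 1-3 can be sketched as follows. This is a minimal sketch with made-up data and a bare-bones KM; the helper names (`km_mass`, `impute`, `surv_on_grid`) are hypothetical, and the conditional draw in step 2 uses the point masses of the current discrete estimate restricted to each interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical interval-censored data (V = inf -> right-censored at U).
U = np.array([0.5, 1.0, 1.0, 2.0, 2.5, 3.0])
V = np.array([1.5, 2.0, 3.0, 4.0, np.inf, np.inf])
finite = np.isfinite(V)
delta = finite.astype(int)
M = 10  # number of imputed data sets per iteration

def km_mass(T, d):
    """Kaplan-Meier point masses: (event times, probability mass at each)."""
    ts = np.unique(T[d == 1])
    S_prev, masses = 1.0, []
    for t in ts:
        at_risk = np.sum(T >= t)
        deaths = np.sum((T == t) & (d == 1))
        S_new = S_prev * (1 - deaths / at_risk)
        masses.append(S_prev - S_new)
        S_prev = S_new
    return ts, np.array(masses)

def impute(U, V, ts, p, rng):
    """Step 2: draw each interval-censored Ti from S^(l) restricted to (Ui, Vi)."""
    T = U.copy()
    for i in np.where(finite)[0]:
        inside = (ts > U[i]) & (ts < V[i])
        if inside.any():
            w = p[inside] / p[inside].sum()   # renormalize: conditional on (Ui, Vi)
            T[i] = rng.choice(ts[inside], p=w)
        else:
            T[i] = (U[i] + V[i]) / 2          # no mass in the interval: midpoint
    return T

def surv_on_grid(ts, p, grid):
    return 1.0 - np.array([p[ts <= g].sum() for g in grid])

# Step 1: initial estimate from midpoint imputation
T0 = np.where(finite, (U + V) / 2, U)
ts, p = km_mass(T0, delta)

# Steps 2-3 (one iteration): M imputed data sets, average the M KM curves
grid = np.linspace(0, 5, 51)
S_avg = np.zeros_like(grid)
for _ in range(M):
    Tm = impute(U, V, ts, p, rng)
    tm, pm = km_mass(Tm, delta)
    S_avg += surv_on_grid(tm, pm, grid) / M
```

In the full algorithm, S_avg becomes S^(l+1) and the loop is repeated until the curve stops changing.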
Likelihood for interval censored data
For i = 1, ..., n, the likelihood contribution from subject i is

   P(Ui < Ti < Vi) = Fi(Vi) − Fi(Ui).

So the likelihood for i.i.d. data is

   L = ∏_{i=1}^n {Fi(Vi) − Fi(Ui)}.

For the one-sample problem, all Fi = F.
• If we consider the nonparametric MLE, it is clear that the MLE must assign positive probability mass to each censored interval (Ui, Vi), or else the likelihood will be zero.
• However, since these intervals in general might overlap
each other, how exactly to assign the probability masses
is more complex.
• In the very special case where these intervals are mutually exclusive, the KM estimator using the naive approach (say midpoint) is actually valid for assigning the probability masses.
[See plots.]
• The NPMLE can be computed using EM (Turnbull, 1976), though the asymptotic theory is not fully developed.
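The self-consistency (EM) iteration can be sketched in a few lines. This is a minimal sketch with made-up intervals; for simplicity the candidate support points here are midpoints between consecutive interval endpoints, a crude stand-in for Turnbull's innermost intervals:

```python
import numpy as np

# Hypothetical interval-censored one-sample data, open intervals (U, V).
U = np.array([0.0, 1.0, 2.0, 1.5])
V = np.array([2.0, 3.0, 4.0, 2.5])

# Candidate support points covering the intervals.
s = np.unique(np.concatenate([U, V]))
s = (s[:-1] + s[1:]) / 2                                   # midpoints of gaps
A = (s[None, :] > U[:, None]) & (s[None, :] < V[:, None])  # indicator: s_k in (Ui, Vi)

p = np.full(s.size, 1.0 / s.size)   # initial masses
for _ in range(500):                # self-consistency / EM iterations
    num = A * p                     # E-step: subject i's mass spread over its interval
    mu = num / num.sum(axis=1, keepdims=True)
    p_new = mu.mean(axis=0)         # M-step: average expected masses over subjects
    if np.max(np.abs(p_new - p)) < 1e-10:
        p = p_new
        break
    p = p_new
```

At convergence, p assigns positive mass only where intervals overlap demands it, and every subject's interval receives positive probability, so the likelihood is nonzero.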
Parametric models
It is clear that, with the likelihood above, parametric models can be fitted, with standard inference procedures like Fisher information etc.
Sun’s (2006) book points out that parametric approaches are attractive when the censoring intervals are very wide, or sample sizes are small, as the model supplies the much needed information.
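For example, an exponential model F(t) = 1 − exp(−rate·t) can be fitted by maximizing ∏{F(Vi) − F(Ui)} directly. A minimal sketch with made-up data (the log-likelihood here is convex in the rate, so a bounded 1-d optimizer suffices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical interval-censored data; V = inf -> right-censored at U.
U = np.array([0.5, 1.0, 2.0, 0.2, 3.0])
V = np.array([1.5, 3.0, 4.0, 1.0, np.inf])

def neg_loglik(rate):
    # contribution F(Vi) - F(Ui) = exp(-rate*Ui) - exp(-rate*Vi); exp(-inf) = 0
    SU = np.exp(-rate * U)
    SV = np.where(np.isfinite(V), np.exp(-rate * V), 0.0)
    return -np.sum(np.log(SU - SV))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method="bounded")
rate_hat = res.x
```

Standard errors would follow from the observed Fisher information (the second derivative of neg_loglik at rate_hat), as for any parametric MLE.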
Cox model
Under the Cox model, the likelihood is

   L = ∏_{i=1}^n {Si(Ui) − Si(Vi)}
     = ∏_{i=1}^n {S0(Ui)^exp(β′Zi) − S0(Vi)^exp(β′Zi)}.

The log-likelihood is then

   l(β, S0) = ∑_{i=1}^n log{S0(Ui)^exp(β′Zi) − S0(Vi)^exp(β′Zi)}.
• Note that this is a full likelihood, and there is no easy way to ‘profile’ out the baseline S0;
• Optimization has to be carried out over both β and S0,
after S0 is discretized to the mass points at the observed
Ui and Vi’s;
• The number of parameters for optimization then grows with the sample size, which can cause numerical problems.
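The objective being optimized can be written down directly. A minimal sketch with made-up data, where S0 is parameterized through hypothetical hazard increments `lam` at the observed endpoints, so S0(t) = exp(−∑ of increments up to t):

```python
import numpy as np

# Hypothetical data: covariate Z, intervals (U, V); V = inf -> right-censored.
Z = np.array([0.0, 1.0, 0.5])
U = np.array([1.0, 0.5, 2.0])
V = np.array([2.0, 1.5, np.inf])

# Discretize S0 to the observed finite endpoints, as in the bullet above.
mass_times = np.unique(np.concatenate([U, V[np.isfinite(V)]]))

def loglik(beta, lam):
    """l(beta, S0) = sum_i log{ S0(Ui)^exp(beta*Zi) - S0(Vi)^exp(beta*Zi) },
    with S0(t) = exp(-sum of increments lam_k at mass_times <= t)."""
    def S0(t):
        if not np.isfinite(t):
            return 0.0                       # S0(inf) = 0
        return np.exp(-lam[mass_times <= t].sum())
    total = 0.0
    for u, v, z in zip(U, V, Z):
        e = np.exp(beta * z)
        total += np.log(S0(u) ** e - S0(v) ** e)
    return total

val = loglik(beta=0.3, lam=np.full(mass_times.size, 0.2))
```

In a real fit, one would maximize over beta and all entries of lam jointly (with lam ≥ 0), which is exactly where the parameter count grows with n.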
Rank based approaches were proposed under the Cox model,
to mimic the partial likelihood.
• Just like the nonparametric MLE approach for the one-sample problem, when the censoring intervals do not overlap each other, we have the ranks of the Ti even if we don’t observe their exact values, and the partial likelihood can be written down;
• But when the intervals do overlap, one has to consider
‘admissible rankings’ (Goggins et al., 1998);
• An EM approach was proposed, where Markov chain Monte Carlo (MCMC) needs to be used;
• This approach does not give an estimate of S0, so one cannot predict survival afterwards.
Smoothing
Over time a more favored approach seems to be smoothing, sometimes also referred to as local likelihood, the sieve method, etc.
This is more like parametric models, but allows the number of parameters to grow with the sample size.
An example is to use B-spline basis functions of order l: {Bj(t)}_{j=1}^{pn}, with knot sequence {ξj}_{j=1}^{pn+l}.
• The log-likelihood with I-spline basis functions is

   ln(β, Λ0n) = ∑_{i=1}^n log( exp[−e^{β′Zi} {∑_{j=1}^{pn} ηj Ij(ui)}] − exp[−e^{β′Zi} {∑_{j=1}^{pn} ηj Ij(vi)}] ),

where Λ0n(t) = ∑_{j=1}^{pn} ηj Ij(t); since the Ij are nondecreasing, the constraints ηj ≥ 0 ensure that Λ0n is a valid (nondecreasing) cumulative hazard.
• Since the constraints above are linear inequalities, the maximization can be efficiently implemented by the generalized gradient projection algorithm (Jamshidian, 2004; Wu and Zhang, 2012).
• An advantage of this approach is that one obtains estimates of both β and λ0, the latter being a smooth curve.
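To see why nonnegative coefficients on a nondecreasing basis give a valid cumulative hazard, here is a minimal sketch: integrating each B-spline basis function yields a nondecreasing (I-spline-like) basis, and any nonnegative combination of those is nondecreasing. The knot sequence and the coefficients `eta` below are made up for illustration:

```python
import numpy as np
from scipy.interpolate import BSpline

# Hypothetical cubic B-spline basis (order 4) with equally spaced knots; the
# region of interest is t in [0, 10].
order = 4
knots = np.arange(-7.5, 18.0, 2.5)      # 11 knots, so 11 - 4 = 7 basis functions
n_basis = len(knots) - order

def I_basis(t, j):
    """Integrated j-th B-spline: nondecreasing, flat outside its support."""
    b = BSpline.basis_element(knots[j:j + order + 1], extrapolate=False)
    ib = b.antiderivative()
    lo, hi = knots[j], knots[j + order]
    t = np.clip(t, lo, hi)               # basis integral is flat outside [lo, hi]
    return ib(t) - ib(lo)

eta = np.array([0.1, 0.2, 0.05, 0.3, 0.1, 0.2, 0.15])  # eta_j >= 0
assert eta.size == n_basis

grid = np.linspace(0, 10, 101)
Lam0 = sum(eta[j] * I_basis(grid, j) for j in range(n_basis))  # cumulative hazard
```

The smoothed baseline hazard λ0 is then the corresponding combination of the B-splines themselves, which is what yields the smooth curve mentioned above.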
Figure 2: Estimated baseline hazard (top) and baseline survival function (bottom) for spontaneous abortion (SAB), conditional on having survived 5 weeks of pregnancy, i.e. with left truncation; the x-axis is time in weeks. Time to SAB in gestational age can be interval censored when the exact SAB time is unknown, but only a window is available.
Current status data
The type of interval censored data we have talked about so far is called general, or case II, interval censored data.
There is another type called case I, or current status data, where each subject is observed only once, at time C, and we only learn whether the event has happened, i.e. I(T ≤ C).
It is obvious that this type of data contains less information than general interval censored data.
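For current status data the NPMLE of F at the sorted observation times is the isotonic regression of the event indicators, computable by the pool-adjacent-violators algorithm. A minimal sketch with made-up data:

```python
import numpy as np

# Hypothetical current status data: each subject seen once at Ci, observing
# delta_i = I(Ti <= Ci).
C = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([0, 1, 0, 1, 1], dtype=float)

order = np.argsort(C)
y = delta[order]

# Pool-adjacent-violators: merge blocks whose means violate monotonicity.
blocks = []  # each block holds [sum of y, count]
for yi in y:
    blocks.append([yi, 1.0])
    while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
        s, w = blocks.pop()
        blocks[-1][0] += s
        blocks[-1][1] += w

# F_hat[k] = fitted (nondecreasing) value of F at the k-th sorted Ci
F_hat = np.concatenate([[b[0] / b[1]] * int(b[1]) for b in blocks])
```

The fitted values are constant on pooled blocks and nondecreasing across them, which is exactly the shape constraint F must satisfy.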
NPMLE and regression models (e.g. Cox) have been developed for current status data, and the sieve method is used for the