Methods for Interval-Censored Failure Time Data and Beyond II

(Tony) Jianguo Sun

Department of Statistics, University of Missouri

September 19, 2008


OUTLINE

• I. Analysis of Bivariate Interval-Censored Data
    I.1. An Example: AIDS Clinical Trial
    I.2. Nonparametric Maximum Likelihood Estimation
    I.3. Estimation of the Association Parameter
    I.4. Regression Analysis

• II. Analysis of Doubly Censored Data
    II.1. An Example: AIDS Cohort Study
    II.2. Nonparametric Estimation of Survival Functions
    II.3. Nonparametric Comparison of Survival Functions
    II.4. Regression Analysis

• III. Other Topics and Future Research
    III.1. Analysis with Informative Interval Censoring
    III.2. Bayesian Analysis of Interval-Censored Data
    III.3. Some Future Research Directions




I. Analysis of Bivariate Interval-Censored Data

I.1. An Example — AIDS Clinical Trial

• Subjects: 204 HIV-infected individuals

• Study: a substudy of a comparative clinical trial of three anti-pneumocystis drugs with respect to the opportunistic infection cytomegalovirus (CMV)

• Variables of interest: times to the presence of CMV in blood and urine

• Observations: blood and urine samples were collected and tested every 4 or 12 weeks

• Bivariate interval-censored data

• Goggins and Finkelstein (2000).




Table 1: Observed intervals in weeks for blood and urine shedding times along with the baseline CD4 status from ACTG 181

  ID   LB   RB   LU   RU  CD4  |   ID   LB   RB   LU   RU  CD4
   1   11    -   11    -    1  |   45    6   10    0    2    1
   2   11    -   11    -    1  |   46    2    -    2    -    1
   3   11    -   11    -    0  |   47   13    -   13    -    0
   4   11    -    8   10    0  |   48   15    -    0    3    0
   5    7    -    6    8    1  |   49    8    -    0    1    0
   6   11    -   12    -    0  |   50   16    -    6    9    0
   7    8   12    8   10    1  |   51    5    -    0    1    1
   8   10    -   10    -    0  |   52    2    -    0    1    1
   9    6    -    6    -    1  |   53   13    -    0    1    0
  10    2    9    9   11    0  |   54   13    -   13    -    1
  ......                       |   ......




I.2. Nonparametric Maximum Likelihood Estimation

• Consider a survival study giving bivariate interval-censored data

{Ui = (L1i, R1i] × (L2i, R2i], i = 1, ..., n}.

• Let F (t1, t2) = P (T1i ≤ t1, T2i ≤ t2) denote the cdf and

H = {Hj = (r1j, s1j] × (r2j, s2j], j = 1, ...,m}

the disjoint rectangles that constitute the regions of

possible support of the NMLE of F .

• Define pj = F (Hj) and αij = I(Hj ⊆ Ui).




• Then the likelihood function has the form

$$ L(p) = \prod_{i=1}^{n} \sum_{j=1}^{m} \alpha_{ij} \, p_j $$

and the NMLE of F can be obtained by maximizing L(p) subject to $\sum_{j=1}^{m} p_j = 1$ and $p_j \ge 0$ for all j.

• How to determine H:

Betensky and Finkelstein (1999),

Gentleman and Vandal (2001, 2002),

Bogaerts and Lesaffre (2004).
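
Once the candidate rectangles H are known, the constrained maximization of L(p) can be carried out in several ways. The sketch below uses a self-consistency (EM-type) iteration, which is one common choice and not necessarily the one used in the cited papers. The function name `nmle_masses` and the toy indicator matrix are illustrative assumptions.

```python
def nmle_masses(alpha, tol=1e-10, max_iter=10_000):
    """Self-consistency (EM-type) iteration for the masses p_j.

    alpha[i][j] is 1 if rectangle H_j lies inside the observed rectangle U_i
    and 0 otherwise; every row is assumed to contain at least one 1.
    """
    n, m = len(alpha), len(alpha[0])
    p = [1.0 / m] * m  # start from the uniform distribution
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))  # sum_j alpha_ij p_j
            for j in range(m):
                # posterior weight of H_j for this subject, averaged over subjects
                new_p[j] += row[j] * p[j] / denom / n
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy input (hypothetical): 3 subjects, 2 candidate rectangles.
alpha = [[1, 0], [1, 1], [0, 1]]
p_hat = nmle_masses(alpha)
```

Each update keeps the masses non-negative and summing to one, so the constraints on p hold at every iteration without a separate projection step.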




I.3. Estimation of the Association Parameter

— A copula model approach

• Consider a survival study giving bivariate interval-censored data

for (T1, T2) whose joint survival function is given by

S(t1, t2) = Cα(S1(t1), S2(t2)) ,

where α is a global association parameter.

• Let l(α, S1, S2) denote the log likelihood function and Ŝ1 and Ŝ2 the marginal MLEs of S1 and S2. Then one can estimate α by solving the equation

$$ \frac{\partial l(\alpha, \hat S_1, \hat S_2)}{\partial \alpha} = 0 . $$

• Wang and Ding (2000), Sun, Wang and Sun (2006).
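
As a rough illustration of the two-stage idea, the sketch below evaluates the log likelihood in α with the marginals held fixed. The slides do not name a copula family, so the Clayton family, the exponential marginals, and the grid of α values are all assumptions made only for this example. The two data rows are subjects 1 and 45 of Table 1, assuming "-" marks a right-censored time.

```python
import math


def clayton(u, v, alpha):
    """Clayton copula C_alpha(u, v) for alpha > 0 (an assumed family)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -alpha + v ** -alpha - 1.0) ** (-1.0 / alpha)


def rect_prob(S1, S2, l1, r1, l2, r2, alpha):
    """P(l1 < T1 <= r1, l2 < T2 <= r2) from the joint survival function."""
    S = lambda a, b: clayton(S1(a), S2(b), alpha)
    return S(l1, l2) - S(r1, l2) - S(l1, r2) + S(r1, r2)


def pseudo_loglik(alpha, data, S1, S2):
    """Log likelihood in alpha with the marginal survival functions fixed."""
    return sum(math.log(rect_prob(S1, S2, *row, alpha)) for row in data)


# Hypothetical stand-ins for the marginal MLEs (exponential survival curves).
S1 = lambda t: math.exp(-0.05 * t)
S2 = lambda t: math.exp(-0.05 * t)

# (L1, R1, L2, R2) for subjects 1 and 45 of Table 1; inf = right-censored.
data = [(11, math.inf, 11, math.inf), (6, 10, 0, 2)]

# Crude grid search in place of solving the score equation (illustration only).
grid = [0.5, 1.0, 2.0, 4.0]
alpha_hat = max(grid, key=lambda a: pseudo_loglik(a, data, S1, S2))
```

Maximizing over a grid is a stand-in for solving the score equation; with real data one would use a root finder or a one-dimensional optimizer.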




— An imputation approach

• Instead of the copula model approach, one can directly estimate Kendall's τ, defined as

τ = P{(T1i − T1j)(T2i − T2j) > 0} − P{(T1i − T1j)(T2i − T2j) < 0}.

• Step 1: Estimate the joint survival function of (T1, T2).

• Step 2: Impute the exact failure times M times.

• Step 3: Calculate the empirical Kendall's τ for each of the M sets of imputed data.

• Step 4: Estimate Kendall's τ by the average of the M empirical estimates.

• Betensky and Finkelstein (1999).
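
Steps 2 to 4 can be sketched as follows, assuming Step 1 has already produced masses `p` on finite support rectangles `rects` together with the indicator matrix `alpha` of Section I.2. Placing each imputed pair at the upper-right corner of the sampled rectangle is an arbitrary choice for illustration, and all function names are hypothetical.

```python
import random


def kendall_tau(pairs):
    """Empirical Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def impute_once(alpha, rects, p, rng):
    """Step 2: draw one (t1, t2) per subject from the estimated masses."""
    draws = []
    for row in alpha:
        weights = [a * pj for a, pj in zip(row, p)]  # mass restricted to U_i
        j = rng.choices(range(len(p)), weights=weights)[0]
        (_, s1), (_, s2) = rects[j]  # upper-right corner (arbitrary choice)
        draws.append((s1, s2))
    return draws


def averaged_tau(alpha, rects, p, M=20, seed=0):
    """Steps 3 and 4: average the empirical tau over M imputed data sets."""
    rng = random.Random(seed)
    return sum(kendall_tau(impute_once(alpha, rects, p, rng)) for _ in range(M)) / M


# Toy input (hypothetical): each subject is compatible with one rectangle.
rects = [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((2, 3), (2, 3))]
alpha = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
tau_hat = averaged_tau(alpha, rects, [1 / 3] * 3)
```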




I.4. Regression Analysis

— Observed data and models

• Consider a survival study involving K possibly correlated

failure times (T1, · · · , TK) and n independent subjects.

• Assume that for each Tk, only an interval (Lk, Rk] is observed,

giving Tk ∈ (Lk, Rk]. So the observed data have the form

{ (L1i, R1i] , ..., (LKi, RKi], Zi ; i = 1, ..., n }.

• Let λk(t;Z) and Sk(t;Z) denote the marginal hazard and

survival functions of Tk given covariates Z, respectively.




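• The PH model:

$$ \lambda_k(t; Z) = \lambda_{k0}(t) \exp(Z'\beta) . $$

• The PO model:

$$ \frac{S_k(t; Z)}{1 - S_k(t; Z)} = e^{-Z'\beta} \, \frac{S_k(t; Z = 0)}{1 - S_k(t; Z = 0)} , $$

$$ \mathrm{logit}[S_k(t; Z)] = \mathrm{logit}[S_{k0}(t)] - Z'\beta . $$

• The AH model:

$$ \lambda_k(t; Z) = \lambda_{k0}(t) + Z'\beta . $$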
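
To compare the three models, it can help to write each one as a map from a baseline quantity to the survival probability. The sketch below assumes a time-independent covariate vector, writes `eta` for Z'β, and uses hypothetical function names.

```python
import math


def surv_ph(cum_haz0, eta):
    """PH: S(t; Z) = exp{-Lambda_0(t) exp(eta)}, with cum_haz0 = Lambda_0(t)."""
    return math.exp(-cum_haz0 * math.exp(eta))


def surv_po(s0, eta):
    """PO: survival odds are the baseline odds multiplied by exp(-eta)."""
    odds = math.exp(-eta) * s0 / (1.0 - s0)
    return odds / (1.0 + odds)


def surv_ah(cum_haz0, eta, t):
    """AH: the cumulative hazard is Lambda_0(t) + eta * t for constant Z."""
    return math.exp(-(cum_haz0 + eta * t))


def logit(x):
    return math.log(x / (1.0 - x))
```

With `eta = 0` each function returns the baseline survival probability, and `logit(surv_po(s0, eta))` equals `logit(s0) - eta` up to rounding, matching the logit form of the PO model.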




— A marginal inference procedure

• Assume that the Tk's are discrete variables. For the analysis, note that if T1, ..., TK are independent, the log-likelihood is proportional to

$$ l(\beta, A_1, \ldots, A_K) = \sum_{k=1}^{K} \sum_{i=1}^{n} \log\{ L_{ik}(\beta, A_k) \} , $$

where Lik denotes the marginal likelihood on Tk from subject i, $A_k(t) = S_{k0}(t) / \{1 - S_{k0}(t)\}$ for the PO model, or $A_k(t) = \int_0^t \lambda_{k0}(s)\, ds$ for the PH or AH model.

• Thus one can estimate β and the Ak's by maximizing l(β, A1, ..., AK).
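
A minimal sketch of this working-independence log-likelihood under the PH model is given below, taking the marginal contribution of an interval (L, R] to be S_k(L | Z) − S_k(R | Z). The linear baseline cumulative hazards are hypothetical; the two subjects are rows 7 and 4 of Table 1 with CD4 as the covariate, assuming "-" marks a right-censored time.

```python
import math


def marginal_loglik(beta, cum_haz, data):
    """Sum over subjects i and failure types k of log L_ik under the PH model.

    cum_haz[k] is a function t -> A_k(t); data holds (z, intervals) pairs with
    intervals[k] = (L, R] for T_k.
    """
    total = 0.0
    for z, intervals in data:
        risk = math.exp(sum(b * zj for b, zj in zip(beta, z)))  # exp(Z'beta)
        for A, (left, right) in zip(cum_haz, intervals):
            s_left = math.exp(-A(left) * risk)
            s_right = 0.0 if math.isinf(right) else math.exp(-A(right) * risk)
            total += math.log(s_left - s_right)  # log P(L < T_k <= R | Z)
    return total


# Hypothetical baseline cumulative hazards for blood (k = 1) and urine (k = 2).
cum_haz = [lambda t: 0.05 * t, lambda t: 0.08 * t]

# Subjects 7 and 4 of Table 1: ([CD4], [(LB, RB), (LU, RU)]).
data = [([1], [(8, 12), (8, 10)]), ([0], [(11, math.inf), (8, 10)])]
value = marginal_loglik([0.3], cum_haz, data)
```

In practice this objective would be maximized jointly over β and the values of the Ak's at the distinct interval endpoints; the sketch only evaluates it.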




• Let β̂ denote the estimate of β defined above. Then under certain conditions, β̂ is consistent and, for large n, one can approximate its distribution by the normal distribution with the covariance matrix consistently estimated by

$$ I^{-1}(\hat\beta, \hat A_k) \, D(\hat\beta, \hat A_k) \, I^{-1}(\hat\beta, \hat A_k) . $$

• Goggins and Finkelstein (2000), Kim and Xue (2002),

Chen, Tong and Sun (2007),

Tong, Chen and Sun (2008).

• Bogaerts, Leroy, Lesaffre and Declerck (2002),

He and Lawless (2003).




— An efficient estimation procedure

• Assume that K = 2 and the joint survival function of (T1, T2) is specified by a copula model as

S(s, t) = Cα(S1(s), S2(t)) .

• Also assume that the marginal survival functions S1 and S2 follow the PH model with

$$ S_k(t) = \exp\{ -\Lambda_{0k}(t) \exp(\beta' X) \}, \quad k = 1, 2. $$

• Let l(β, α, Λ01, Λ02) denote the log likelihood function. For estimation of θ = (β, α), one can derive the efficient score function $l^*_\theta$ for θ and solve $l^*_\theta(\theta, \hat\Lambda_{01}, \hat\Lambda_{02}) = 0$.

• Wang, Sun and Tong (2008) investigated this for bivariate case I interval-censored data and showed that the resulting estimates are consistent and efficient.




II. Analysis of Doubly Censored Data

II.1. An Example — AIDS Cohort Study

• Subjects: 257 individuals with hemophilia who were treated with HIV-contaminated blood from 1978 to August 1988

• Groups: heavily treated (received at least 1000 µg/kg of the blood for at least one year between 1982 and 1985) and lightly treated (received less)

• Variable of interest: AIDS latency time, from HIV infection to

AIDS diagnosis

• Interval-censored HIV infection times, right-censored AIDS

diagnosis time

• De Gruttola and Lagakos (1989), Kim, De Gruttola and

Lagakos (1993)




Table 2: Observed intervals in 6-month scale given by (L, R] for HIV infection time and observations (denoted by T, with starred numbers being right-censored times) for AIDS diagnosis time for some of 188 HIV-infected patients (the numbers in parentheses are multiplicities)

 L   R   T          L   R   T          L   R   T          L   R   T

Lightly treated group
 0   5  23* (2)     0  11  23* (2)     0  12  23* (3)     0  14  23*
 0  15  23* (9)     0  16  23* (4)     0  17  23*         0  18  23*
 2  10  23*         5   8  23*         6  10  23*         6  12  23*
 7  12  23*         7  13  23*         7  15  23*         8  13  23*
 8  14  23* (3)     9  12  23* (2)     9  16  23*        10  14  23* (4)
11  13  23* (4)    11  14  23*        12  14  23* (4)    12  15  23* (3)
13  15  23* (4)    14  16  23* (5)     0   3   8          0  12  15
 5  12  16          9  11  20          9  12  21         10  12  20
12  13  22         12  15  22          0  13  23*         6  13  17
 3  11  23*         4  11  23*         5  13  23*         7  16  23*
 8  12  23*         9  15  23*        11  13  23




II.2. Nonparametric Estimation of Survival Functions

• Consider a study that involves n independent subjects who

experience two related events denoted by Xi and Si with

Xi ≤ Si and both being discrete, i = 1, ..., n.

• Suppose that the survival time of interest is Ti = Si − Xi

and Xi and Ti are independent.

• Let u1 < ... < ur denote the possible mass points for

the Xi’s and v1 < ... < vs the possible mass points

for the Ti’s.

• Define

w = {wj = Pr(Xi = uj) }, f = { fk = Pr(Ti = vk) },

j = 1, ..., r, k = 1, ..., s.




• Suppose that the observed data have the form

{ (Li, Ri] , (Ui, Vi] , i = 1, ..., n }

with Xi ∈ (Li, Ri] and Si ∈ (Ui, Vi].

• Then the full likelihood function is proportional to

$$ L_F(w, f) = \prod_{i=1}^{n} \sum_{j=1}^{r} \sum_{k=1}^{s} \alpha_{ijk} \, w_j \, f_k , $$

where αijk = I(Li < uj ≤ Ri , Ui < uj + vk ≤ Vi).

• To estimate w and f , three self-consistency algorithms —

De Gruttola and Lagakos (1989): MLE

Gomez and Lagakos (1994): a two-step approach

Sun (1997): a conditional approach
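
For orientation, the sketch below runs an EM-type self-consistency iteration on the full likelihood LF(w, f). It is written in the spirit of the first (MLE) approach but is not claimed to reproduce any of the three cited algorithms; the function name and toy data are hypothetical.

```python
def em_doubly_censored(obs, u, v, tol=1e-10, max_iter=10_000):
    """EM-type iteration for w_j = Pr(X = u_j) and f_k = Pr(T = v_k).

    obs holds tuples (L, R, U, V) with X in (L, R] and S = X + T in (U, V].
    Every observation is assumed compatible with at least one pair (u_j, v_k).
    """
    r, s, n = len(u), len(v), len(obs)
    w = [1.0 / r] * r
    f = [1.0 / s] * s
    # alpha[i][j][k] = I(L_i < u_j <= R_i, U_i < u_j + v_k <= V_i)
    alpha = [[[float(L < uj <= R and U < uj + vk <= V) for vk in v]
              for uj in u] for (L, R, U, V) in obs]
    for _ in range(max_iter):
        new_w = [0.0] * r
        new_f = [0.0] * s
        for a in alpha:
            denom = sum(a[j][k] * w[j] * f[k] for j in range(r) for k in range(s))
            for j in range(r):
                for k in range(s):
                    mu = a[j][k] * w[j] * f[k] / denom  # posterior weight of (u_j, v_k)
                    new_w[j] += mu / n
                    new_f[k] += mu / n
        change = max(abs(x - y) for x, y in zip(new_w + new_f, w + f))
        w, f = new_w, new_f
        if change < tol:
            break
    return w, f


# Toy data (hypothetical): mass points 1 and 2 for both X and T.
obs = [(0, 1, 1, 2), (1, 2, 3, 4), (0, 2, 2, float("inf"))]
w_hat, f_hat = em_doubly_censored(obs, u=[1, 2], v=[1, 2])
```

A right-censored second event is encoded here by `V = inf`, so that only the lower bound on S restricts the compatible pairs.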


[Figure: estimated survival functions (vertical axis, 0.4 to 1.0) against time in six-month units (horizontal axis, 0 to 15), showing the ML, TS, and CL estimates for the LT group and for the HT group.]



II.3. Nonparametric Comparison of Survival Functions

• Let the Xi’s, Si’s, and Ti’s be defined as before and suppose that

only doubly censored data are available and given by

{ (Li, Ri] , (Ui, Vi] , i = 1, ..., n } ,

where Xi ∈ (Li, Ri] and Si ∈ (Ui, Vi]. Also suppose that the

subjects come from p + 1 different groups with the survival

functions S1(t), ..., Sp+1(t) and the goal is to test

H0 : S1(t) = ... = Sp+1(t) .

• Again as before, let u1 < ... < ur denote the possible mass points for the Xi's and v1 < ... < vs the possible mass points for the Ti's. Define

w = {wj = Pr(Xi = uj) }, f = { fk = Pr(Ti = vk) },

j = 1, ..., r, k = 1, ..., s.




— A generalized log-rank test

• To test H0, let Ŝ and F̂ denote the MLEs of the common survival function of the Ti's and the common cdf of the Xi's under H0, respectively. Define

$$ d_j = \sum_{i=1}^{n} P\big( T_i = v_j \mid (L_i, R_i], (U_i, V_i], \hat S, \hat F \big) , $$

$$ n_j = \sum_{i=1}^{n} P\big( T_i \ge v_j \mid (L_i, R_i], (U_i, V_i], \hat S, \hat F \big) , $$

and djl and njl as dj and nj except with the summation being over the subjects in the lth group, j = 1, ..., s, l = 1, ..., p + 1.

• Then one can test H0 using the statistic U = (U1, ..., Up+1)′ with

$$ U_l = \sum_{j=1}^{s-1} \left( d_{jl} - \frac{n_{jl} \, d_j}{n_j} \right) . $$

• Sun (2001).
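
Given the group-specific expected counts, the statistic itself is a short computation. The sketch below assumes the djl and njl have already been obtained from the estimated distributions and uses made-up numbers; it does not cover the variance estimate needed for the actual test.

```python
def logrank_statistic(d, n):
    """U_l = sum_{j=1}^{s-1} (d_jl - n_jl * d_j / n_j) for each group l.

    d[l][j] and n[l][j] are the expected numbers of failures at v_j and of
    subjects at risk at v_j in group l.
    """
    groups, s = len(d), len(d[0])
    d_tot = [sum(d[l][j] for l in range(groups)) for j in range(s)]  # d_j
    n_tot = [sum(n[l][j] for l in range(groups)) for j in range(s)]  # n_j
    return [
        sum(d[l][j] - n[l][j] * d_tot[j] / n_tot[j] for j in range(s - 1))
        for l in range(groups)
    ]


# Hypothetical expected counts for two groups and three mass points.
d = [[1.0, 0.5, 0.5], [0.5, 1.0, 1.5]]
n = [[2.0, 1.0, 0.5], [3.0, 2.5, 1.5]]
U = logrank_statistic(d, n)
```

Because the group counts add up to dj and nj, the components of U sum to zero, which is a convenient sanity check on an implementation.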




II.4. Regression Analysis

• Let the Xi’s, Si’s, Ti’s, wj’s, fk’s, and αijk be defined as before

with doubly censored data

{ (Li, Ri] , (Ui, Vi] , Zi , i = 1, ..., n } .

• Assume that covariates have no effects on the Xi's and that Xi is independent of Ti. For estimation of covariate effects on the Ti's, consider

$$ S_k(Z_i) = \Pr(T_i > v_k \mid Z_i) = (q_1 \cdots q_k)^{\exp(Z_i'\beta)} , $$

the discrete PH model, or

$$ \lambda_i(t) = \lambda_0(t) \exp(Z_i'\beta) , $$

the continuous PH model, or the continuous AH model

$$ \lambda_i(t) = \lambda_0(t) + Z_i'\beta . $$
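
The discrete PH model can be evaluated directly once the qk are given. In the sketch below the qk are read as baseline conditional survival probabilities, which is an interpretation not spelled out on the slide, and the numbers are hypothetical.

```python
import math


def discrete_ph_survival(q, z, beta):
    """Return [S_1(Z), ..., S_s(Z)] with S_k(Z) = (q_1 ... q_k)^{exp(Z'beta)}."""
    power = math.exp(sum(b * zj for b, zj in zip(beta, z)))
    surv, running = [], 1.0
    for qk in q:
        running *= qk  # baseline survival q_1 * ... * q_k
        surv.append(running ** power)
    return surv


baseline = discrete_ph_survival([0.9, 0.8], z=[0], beta=[0.5])
exposed = discrete_ph_survival([0.9, 0.8], z=[1], beta=[0.5])
```

With a positive coefficient the exposed curve lies below the baseline curve at every mass point, as expected under proportional hazards.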




• For the discrete PH model:

Kim, De Gruttola and Lagakos (1993) — MLE

Pan (2001) — Multiple imputation

• For the continuous PH model:

Sun, Liao and Pagano (1999) — Estimating equation

• For the continuous AH model:

Sun, Kim and Sun (2004) — Estimating equation

• If Xi and Ti = Si − Xi are not independent,

Frydman (1995) — Three state model




III. Other Topics and Future Research

III.1. Analysis with Informative Interval Censoring

• Consider a survival study yielding case II interval-censored data

{Ui, Vi, δ1i = I(Ti < Ui), δ2i = I(Ui ≤ Ti < Vi), δ3i = 1 − δ1i − δ2i, Zi(·)}

for the survival times of interest Ti's from n independent subjects, where Ui < Vi are observation times.

• For Ti along with Ui and Vi, assume that

$$ \lambda_i^{T}(t \mid Z_i(s), b_i(s), s \le t) = \lambda_0(t) + \beta_0' Z_i(t) + b_i(t) , $$

$$ \lambda_i^{U}(t \mid Z_i(s), b_i(s), s \le t) = \lambda_1(t) \, e^{\gamma_0' Z_i(t) + b_i(t)} , $$

$$ \lambda_i^{V}(t \mid U_i = u_i, Z_i(s), b_i(s), s \le t) =
\begin{cases}
\lambda_2(t) \, e^{\gamma_0' Z_i(t) + b_i(t)} & \text{if } t \ge u_i \\
0 & \text{if } t < u_i
\end{cases} $$

given the latent process bi(t) with mean zero.




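• To estimate regression parameters, define

$$ N_i^{(1)}(t) = (1 - \delta_{1i}) \, I(U_i \le t) $$

and

$$ N_i^{(2)}(t) = \delta_{3i} \, I(U_i \le t) \, I(V_i \le t) $$

conditional on Ui, i = 1, ..., n. Also define

$$ \tilde N_i^{(1)}(t) = I(U_i \le t), \qquad \tilde N_i^{(2)}(t) = I(U_i \le t) \, I(V_i \le t) $$

given Ui.

• For these counting processes, the intensity functions are

$$ I(U_i \ge t) \, E_b\{ e^{-\int_0^t b_i(s)\,ds} \, e^{b_i(t)} \} \, e^{-\Lambda_0(t)} \, \lambda_1(t) \, e^{-\beta_0' Z_i^*(t) + \gamma_0' Z_i(t)} , $$

$$ I(u_i < t \le V_i) \, E_b\{ e^{-\int_0^t b_i(s)\,ds} \, e^{b_i(t)} \} \, e^{-\Lambda_0(t)} \, \lambda_2(t) \, e^{-\beta_0' Z_i^*(t) + \gamma_0' Z_i(t)} , $$

$$ I(U_i \ge t) \, E_b\{ b_i(t) \} \, \lambda_1(t) \, e^{\gamma_0' Z_i(t)} , $$

$$ I(u_i < t \le V_i) \, E_b\{ b_i(t) \} \, \lambda_2(t) \, e^{\gamma_0' Z_i(t)} . $$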




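• It follows that one can develop an estimating function as

$$ U_\beta(\beta, \gamma) = \sum_{i=1}^{n} (1 - \delta_{1i}) \left\{ Z_i^*(U_i) - \frac{S_{1,\beta}^{(1)}(U_i, \beta, \gamma)}{S_{1,\beta}^{(0)}(U_i, \beta, \gamma)} \right\}
+ \sum_{i=1}^{n} \delta_{3i} \left\{ Z_i^*(V_i) - \frac{S_{2,\beta}^{(1)}(V_i, \beta, \gamma)}{S_{2,\beta}^{(0)}(V_i, \beta, \gamma)} \right\} $$

for estimation of β0 given γ0, where

$$ S_{1,\beta}^{(j)}(t, \beta, \gamma) = n^{-1} \sum_{i=1}^{n} I(t \le U_i) \, e^{-\beta' Z_i^*(t) + \gamma' Z_i(t)} \, Z_i^{*(j)}(t) , $$

$$ S_{2,\beta}^{(j)}(t, \beta, \gamma) = n^{-1} \sum_{i=1}^{n} I(U_i < t \le V_i) \, e^{-\beta' Z_i^*(t) + \gamma' Z_i(t)} \, Z_i^{*(j)}(t) $$

for j = 0, 1 with $Z_i^{*(0)}(t) = 1$ and $Z_i^{*(1)}(t) = \int_0^t Z_i(s)\, ds$.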




• Let γ̂ denote a consistent estimate of γ0. Then one can estimate β0 by the solution β̂ to Uβ(β, γ̂) = 0.

• Wang, Sun and Tong (2008) showed that under some regularity conditions, β̂ is consistent and the distribution of

$$ n^{1/2} ( \hat\beta - \beta_0 ) $$

can be asymptotically approximated by the normal distribution with mean zero and covariance matrix Σ that can be consistently estimated by

$$ \hat\Sigma = A_\beta(\hat\beta, \hat\gamma)^{-1} \, \hat\Gamma(\hat\beta, \hat\gamma) \, [A_\beta(\hat\beta, \hat\gamma)^{-1}]' , $$

where $A_\beta(\beta, \gamma) = -n^{-1} \, \partial U_\beta(\beta, \gamma) / \partial \beta$.
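
The covariance estimate has the usual sandwich form, inverse of A times Γ times the transposed inverse of A. The sketch below only illustrates that matrix product for small dense matrices with hypothetical inputs; how A and Γ are computed from the data is not covered here, and the helper names are assumptions.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sandwich(A, Gamma):
    """Return A^{-1} Gamma (A^{-1})'."""
    A_inv = inverse(A)
    return matmul(matmul(A_inv, Gamma), transpose(A_inv))


# Hypothetical 2 x 2 inputs.
Sigma = sandwich([[2.0, 0.0], [0.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]])
```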




III.2. Bayesian Analysis of Interval-censored Data

• Parametric Bayesian approaches —

Banerjee and Carlin (2004)

• Nonparametric Bayesian approaches —

Dirichlet prior: Doss (1994), Calle and Gomez (2001),

Zhou (2004)

Discrete beta prior: Sinha (1997)

• Semiparametric Bayesian approaches —

Komarek and Lesaffre (2007, 2008): AFT models for multivariate interval-censored and doubly censored data, respectively.

• Gomez, Calle and Oller (2004)




III.3. Some Future Research Directions

• Analysis with informative censoring: Assume that the variables L and R in L < T ≤ R are not independent of T, that is, in

P(L < T ≤ R, L = l, R = r) = P(l < T ≤ r | L = l, R = r) dG(l, r)

the conditional probability cannot be replaced by P(l < T ≤ r)

— Joint modeling approach: Finkelstein, Goggins and

Schoenfeld (2002)

— Conditional modeling approach: Zhang, Sun and Sun (2005),

Zhang, Sun, Sun and Finkelstein (2007)




• Model checking and regression diagnostics:

- Case I interval-censored data: Ghosh (2003)

- Case II interval-censored data: Farrington (2000), Sun, Sun and Zhu (2007)

- Bivariate interval-censored data: Wang, Sun and Sun (2006)

• Asymptotic properties of MLE:

- for bivariate interval-censored data and

- for doubly censored data




Thank you!

