
ST745: Survival Analysis: Nonparametric methods

Eric B. Laber

Department of Statistics, North Carolina State University

February 5, 2015

The KM estimator is used ubiquitously in medical studies to estimate and depict the fraction of patients living for a certain amount of time after treatment. It has since been applied to data from clinical trials of therapies for every disease from cancer to cardiology to concussion. —Science Life

Paul Meier's work and the KM analysis have been responsible for saving millions of lives. —Significance

Then and now

I Last time we discussed max-LH with censoring
I Right-censoring schemes

I Left-truncation

I Interval censored data

I Current status data

I Estimating parametric models in R

I Large sample theory and inference

I Today we'll discuss
I Kaplan-Meier estimator and inference

I Nelson–Aalen estimator and inference

I Using R for nonpar estimation

Warm-up

I Explain to your stat buddy

1. What's the difference between left-censoring and left-truncation?

2. Give two examples of nonparametric estimators

3. Pros and cons of nonparametric methods relative to parametric methods

4. What is a confidence interval?

I True or false:
I (T/F) Paul Meier is still alive

I (T/F) The bootstrap is an asymptotic approximation

I (T/F) The integral symbol ∫ was invented by Gottfried Wilhelm Leibniz III

Things to recall

I For a discrete distribution with failure times t_1, t_2, \ldots

S(t) = \prod_{j: t_j < t} [1 - h(t_j)],

where h(t_j) = P(T = t_j \mid T \ge t_j)

Family feud!

I I surveyed statisticians in SAS Hall for the five most important steps in an applied statistical analysis. What are they?

1.
2.
3.
4.
5.

Complications due to censoring

I Consider making a simple visual display of lifetime data subj. to right-censoring

I Why is this important?

I Consider making a histogram, what goes wrong?

I What about plotting the empirical CDF?

I Today we’ll see how to make these plots (and more!)

Product limit estimator: warm-up

I Let T_1, \ldots, T_n denote an iid sample (no censoring)
I Empirical CDF

\hat{F}(t) = \frac{1}{n} \sum_{i=1}^n 1_{T_i \le t}

I Empirical survival function (ESF)

\hat{S}(t) = \frac{1}{n} \sum_{i=1}^n 1_{T_i \ge t}

I Does \hat{F}(t) = 1 - \hat{S}(t) everywhere?

Ex. ECDF and ESF

[Figure: step-function plots of the ECDF \hat{F}(x) (left panel) and the ESF \hat{S}(x) (right panel) for a simulated sample, with x ranging from 0 to 15.]

I How big are the steps above?

Ex. ECDF and ESF cont’d

n <- 100
x <- rchisq(n, df = 4)
par(mar = c(5, 5, 4, 1) + 0.1, mfrow = c(1, 2))
plot(stepfun(sort(x), c(0, (1:n)/n)), xlab = "x",
     ylab = expression(hat(F)(x)), main = "", lwd = 3)
plot(stepfun(sort(x), c(1, 1 - (1:n)/n)), xlab = "x",
     ylab = expression(hat(S)(x)), main = "", lwd = 3)

Ex. ECDF and ESF cont’d

I If t_1 < t_2 < \cdots < t_k are the distinct failure times

\hat{S}(t) = \frac{1}{n} \sum_{j=1}^k d_j 1_{t_j \ge t},

where d_j is the number of observations equal to t_j
I Why?

ECDF and ESF under censoring

I When there is censoring
I Number of points in an interval [a, b] is unknown

I Cannot compute ESF or ECDF

I Kaplan-Meier (KM) estimator (aka product limit estimator) is an analog of the ESF for right-censored data

I The original KM paper is the most highly cited statistics paper to date. What is the second most highly cited?

KM estimator

I Let \{(t'_i, \delta_i)\}_{i=1}^n denote obs. data with distinct failure times t_1 < t_2 < \cdots < t_k (these DO NOT include censoring times)
I Define
I d_j \triangleq \sum_{i=1}^n 1_{t'_i = t_j, \delta_i = 1} to be the number of failures at t_j
I n_j \triangleq \sum_{i=1}^n 1_{t'_i \ge t_j} to be the number at risk at t_j
I The KM estimator of S(t) is

\hat{S}(t) = \prod_{j: t_j < t} \left( \frac{n_j - d_j}{n_j} \right)

I Explain \hat{S}(t) intuitively to your stat buddy

Why does KM make sense?

I Given \{(t'_i, \delta_i)\}_{i=1}^n how can we estimate h(t_j)? (Assume discrete for now)

h(t_j) = P(T = t_j \mid T \ge t_j) \approx \frac{\#\text{fail at } t_j}{\#\text{at risk at } t_j} = \frac{d_j}{n_j}

I Apply S(t) = \prod_{j: t_j < t} [1 - h(t_j)] \approx \prod_{j: t_j < t} \left(1 - \frac{d_j}{n_j}\right) = \hat{S}(t)

Ex. compute the KM estimator

t     δ
6     1
4     1
5     0
11    0
1     1
15    1
2     0

t_j    n_j    d_j    (n_j - d_j)/n_j    \hat{S}(t_j^+)
1
4
6
15

Code break I: Computing KM in R

I See file firstKM.R
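The file firstKM.R is not reproduced here; the following is a minimal sketch of what it might contain, assuming the survival package and the toy data from the preceding example slide.

library(survival)
t.obs <- c(6, 4, 5, 11, 1, 15, 2)    # observed times t'
delta <- c(1, 1, 0, 0, 1, 1, 0)      # failure indicators (1 = failure, 0 = censored)

## Kaplan-Meier via survfit
fit <- survfit(Surv(t.obs, delta) ~ 1)
summary(fit)                          # n.risk, n.event, survival, std.err at each failure time
plot(fit, xlab = "t", ylab = expression(hat(S)(t)))

## Kaplan-Meier "by hand" at the distinct failure times
tj <- sort(unique(t.obs[delta == 1]))                        # distinct failure times
nj <- sapply(tj, function(s) sum(t.obs >= s))                # number at risk at t_j
dj <- sapply(tj, function(s) sum(t.obs == s & delta == 1))   # number of failures at t_j
S.hat <- cumprod((nj - dj) / nj)                             # \hat{S}(t_j^+), the value just after t_j
cbind(tj, nj, dj, S.hat)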

Sanity check

I Claim: The KM estimator reduces to the ESF when there is no censoring. Why?

I Answer on board.
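A quick numeric check of the claim (assuming the survival package): with no censoring, the KM estimate agrees with the empirical survival function.

library(survival)
set.seed(1)
x <- rchisq(25, df = 4)
fit <- survfit(Surv(x, rep(1, 25)) ~ 1)
esf <- sapply(fit$time, function(s) mean(x > s))   # ESF evaluated just after each observed time
all.equal(as.numeric(fit$surv), esf)               # TRUE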

Code break II: Example 3.2.1 from Lawless

I See file ex321.R

Variance estimation

I A consistent estimator of the variance of \hat{S}(t) is given by Greenwood's formula:

\hat{\sigma}^2_S(t) = \hat{S}^2(t) \sum_{j: t_j < t} \frac{d_j}{n_j (n_j - d_j)}

I When there is no censoring, this reduces to \hat{S}(t)(1 - \hat{S}(t))/n. Why is this the right quantity?
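A small sketch of Greenwood's formula computed by hand for the toy data, reusing tj, nj, dj, and S.hat from the KM sketch above; the std.err column printed by summary(fit) in the survival package can serve as a comparison.

greenwood.var <- S.hat^2 * cumsum(dj / (nj * (nj - dj)))   # variance of \hat{S}(t) just after each t_j
greenwood.se  <- sqrt(greenwood.var)
cbind(tj, S.hat, greenwood.se)
## Note: the last entry is NaN here because \hat{S} drops to 0 at the largest failure time.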

KM as nonparametric MLE

I Recall our counting process notation

Y_i(t) = 1_{T_i \ge t,\ i\text{th subj not censored at } t}

dN_i(t) = Y_i(t) 1_{T_i = t}

dC_i(t) = Y_i(t) 1_{i\text{th subj censored at } t},

we'll assume a discrete distribution with potential failure times t = 0, 1, \ldots

I With your stat buddy prove \sum_{i=1}^n dN_i(t) = \sum_{i=1}^n Y_i(t) dN_i(t)

KM as nonparametric MLE cont’d

I Recall from our work on non-informative censoring that

L \propto \prod_{i=1}^n \prod_{t=0}^{\infty} h(t)^{dN_i(t)} [1 - h(t)]^{Y_i(t)(1 - dN_i(t))}

I Note: we saw this en route to simplifying to an expression involving f(t) and S(t); for our purposes it will be convenient to use the above form.

KM as nonparametric MLE cont’d

I The LH simplifies to

L \propto \prod_{t=0}^{\infty} h(t)^{d_t} [1 - h(t)]^{n_t - d_t},

where d_t \triangleq \sum_{i=1}^n dN_i(t) and n_t \triangleq \sum_{i=1}^n Y_i(t)

I Why? Interchange products to obtain

\prod_{t=0}^{\infty} \prod_{i=1}^n h(t)^{dN_i(t)} [1 - h(t)]^{Y_i(t)(1 - dN_i(t))} = \prod_{t=0}^{\infty} h(t)^{\sum_{i=1}^n dN_i(t)} [1 - h(t)]^{\sum_{i=1}^n Y_i(t)(1 - dN_i(t))},

and use \sum_{i=1}^n Y_i(t) dN_i(t) = \sum_{i=1}^n dN_i(t) = d_t

KM as nonparametric MLE cont’d

I To obtain the nonparametric MLE we view (h(0), h(1), \ldots) as our parameter and maximize L
I If n_t = 0 then there is no information about h(t); let \tau denote the largest t s.t. n_t > 0, then

L \propto \prod_{t=0}^{\tau} h(t)^{d_t} [1 - h(t)]^{n_t - d_t},

and the log-LH is

\ell = \sum_{t=0}^{\tau} \left\{ d_t \log h(t) + (n_t - d_t) \log(1 - h(t)) \right\}

KM as nonparametric MLE cont’d

I Differentiate \ell wrt h(t) to obtain

\partial_{h(t)} \ell = \frac{d_t}{h(t)} - \frac{n_t - d_t}{1 - h(t)},

set this to zero and solve for h(t) to obtain \hat{h}(t) = d_t/n_t
I Then

\hat{S}(t) = \prod_{j: t_j < t} \left[1 - \hat{h}(t_j)\right] = \prod_{j: t_j < t} \left[1 - \frac{d_j}{n_j}\right],

is the MLE for S(t) by the invariance property of the MLE

KM as nonpar MLE, enough already!

I Some things to note

1. If the last obs time \tau is a failure then \hat{S}(t) \equiv 0 for all t > \tau

2. If the last obs time \tau is a censoring time then \hat{S}(t) is not defined for t > \tau

3. MLE formulation is powerful since large sample theory can be used to study efficiency and conduct statistical inference

Fact from your past

I Let g be a smooth function from R into R, then

g(\hat{\theta}_n) \approx g(\theta) + \nabla g(\theta)(\hat{\theta}_n - \theta)

so that

\mathrm{Var}\, g(\hat{\theta}_n) \approx [\nabla g(\theta)]^2 \mathrm{Var}\, \hat{\theta}_n,

thus we can approximate the variance of \hat{\theta}_n via

\mathrm{Var}\, \hat{\theta}_n \approx \frac{1}{[\nabla g(\theta)]^2} \mathrm{Var}\, g(\hat{\theta}_n)

I Ex. Let g(u) = \log u to obtain

\mathrm{Var}\, \hat{S}(t) \approx S^2(t)\, \mathrm{Var} \log \hat{S}(t)

Computing Greenwood’s formula

I If we can approximate the variance of \log \hat{S}(t) then we can use the preceding expansion to approximate \mathrm{Var}\, \hat{S}(t)
I Recall the score function (derivative of the log-LH) is

u(h(t)) = \frac{d_t}{h(t)} - \frac{n_t - d_t}{1 - h(t)},

so that

u'(h(t)) = -\frac{d_t}{h^2(t)} - \frac{n_t - d_t}{(1 - h(t))^2},

which, evaluated at \hat{h}(t) = d_t/n_t, equals

-n_t \left[ \frac{1}{\hat{h}(t)} + \frac{1}{1 - \hat{h}(t)} \right] = -\frac{n_t}{\hat{h}(t)(1 - \hat{h}(t))}

Computing Greenwood’s formula cont’d

I Observed Fisher info is a diagonal matrix with entries

I_t = \frac{n_t}{\hat{h}(t)(1 - \hat{h}(t))}

I Thus (\hat{h}(0), \hat{h}(1), \ldots, \hat{h}(\tau)) are asymptotically independent s.t.

\mathrm{Var} \log \hat{S}(t) = \mathrm{Var} \log \prod_{j: t_j < t} \left[1 - \hat{h}(t_j)\right] = \mathrm{Var} \sum_{j: t_j < t} \log\left[1 - \hat{h}(t_j)\right] \approx \sum_{j: t_j < t} \mathrm{Var} \log\left[1 - \hat{h}(t_j)\right]

Computing Greenwood’s formula cont’d

I We can estimate \mathrm{Var} \log\left[1 - \hat{h}(t_j)\right] using our approx

\mathrm{Var} \log\left[1 - \hat{h}(t_j)\right] \approx \frac{\mathrm{Var}\, \hat{h}(t_j)}{(1 - \hat{h}(t_j))^2} \approx \frac{I_{t_j}^{-1}}{(1 - \hat{h}(t_j))^2} = \frac{\hat{h}(t_j)}{n_{t_j}(1 - \hat{h}(t_j))}

I Putting it all together

\widehat{\mathrm{Var}}(\hat{S}(t)) \approx \hat{S}^2(t) \sum_{j: t_j < t} \frac{\hat{h}(t_j)}{n_{t_j}(1 - \hat{h}(t_j))} = \hat{S}^2(t) \sum_{j: t_j < t} \frac{d_j}{(n_j - d_j) n_j},

where we have used n_{t_j} = n_j and \hat{h}(t_j) = d_j/n_j

Computing Greenwood’s formula epilogue

I We glossed over some slippery technical details; for a rigorous treatment see advanced survival texts (e.g., Fleming and Harrington, 2005). For a treatment of infinite dimensional parameter spaces see Butch's semiparametrics course.

Nelson-Aalen estimator

I One could obtain an estimator of the cumulative hazard via -\log \hat{S}(t) (why?) but the following estimator is typically preferred

\hat{H}(t) \triangleq \sum_{j: t_j \le t} \frac{d_j}{n_j},

this is called the Nelson-Aalen (pronounced OH-len) estimator

Ex. compute the NA estimator

t     δ
6     1
4     1
5     0
11    0
1     1
15    1
2     0

t_j    n_j    d_j    d_j/n_j    \hat{H}(t_j)
1
4
6
15

Code break III: Computing NA in R

I See file firstNA.R
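The file firstNA.R is not reproduced here; a minimal sketch of the computation, reusing tj, nj, dj from the KM sketch above, might look like the following.

H.hat <- cumsum(dj / nj)                       # Nelson-Aalen estimate at each distinct failure time
H.se  <- sqrt(cumsum(dj * (nj - dj) / nj^3))   # large-sample standard error (see the variance slide below)
cbind(tj, H.hat, H.se)
## Recent versions of the survival package also expose a cumulative-hazard component,
## fit$cumhaz, on the survfit object from the KM sketch (an assumption worth checking).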

Plotting the NA estimator

I Plot of \hat{H}(t) is informative for the shape of the hazard fn
I \hat{H}(t) linear implies constant hazard

I H(t) convex implies monotone hazard

I Slope of H(t) approximates h(t)
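A quick illustration of the "linear implies constant hazard" point (not taken from the slides): simulate exponential data, whose hazard is constant at 0.5, and plot the Nelson-Aalen estimate, which should look roughly like a line of slope 0.5.

set.seed(2)
y  <- rexp(300, rate = 0.5)              # constant hazard h(t) = 0.5
ty <- sort(y)
H  <- cumsum(1 / (300:1))                # NA by hand: no censoring, d_j = 1, n_j = n - j + 1
plot(ty, H, type = "s", xlab = "time", ylab = expression(hat(H)(t)))
abline(0, 0.5, lty = 2)                  # true cumulative hazard H(t) = 0.5 t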

Match the NA estimator with the true hazard

[Figure: three estimated cumulative hazards \hat{H}(t) plotted against time (top row), to be matched with three true hazard functions h(t) plotted against time (bottom row).]

Variance estimation

I NA estimator is an MLE just like KM

I Variance estimator for \hat{H}(t) is

\hat{\sigma}^2_H(t) = \sum_{j: t_j \le t} \frac{d_j(n_j - d_j)}{n_j^3},

which can be derived using large-sample approximations

Code break IV: NA on Example 3.2.1 from Lawless

I See file ex321NA.R

Confidence interval for S(t)

I Fact: For any fixed t > 0

\frac{\hat{S}(t) - S(t)}{\hat{\sigma}_S(t)} \rightsquigarrow N(0, 1)

I Stronger convergence results (simultaneous over all t) exist
I (1 - \alpha) \times 100\% CI based on Greenwood's formula:

\hat{S}(t) \pm z_{1-\alpha/2} \hat{\sigma}_S(t)
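For concreteness, the plain Greenwood interval at each failure time of the toy example, assuming S.hat and greenwood.se from the earlier sketches; note the endpoints are not forced to stay inside (0, 1).

z <- qnorm(0.975)
cbind(tj, lower = S.hat - z * greenwood.se, upper = S.hat + z * greenwood.se)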

Alternative confidence intervals

I Greenwood's formula is intuitive but has drawbacks
I CI generally does not perform well in small samples

I Can generate a CI with endpoints outside of (0, 1)

I Recall our general strategy for modeling probabilities

1. Transform to take values in R
2. Conduct estimation/inference on transformed scale

3. Transform back to (0, 1)

Transformed confidence interval

I Let g(s) be a decreasing cts function from (0, 1) onto R; construct a CI for g(S(t)) then transform back via Taylor approx
I Define \psi(t) \triangleq g(S(t)), then

\hat{\sigma}^2_\psi(t) \approx \left[\nabla g\{\hat{S}(t)\}\right]^2 \hat{\sigma}^2_S(t)

I Taylor series arguments show

P\left(-z_{1-\alpha/2} \le \frac{\hat{\psi}(t) - \psi(t)}{\hat{\sigma}_\psi(t)} \le z_{1-\alpha/2}\right) \approx 1 - \alpha

Transformed confidence interval cont’d

I Rearrange terms to obtain

P\left(\hat{\psi}(t) - z_{1-\alpha/2}\hat{\sigma}_\psi(t) \le \psi(t) \le \hat{\psi}(t) + z_{1-\alpha/2}\hat{\sigma}_\psi(t)\right) \approx 1 - \alpha

I Solve for S(t) using \psi(t) = g(S(t))

P\left(g^{-1}\left\{\hat{\psi}(t) + z_{1-\alpha/2}\hat{\sigma}_\psi(t)\right\} \le S(t) \le g^{-1}\left\{\hat{\psi}(t) - z_{1-\alpha/2}\hat{\sigma}_\psi(t)\right\}\right) \approx 1 - \alpha

I Note the arguments within g^{-1} have flipped

I Question: How do we know g^{-1} exists and is decreasing?

Transformed confidence interval cont’d

I If g(s) = \log(-\log(s)), the CI is

\left[ \exp\left\{-\exp\left(\hat{\psi}(t) + z_{1-\alpha/2}\hat{\sigma}_\psi(t)\right)\right\},\ \exp\left\{-\exp\left(\hat{\psi}(t) - z_{1-\alpha/2}\hat{\sigma}_\psi(t)\right)\right\} \right]

I Variance is

\hat{\sigma}^2_\psi(t) = \frac{\hat{\sigma}^2_S(t)}{\left[\hat{S}(t) \log \hat{S}(t)\right]^2}

I Another common choice is g(s) = − log(s)
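A sketch of the log-log interval computed by hand from the earlier S.hat, greenwood.var, and tj; the same type of interval is what survfit(..., conf.type = "log-log") reports in the survival package.

z <- qnorm(0.975)
psi.hat   <- log(-log(S.hat))                          # undefined where S.hat is 0 or 1
sigma.psi <- sqrt(greenwood.var) / abs(S.hat * log(S.hat))
lower <- exp(-exp(psi.hat + z * sigma.psi))
upper <- exp(-exp(psi.hat - z * sigma.psi))
cbind(tj, S.hat, lower, upper)                         # endpoints stay inside (0, 1)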

Bootstrap: AKA the boostarp

I Eric draws a brilliant depiction of the bootstrap on the board

I Applause subsides

I A quiet moment of reflection reveals a new appreciation for the beauty of statistics in each of us

The boostarp cont’d

I Let D = \{(T_i, \delta_i)\}_{i=1}^n denote the observed data and P_n the empirical distribution
I A (nonparametric) bootstrap sample is a sample of size n, say D^{(b)}, drawn uniformly (with replacement) from D
I D^{(b)} is an i.i.d. draw of size n from P_n

I Other resample sizes are possible

I Standard percentile bootstrap CI for S(t) (see the sketch below)

1. Draw B nonparametric bootstrap samples, D^{(1)}, \ldots, D^{(B)}

2. Compute \hat{S}^{(b)}(t), the KM estimator on D^{(b)}, for b = 1, \ldots, B

3. Let l_{\alpha/2} and u_{1-\alpha/2} be the (\alpha/2) \times 100 and (1 - \alpha/2) \times 100 percentiles of \hat{S}^{(1)}(t), \ldots, \hat{S}^{(B)}(t)

4. Final (1 - \alpha) \times 100\% CI is \left[ l_{\alpha/2}, u_{1-\alpha/2} \right]
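A minimal sketch of the percentile bootstrap interval for S(t0) at a fixed t0, assuming the survival package and the t.obs/delta toy data from the first KM sketch; the function name boot.km.ci is made up for illustration.

library(survival)
boot.km.ci <- function(time, status, t0, B = 1000, alpha = 0.05) {
  n <- length(time)
  S.b <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)                    # resample subjects
    fit.b <- survfit(Surv(time[idx], status[idx]) ~ 1)
    summary(fit.b, times = t0, extend = TRUE)$surv             # KM at t0 on the resample
  })
  quantile(S.b, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE)     # percentile CI
}
boot.km.ci(t.obs, delta, t0 = 5)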

Simulated experiment: coverage probabilities

I T ∼ log-normal(−1, 2), C ∼ exp(1.75)

I Sample size of n = 200 and 10K MC replications

I Compare coverage of Greenwood's formula with the log-log transform

[Figure: estimated coverage probability versus t (0 to 0.35) for the plain Greenwood interval and the log-log transformed interval; nominal level 0.95.]

See coverageExample.R
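coverageExample.R is not reproduced here; the following is a rough sketch of how such a comparison might be coded, assuming the survival package. Reading log-normal(-1, 2) as meanlog/sdlog and exp(1.75) as a rate is an assumption, and far fewer Monte Carlo replications are used than the slide's 10K.

library(survival)
set.seed(745)
n <- 200; M <- 1000; t0 <- 0.2
S.true <- 1 - plnorm(t0, meanlog = -1, sdlog = 2)          # true S(t0)
cover <- matrix(NA, M, 2, dimnames = list(NULL, c("plain", "log-log")))
for (m in 1:M) {
  T <- rlnorm(n, meanlog = -1, sdlog = 2)
  C <- rexp(n, rate = 1.75)
  time <- pmin(T, C); status <- as.numeric(T <= C)
  for (type in colnames(cover)) {
    fit <- survfit(Surv(time, status) ~ 1, conf.type = type)
    s <- summary(fit, times = t0, extend = TRUE)
    cover[m, type] <- (s$lower <= S.true) && (S.true <= s$upper)
  }
}
colMeans(cover, na.rm = TRUE)    # estimated coverage at t0 for the two interval types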

Confidence intervals for quantiles

I In some settings a quantile is of interest
I E.g., the median
I Quantiles are often easier to estimate than moments
I Recall t_p is the pth quantile of T

t_p = \inf\{t : 1 - S(t) \ge p\}

I Given an estimator \hat{S}(t) of S(t) we obtain

\hat{t}_p = \inf\left\{t : 1 - \hat{S}(t) \ge p\right\}

Confidence intervals for quantiles cont’d

I For continuous T, S(t_p) = 1 - p

I Suppose t_L = t_L(\text{Data}) satisfies

P(S(t_L) \ge 1 - p) \ge 1 - \alpha,

then t_L is a lower confidence bound for t_p (Why?)
I For any fixed t

P\left(S(t) \ge \hat{S}(t) - z_{1-\alpha/2}\hat{\sigma}_S(t)\right) \approx 1 - \alpha,

so solve \hat{S}(t_L) - z_{1-\alpha/2}\hat{\sigma}_S(t_L) = 1 - p for t_L
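As a concrete companion, the survival package provides a quantile method for survfit objects that returns quantile estimates with confidence intervals; a small sketch using the toy data from the first KM sketch (the log-log interval type is one reasonable choice).

library(survival)
fit <- survfit(Surv(t.obs, delta) ~ 1, conf.type = "log-log")
quantile(fit, probs = c(0.25, 0.5, 0.75), conf.int = TRUE)   # quantile estimates and CIs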
