ST495: Survival Analysis: Maximum likelihood Eric B. Laber Department of Statistics, North Carolina State University February 11, 2014

Jul 11, 2018


ST495: Survival Analysis: Maximum likelihood

Eric B. Laber

Department of Statistics, North Carolina State University

February 11, 2014


Everything is deception: seeking the minimum of illusion, keeping within the ordinary limitations, seeking the maximum. In the first case one cheats the Good, by trying to make it too easy for oneself to get it, and the Evil by imposing all too unfavorable conditions of warfare on it. In the second case one cheats the Good by keeping as aloof from it as possible, and the Evil by hoping to make it powerless through intensifying it to the utmost.
-- Franz Kafka


Last time

- Introduced parametric models commonly used in survival analysis; discussed their densities, hazards, survivor functions, and CDFs; showed how to draw from these distributions using R
  - Exponential
  - Weibull
  - Gamma
  - Extreme value
  - Log-normal
  - Log-logistic
- We also discussed location-scale models and how to use the location-scale framework to incorporate covariate information
- We also discussed flexible models for the hazard function, including piecewise constant and basis expansions


Last time: burning questions

- How to choose a distribution?
- How to estimate the parameters indexing a chosen distribution?
- How can we accommodate different types of censoring?
- How can I use R to do the foregoing estimation steps?


Warm-up

- Explain to your stat buddy
  1. Hazard function
  2. How a bathtub hazard might arise
  3. How an increasing hazard might arise
  4. How a decreasing hazard might arise
- True or false:
  - (T/F) The minimum of independent Weibull r.v.'s is Weibull
  - (T/F) Measles increases fecundity in goats
  - (T/F) An exponential distribution would be a good model for mortality in humans
- Who is generally credited with discovering maximum likelihood estimation?


Warm-up cont’d

- Some concepts and notation for today
  - Let $a_1, \dots, a_n$ be a sequence of constants; then
    $$\prod_{i=1}^{n} a_i = a_1 \times a_2 \times \cdots \times a_n$$
  - Let $f(x)$ denote a function from $\mathbb{R}^p$ into $\mathbb{R}$; then
    $$x^* = \arg\max_{x} f(x)$$
    satisfies $f(x^*) \ge f(x)$ for all $x \in \mathbb{R}^p$.
  - We use $1_{\text{Statement}}$ to denote the function that equals one if Statement is true and zero otherwise. Thus, $1_{t \le 1}$ equals one if $t \le 1$ and zero otherwise.


Observation schemes

- Why is survival analysis its own sub-field of statistics?
  - Abundance of important applications
  - Fundamental contributions to statistical theory (esp. in semi-parametrics)
  - Dealing with partial information due to censoring
- Recall that when we only observe partial information about a failure time we say that it's censored
  - $T \ge C$ (Right censored)
  - $T \le L$ (Left censored)
  - $V \le T < U$ (Interval censored)


Likelihood

- If the generative model is indexed by a parameter $\theta$ then the likelihood is
  $$L(\theta) \propto P(\text{Data}; \theta),$$
  which is viewed as a function of $\theta$ with the data being fixed
- The maximum likelihood estimator is
  $$\hat\theta_n = \arg\max_{\theta} L(\theta)$$
- Warm-up: Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, so that $\theta = (\mu, \sigma^2)$. Derive $L(\theta)$ and $\hat\theta_n$. Check your answer with your stat buddy.
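One way to check a hand derivation for the normal warm-up is to maximize the log-likelihood numerically and compare with the closed form. The following is a minimal sketch, assuming simulated data with illustrative true values; the names `negloglik`, `mle_numeric`, and `mle_closed` are hypothetical and not part of the lecture.

```r
set.seed(495)
x <- rnorm(200, mean = 3, sd = 2)   # simulated data; true values are illustrative

# Negative log-likelihood in (mu, log sigma^2); the log keeps sigma^2 positive
negloglik <- function(par, x) {
  mu <- par[1]
  sigma2 <- exp(par[2])
  -sum(dnorm(x, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# Rough robust starting values, then quasi-Newton minimization
start <- c(median(x), log(mad(x)^2))
fit <- optim(start, negloglik, x = x, method = "BFGS")
mle_numeric <- c(mu = fit$par[1], sigma2 = exp(fit$par[2]))

# Closed-form MLE: sample mean and the variance with divisor n (not n - 1)
mle_closed <- c(mu = mean(x), sigma2 = mean((x - mean(x))^2))

print(rbind(mle_numeric, mle_closed))
```

The two rows should agree up to the optimizer's tolerance. Note that `var(x)` uses divisor n - 1, so it differs slightly from the MLE of $\sigma^2$.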


Likelihood cont’d

- Ex. Let $T_1, \dots, T_n$ be an iid draw from a distn with density $f(t; \theta)$ indexed by $\theta$; then
  $$L(\theta) \propto P(\text{Data}; \theta) = \prod_{i=1}^{n} P(T_i = t_i; \theta) = \prod_{i=1}^{n} f(t_i; \theta)$$
- Let $T$ denote a generic observation distd according to $f(t; \theta)$. How can we use $L(\theta)$ to estimate:
  - The mean of $T$?
  - The CDF of $T$?
  - The hazard of $T$?
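One common answer to the three questions above is the plug-in principle: estimate $\theta$ by maximum likelihood, then substitute $\hat\theta_n$ into the formula for the quantity of interest. The sketch below assumes an exponential model in R's rate parameterization and uncensored simulated data; `rate_hat`, `mean_hat`, `cdf_hat`, and `hazard_hat` are illustrative names.

```r
set.seed(495)
t_obs <- rexp(100, rate = 0.5)      # simulated, uncensored lifetimes

# For the exponential model the MLE of the rate is 1 / sample mean
rate_hat <- 1 / mean(t_obs)

# Plug-in estimates of functionals of the distribution
mean_hat   <- 1 / rate_hat                               # E[T]
cdf_hat    <- function(t) pexp(t, rate = rate_hat)       # F(t)
hazard_hat <- function(t) {
  # h(t) = f(t) / S(t); constant in t for the exponential model
  dexp(t, rate = rate_hat) / pexp(t, rate = rate_hat, lower.tail = FALSE)
}

print(c(rate = rate_hat, mean = mean_hat, F_at_1 = cdf_hat(1), h_at_1 = hazard_hat(1)))
```

For other parametric families the same pattern applies, with `dexp` and `pexp` replaced by the corresponding density and distribution functions.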


Nonparametric maximum likelihood

- Ex. Let $T_1, \dots, T_n$ be an iid draw from a distn with density $f(t)$. Suppose now, however, we don't put any restrictions on $f(t)$ (other than it being a density). In this case $f$ is our 'parameter' and
  $$L(f) \propto P(\text{Data}) = \prod_{i=1}^{n} f(t_i);$$
  how can we maximize this over densities $f$?
  - Claim: Our estimated $f$, say $\hat f$, should only put positive mass on $t_1, \dots, t_n$. (Why?)
  - If $\hat f$ puts mass only on $t_1, \dots, t_n$ then maximizing the likelihood is equivalent to solving
    $$\max_{\alpha_1, \dots, \alpha_n \ge 0} \prod_{i=1}^{n} \alpha_i \quad \text{subj. to} \quad \sum_{i=1}^{n} \alpha_i = 1$$
  - Some painful calculus shows $\hat f$ is the pmf with $\hat f(t_i) = 1/n$, $i = 1, \dots, n$
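As a hedged aside, the constrained maximization can also be settled without calculus. The sketch below uses the AM-GM inequality and assumes the observed times are distinct; it is an alternative argument, not necessarily the derivation intended in the lecture.

```latex
% AM-GM: the geometric mean is at most the arithmetic mean
\left( \prod_{i=1}^{n} \alpha_i \right)^{1/n}
  \;\le\; \frac{1}{n} \sum_{i=1}^{n} \alpha_i \;=\; \frac{1}{n},
\qquad \text{with equality iff } \alpha_1 = \cdots = \alpha_n .
% Hence the product is maximized at
\hat\alpha_i = \frac{1}{n}, \quad i = 1, \dots, n,
\qquad \text{giving} \quad \prod_{i=1}^{n} \hat\alpha_i = n^{-n}.
```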


Nonparametric maximum likelihood cont’d

- Thus $\hat f(t)$ is a discrete distribution with $\hat f(t_i) = 1/n$. The estimated CDF is given by
  $$\hat F(t) = \frac{1}{n} \sum_{i=1}^{n} 1_{t_i \le t},$$
  this is called the empirical distribution function (ECDF)


Computing ECDF in R

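n <- 50                   # sample size
x <- rnorm(n)             # n draws from the standard normal
FHat <- ecdf(x)           # empirical CDF, returned as a step function
plot(FHat, xlab = "x")    # plot the ECDF against x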

[Figure: plot of the ECDF Fn(x) against x for the simulated sample; horizontal axis x from −3 to 3, vertical axis Fn(x) from 0.0 to 1.0]
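To connect the `ecdf()` call to the definition of the ECDF as the proportion of sample points at or below t, the following sketch computes that proportion directly and compares it with R's built-in function. The helper name `ecdf_by_hand` and the evaluation grid are illustrative choices.

```r
set.seed(495)
x <- rnorm(50)

# ECDF "by hand": proportion of observations less than or equal to each t
ecdf_by_hand <- function(t) sapply(t, function(s) mean(x <= s))

grid <- seq(-3, 3, by = 0.25)
FHat <- ecdf(x)

# Largest absolute discrepancy between the two versions on the grid
print(max(abs(ecdf_by_hand(grid) - FHat(grid))))
```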


Observation schemes


Truncated estimation

- Suppose $t_1, \dots, t_n$ compose a random sample from subjects with lifetimes less than or equal to one year. The LH takes the form:
  $$\prod_{i=1}^{n} f(t_i \mid T_i \le 1) = \prod_{i=1}^{n} \left\{ \frac{f(t_i)}{F(1)} \right\},$$
  Why?
- Suppose we posited a parametric model $f(t; \theta)$ for $T$; how can we estimate $\theta$ using left-truncated data?
- Deceptively difficult stat question: Suppose $T_1, \dots, T_n$ are drawn ind. from an exp($\theta$) distn but are truncated at one. When does the maximum likelihood estimator exist?
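The truncated likelihood above can be maximized numerically. The following is a minimal sketch, assuming exp($\theta$) means rate $\theta$, that "truncated at one" means only lifetimes $T \le 1$ are sampled, and that the search interval passed to `optimize` is wide enough; the simulated data and the names `loglik` and `theta_hat` are illustrative. It does not address the existence question posed above.

```r
set.seed(495)
theta_true <- 2
n <- 2000

# Inverse-CDF draw from exp(theta) conditioned on T <= 1
u <- runif(n)
t_obs <- -log(1 - u * (1 - exp(-theta_true))) / theta_true

# Log-likelihood: sum over i of log{ f(t_i; theta) / F(1; theta) }
loglik <- function(theta) {
  sum(dexp(t_obs, rate = theta, log = TRUE)) -
    n * pexp(1, rate = theta, log.p = TRUE)
}

fit <- optimize(loglik, interval = c(1e-6, 50), maximum = TRUE)
theta_hat <- fit$maximum
print(theta_hat)
```

Ignoring the truncation and using `1 / mean(t_obs)` instead would overstate the rate here, since long lifetimes are systematically missing from the sample.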


Right-censoring

- Recall that a failure time $T$ is right censored at $C$ if we only observe that $T > C$
- Our goal in the next few slides is to derive the LH under different right-censoring mechanisms
- Notation: observe $\{(T_i, \delta_i)\}_{i=1}^{n}$ where $T_i$ is the observation time and $\delta_i$ is the censoring indicator
  $$\delta_i = \begin{cases} 1 & \text{Failure time observed} \\ 0 & \text{Right censored} \end{cases}$$
- Big result of the day: Under a variety of right-censoring mechanisms:
  $$\text{LH} \propto \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$$


Type I censoring

- In type I censoring each individual has a fixed (non-random) censoring time $C > 0$
  - If $T \le C$ then failure time observed
  - If $T > C$ then right-censored
- Ex. Odense Malignant Melanoma Data: n = 205 subjects enrolled between 1962 and 1972 at Odense Dept. of Plastic Surgery had tumors and surrounding tissue removed. Patients were followed until death or until the study concluded in 1977. Note* This data is contained in the boot package in R.


Type I censoring cont’d

- Using the book's notation: define $t_i = \min(T_i, C_i)$ and $\delta_i = 1_{T_i \le C_i}$; then:
  - If $\delta_i = 1$ then $t_i$ is the failure time, so $(t_i, \delta_i)$ 'contributes' $f(t_i)$ to the LH
  - If $\delta_i = 0$ then $t_i$ is the censoring time, so $(t_i, \delta_i)$ 'contributes' $S(t_i+)$ to the LH
- Thus, the LH is
  $$\prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$$


Type I censoring cont’d

- In class: Suppose $T_1, \dots, T_n$ are iid exp($\theta$) but subject to Type I censoring, let $\delta_1, \dots, \delta_n$ denote the censoring indicators. Derive the MLE for $\theta$.


Independent random censoring

- Assume lifetime $T$ and censoring time $C$ are random variables.
  - Often more realistic
  - Ex. Random study enrollment times
  - Ex. Subjects moving out of town
  - . . .
- Let $G(t)$ and $g(t)$ denote the survivor and density function for $C$ resp., define $t_i = \min(T_i, C_i)$, $\delta_i = 1_{T_i \le C_i}$; then
  $$f(t_i, \delta_i) = [f(t_i) G(t_i+)]^{\delta_i} [S(t_i+) g(t_i)]^{1-\delta_i},$$
  Why?
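A sketch of one way to answer the "Why?", assuming $T_i$ and $C_i$ are independent and continuous (so ties have probability zero): consider the two possible values of $\delta_i$ separately and use independence to factor the joint probability.

```latex
\begin{align*}
% Failure observed at t: T_i lands near t and the censoring time exceeds t
P\bigl(t_i \in [t, t + dt),\ \delta_i = 1\bigr)
  &= P\bigl(T_i \in [t, t + dt),\ C_i > t\bigr)
   \approx f(t)\, G(t+)\, dt, \\
% Censored at t: C_i lands near t and the failure time exceeds t
P\bigl(t_i \in [t, t + dt),\ \delta_i = 0\bigr)
  &= P\bigl(C_i \in [t, t + dt),\ T_i > t\bigr)
   \approx g(t)\, S(t+)\, dt .
\end{align*}
% Writing both cases in one expression with exponents delta_i and 1 - delta_i
% gives the joint density of (t_i, delta_i) stated above.
```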


Independent random censoring cont’d

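- The LH for n iid observations is
  $$\left( \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i} \right) \left( \prod_{i=1}^{n} g(t_i)^{1-\delta_i} G(t_i+)^{\delta_i} \right),$$
  note that if $g(t)$ and $G(t)$ do not contain information about $f(t)$ or $S(t)$ then the LH is proportional to
  $$\prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$$
  It's the same LH as before!!!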


Type II censoring

- Observe individuals until the $r$th failure is observed, so that we observe the $r$ smallest lifetimes $t_{(1)} \le \cdots \le t_{(r)}$.
  - All n units start at the same time
  - Follow-up stops at the time of the $r$th failure
  - Follow-up time is random
- Using properties of order statistics, the LH is
  $$\frac{n!}{(n-r)!} \left( \prod_{i=1}^{r} f(t_{(i)}) \right) S(t_{(r)}+)^{n-r} \propto \prod_{i=1}^{n} f(t_i)^{\delta_i} S(t_i+)^{1-\delta_i}$$
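The proportionality above can be checked numerically on simulated data. The sketch below assumes exponential lifetimes with an illustrative rate; it evaluates the log-LH in its order-statistic form (dropping the constant $n!/(n-r)!$) and in the generic right-censoring form, which should agree for any parameter value. The function names are hypothetical.

```r
set.seed(495)
n <- 200
r <- 100

lifetimes <- sort(rexp(n, rate = 1.5))   # latent lifetimes, illustrative rate
t_r <- lifetimes[r]                      # follow-up stops at the r-th failure

# Observed data in (t_i, delta_i) form
t_obs <- pmin(lifetimes, t_r)
delta <- as.numeric(lifetimes <= t_r)

# Log-LH from the order-statistic form, without the constant n! / (n - r)!
loglik_order <- function(lambda) {
  sum(dexp(lifetimes[1:r], rate = lambda, log = TRUE)) +
    (n - r) * pexp(t_r, rate = lambda, lower.tail = FALSE, log.p = TRUE)
}

# Log-LH from the generic right-censoring form
loglik_generic <- function(lambda) {
  sum(delta * dexp(t_obs, rate = lambda, log = TRUE) +
        (1 - delta) * pexp(t_obs, rate = lambda, lower.tail = FALSE, log.p = TRUE))
}

print(c(order = loglik_order(1), generic = loglik_generic(1)))
```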


Code break

Go over mle.R in R Studio


For those who are adventurous


Counting process notation

- Goal: Show that the form of the LH for right-censoring applies in very general settings
- For clarity we assume discrete time $t = 0, 1, \dots$
- Let $h_i(t)$ and $S_i(t)$ denote the hazard and survivor function for the $i$th subject resp.; further define
  $$Y_i(t) \triangleq 1_{T_i \ge t,\ i\text{th subj. not censored}} = \begin{cases} 1 & i\text{th subj. hasn't failed or been censored at } t \\ 0 & \text{Otherwise} \end{cases},$$
  if $Y_i(t) = 1$ then we say the $i$th subj. is at risk at time $t$


Counting process notation cont’d

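- Define
  $$dN_i(t) \triangleq Y_i(t)\, 1_{T_i = t} = \begin{cases} 1 & \text{if at risk and fails at } t \\ 0 & \text{Otherwise} \end{cases}$$
  $$dC_i(t) \triangleq Y_i(t)\, 1_{i\text{th subj. censored at } t} = \begin{cases} 1 & \text{if at risk and censored at } t \\ 0 & \text{Otherwise} \end{cases}$$
- Claim: $\{dN_i(t), dC_i(t), t \ge 0\}$ has a single 1 and the rest zeros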


Counting process notation cont’d

- Even more definitions:
  $$dN(t) \triangleq (dN_1(t), \dots, dN_n(t))$$
  $$dC(t) \triangleq (dC_1(t), \dots, dC_n(t))$$
  $$H(t) \triangleq \{(dN(s), dC(s)),\ s = 0, 1, \dots, t-1\}$$
  we say $H(t)$ is the history of the survival process up to time $t$


Counting process notation: the likelihood

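- Note that $\lim_{t \to \infty} H(t)$ contains all the information in the collected data (why?), thus
  $$\begin{aligned} P(\text{Data}) &= P(dN(0))\, P(dC(0) \mid dN(0)) \times P(dN(1) \mid H(1))\, P(dC(1) \mid dN(1), H(1)) \times \cdots \\ &= \prod_{t=0}^{\infty} P(dN(t) \mid H(t))\, P(dC(t) \mid dN(t), H(t)) \end{aligned}$$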


Counting process notation: the likelihood cont’d

- To make the horrible expression tractable we'll assume conditional independence across subjects given $H(t)$ and
  $$P(dN_i(t) = 1 \mid H(t)) = Y_i(t)\, h_i(t),$$
  explain this expression to your stat buddy
- We will also assume that terms inside $P(dC(t) \mid dN(t), H(t))$ are not informative for the parameters in $h_i(t)$
- When the above assumptions hold we say the censoring is non-informative


Counting process notation: the likelihood cont’d

- Under the foregoing assumptions the LH is given by
  $$\prod_{i=1}^{n} \prod_{t=0}^{\infty} h_i(t)^{dN_i(t)} (1 - h_i(t))^{Y_i(t)(1 - dN_i(t))}$$
- To see this, we'll consider two cases:
  - Case 1: $i$th subject's failure time is observed at $t_i$, then they're at risk at $t = 0, 1, \dots, t_i$, and $dN_i(t) = 1_{t = t_i}$, thus
    $$\prod_{t=0}^{\infty} h_i(t)^{dN_i(t)} (1 - h_i(t))^{Y_i(t)(1 - dN_i(t))} = h_i(t_i) \prod_{s=0}^{t_i - 1} (1 - h_i(s)) = f_i(t_i)$$
  - Case 2: $i$th subject is censored at time $t_i$, then they're at risk at times $t = 0, 1, \dots, t_i$, and $dC_i(t) = 1_{t = t_i}$, thus
    $$\prod_{t=0}^{\infty} h_i(t)^{dN_i(t)} (1 - h_i(t))^{Y_i(t)(1 - dN_i(t))} = \prod_{t=0}^{t_i} (1 - h_i(t)) = S_i(t_i + 1)$$
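The two cases can be verified numerically for a single subject. The sketch below assumes a hypothetical discrete-time hazard on $t = 0, \dots, 9$ and the convention $S(t) = P(T \ge t)$, so that $f(t) = h(t) S(t)$; the helper `cp_contribution` is an illustrative name for the inner product over $t$ in the counting-process LH.

```r
# Hypothetical discrete-time hazard on t = 0, 1, ..., 9 (illustrative values only)
times <- 0:9
h <- seq(0.05, 0.5, length.out = 10)

# Survivor function and pmf implied by the hazard:
# S(t) = P(T >= t) = prod_{s < t} (1 - h(s)),  f(t) = h(t) * S(t)
S <- c(1, cumprod(1 - h))    # S[t + 1] holds S(t) for t = 0, ..., 10
f <- h * S[1:10]             # f[t + 1] holds f(t) for t = 0, ..., 9

# Counting-process contribution of one subject followed until time ti,
# with delta = 1 for an observed failure and delta = 0 for censoring
cp_contribution <- function(ti, delta) {
  Y  <- as.numeric(times <= ti)            # at risk at t = 0, ..., ti
  dN <- as.numeric(times == ti) * delta    # 1 only at an observed failure time
  prod(h^dN * (1 - h)^(Y * (1 - dN)))
}

ti <- 4
case1 <- cp_contribution(ti, delta = 1)    # compare with f(ti)
case2 <- cp_contribution(ti, delta = 0)    # compare with S(ti + 1)
print(c(case1 = case1, f_ti = f[ti + 1], case2 = case2, S_ti_plus_1 = S[ti + 2]))
```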


Counting process notation: the likelihood cont’d

- Putting it all together shows the LH is proportional to
  $$\prod_{i=1}^{n} f_i(t_i)^{\delta_i} S_i(t_i + 1)^{1-\delta_i}$$
- Limiting arguments show the LH is the same (with $S_i(t_i + 1)$ replaced by $S_i(t_i+)$) in the continuous case
- Note* we'll see later, using the framework of partial likelihood, that maximizing the above LH is appropriate in even more general settings


LH-based inference

- For parametric models the LH provides an efficient framework for estimation and inference
- Let $\theta \in \Theta \subseteq \mathbb{R}^p$ index the survival distribution of interest, define
  - $L(\theta)$ the LH
  - $\ell(\theta) = \log L(\theta)$ the log-LH
  - $u(\theta) = \frac{d}{d\theta} \ell(\theta)$ the score function
  - $I(\theta) = -\frac{d^2}{d\theta\, d\theta^{\intercal}} \ell(\theta)$ the Fisher information


LH-based inference cont’d

- Recall maximum LH estimator
  $$\hat\theta_n = \arg\max_{\theta \in \Theta} L(\theta),$$
  solves $u(\theta) = 0$
- Under mild regularity conditions
  $$\sqrt{n}\,(\hat\theta_n - \theta^*) \rightsquigarrow N(0, I^{-1}(\theta^*))$$


LH-based inference example

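- Assume $T_1, \dots, T_n$ are iid exp($\lambda$) and subject to noninformative censoring. Let $\delta_1, \dots, \delta_n$ denote the censoring indicator. Find the MLE for $\lambda$, the Fisher information matrix, and a 95% confidence interval for $\lambda$.
- The LH, $L(\lambda)$, is
  $$\prod_{i=1}^{n} f(t_i; \lambda)^{\delta_i} S(t_i; \lambda)^{1-\delta_i} = \prod_{i=1}^{n} \lambda^{\delta_i} \exp\{-\lambda t_i \delta_i\} \exp\{-\lambda t_i (1 - \delta_i)\} = \lambda^{\sum_{i=1}^{n} \delta_i} \exp\left\{ -\lambda \sum_{i=1}^{n} t_i \right\}$$
- The log-LH, $\ell(\lambda)$, is
  $$\left( \sum_{i=1}^{n} \delta_i \right) \log \lambda - \lambda \sum_{i=1}^{n} t_i$$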


LH-based inference example cont’d

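- The score function $u(\lambda)$ is given by
  $$u(\lambda) = \frac{\sum_{i=1}^{n} \delta_i}{\lambda} - \sum_{i=1}^{n} t_i$$
  setting this to zero and solving yields
  $$\hat\lambda_n = \frac{\sum_{i=1}^{n} \delta_i}{\sum_{i=1}^{n} t_i}$$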
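The closed-form MLE (observed failures divided by total time at risk) can be compared against direct numerical maximization of the log-LH. This is a minimal sketch on simulated data, assuming independent exponential censoring times with illustrative rates; the variable names are hypothetical.

```r
set.seed(495)
n <- 500
fail_time <- rexp(n, rate = 2)     # latent failure times, true lambda = 2
cens_time <- rexp(n, rate = 1)     # assumed independent censoring times

t_obs <- pmin(fail_time, cens_time)
delta <- as.numeric(fail_time <= cens_time)

# Closed-form MLE: number of observed failures / total time at risk
lambda_hat <- sum(delta) / sum(t_obs)

# Numerical check: maximize the log-LH directly
loglik <- function(lambda) sum(delta) * log(lambda) - lambda * sum(t_obs)
fit <- optimize(loglik, interval = c(1e-6, 20), maximum = TRUE, tol = 1e-8)

print(c(closed_form = lambda_hat, numerical = fit$maximum))
```

Dropping the censored observations and using `1 / mean(t_obs[delta == 1])` would not give the same answer, because the censored subjects still contribute time at risk to the denominator.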


LH-based inference example cont’d

- We take the negative derivative of $u(\lambda)$ to get
  $$I(\lambda) = \frac{\sum_{i=1}^{n} \delta_i}{\lambda^2}$$
- How do we get a 95% confidence interval for $\lambda$?
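One standard answer is a Wald interval: plug the MLE into the information and use the normal approximation. The sketch below treats $I(\hat\lambda_n) = \sum_i \delta_i / \hat\lambda_n^2$ as the information for the whole sample, so the approximate standard error is $I(\hat\lambda_n)^{-1/2} = \hat\lambda_n / \sqrt{\sum_i \delta_i}$. That reading, the simulated data, and the variable names are assumptions made for illustration.

```r
set.seed(495)
n <- 500
fail_time <- rexp(n, rate = 2)     # latent failure times, true lambda = 2
cens_time <- rexp(n, rate = 1)     # assumed independent censoring times
t_obs <- pmin(fail_time, cens_time)
delta <- as.numeric(fail_time <= cens_time)

lambda_hat <- sum(delta) / sum(t_obs)
info_hat   <- sum(delta) / lambda_hat^2    # I(lambda) evaluated at the MLE
se_hat     <- 1 / sqrt(info_hat)           # approximate standard error

# 95% Wald interval: estimate +/- z_{0.975} * standard error
ci <- lambda_hat + c(-1, 1) * qnorm(0.975) * se_hat
print(c(lower = ci[1], estimate = lambda_hat, upper = ci[2]))
```

Because $\lambda > 0$, an interval built on the $\log \lambda$ scale and then exponentiated is a common alternative when the number of observed failures is small.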