Lecture 2: Key Functions and Parametric Distributions

• Survival Function
• Hazard Function
• Median Survival
• Common Parametric Distributions

But First

• Let’s think a little more about censoring and truncation using an example…
• An investigator is interested in determining whether treatment with amoxetine leads to recovery of cognitive function in rats with brain lesions that mimic Parkinson’s disease.
• The outcome of interest is time to complete recovery of cognitive function
– i.e. the time it takes to return to baseline cognitive function after treatment with amoxetine.

Amoxetine and Cognitive Function

• Collect baseline measure of cognitive function
– Time to correctly perform water radial arm maze (WARM) task
• Induce cognitive impairment
– Treat 4 week old rats with N-(2-chloroethyl)-N-ethyl-bromobenzylamine (DSP-4)
– Causes noradrenergic lesions in the locus coeruleus
• Treat lesioned animals with amoxetine
– Daily dose for 4 weeks (ages 4 to 8 weeks)
– 0, 0.3, 1.0, or 3.0 mg/kg
• Measure cognitive performance post treatment
– Weekly for 16 weeks (ages 8 to 24 weeks)
– Endpoint: time it takes to reach 100% cognitive function

Describe the type of censoring

• Rat does not achieve complete cognitive recovery at 12 weeks but does by 13 weeks.

• Rat dies at 82 days without having achieved complete cognitive recovery

• Rat survives to 24 weeks but never achieves complete cognitive recovery

Describe the type of censoring

• Rat doesn’t develop brain lesions due to misplaced DSP-4 treatment and shows complete cognitive recovery at 8 weeks

• Rat shows complete cognitive recovery at 8 weeks

Time to Event Outcomes

• Modeled using “survival analysis”
• Define X = time to event
– X is a random variable
– Realizations of X are denoted x
– X > 0
• Key characterizing functions
– Survival function
– Hazard rate (or function)
– Probability density function
– Mean residual life

PDF, survival function, hazard rate, and mean residual life

• f(x): probability density function
• S(x): survival function
• h(x): hazard rate
• mrl(x): mean residual life

Survival Function

• S(x) = P(X > x), the probability of an individual surviving beyond time x
• Basic properties
– Monotonic non-increasing
– S(0) = 1
– S(∞) = 0*

*Debatable: cure-rate distributions allow a plateau at some other value

Types of time to event data

• Continuous t
– Observe actual time

$S(x) = P(X > x) = \int_x^{\infty} f(t)\,dt$

• Discrete t
– Interval censoring
– Grouping into intervals

$S(x) = P(X > x) = \sum_{x_j > x} p(x_j)$

where $p(x_j)$ is the probability mass function, $P(X = x_j)$

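Example of Discrete Time to Event

• Discrete Uniform (3 times possible)

pmf: $p(x_j) = P(X = j) = \frac{1}{3}, \quad j = 1, 2, 3$

$S(0) = 1, \quad S(1) = \frac{2}{3}, \quad S(2) = \frac{1}{3}, \quad S(3) = 0$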
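The discrete uniform example can be checked numerically. The following is a minimal R sketch, assuming the three possible times are 1, 2, and 3 and that S(x) = P(X > x); the names `x.j`, `p.j`, and `surv.discrete` are illustrative and not from the lecture.

```r
# Discrete uniform on the time points 1, 2, 3 (each with probability 1/3)
x.j <- 1:3
p.j <- rep(1/3, 3)

# S(x) = P(X > x): sum the pmf over all time points strictly greater than x
surv.discrete <- function(x) sapply(x, function(x0) sum(p.j[x.j > x0]))

# Survival function evaluated at x = 0, 1, 2, 3
S <- surv.discrete(0:3)
print(round(S, 3))
```

The survival function steps down by 1/3 at each possible event time and reaches 0 at the last one.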

Hazard Rate

• A little harder to conceptualize
• Instantaneous failure rate or conditional failure rate
• Interpretation: h(x)Δx is approximately the probability that a person experiences the event in the interval (x, x+Δx), given survival to time x.

$h(x) = \lim_{\Delta x \to 0} \frac{P(x \le X < x + \Delta x \mid X \ge x)}{\Delta x}$

Hazard Rate

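• Only constraint: $h(x) \ge 0$
• Relationship between h(x), S(x) and pdf (continuous):

$h(x) = \lim_{\Delta x \to 0} \frac{P(x \le X < x + \Delta x \mid X \ge x)}{\Delta x} = \frac{f(x)}{S(x)} = -\frac{d}{dx} \ln S(x)$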

Hazard Function

• Useful for conceptualizing how the chance of an event changes over time

• i.e. consider hazard ‘relative’ over time
• Examples:
– Treatment related mortality: early on, high risk of death; later on, risk of death decreases
– Aging: early on, low risk of death; later on, high risk of death

Shapes of Hazard Functions

• Increasing
– Natural aging and wear
• Decreasing
– Early failures due to device or transplant failures
• Bathtub
– Populations followed from birth
• Hump Shaped
– Initial risk of event, followed by decreasing chance of event

Examples

[Figure: example hazard function shapes. x-axis: Time (0 to 6); y-axis: Hazard Function (0 to 1)]

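R Code for Hazard Function Shapes

# Examples of hazard function shapes
# Weibull hazard: h(x) = alp * lam * x^(alp - 1)
weibull.hazard <- function(x, alp, lam) {
  h <- alp * lam * x^(alp - 1)
  return(h)
}
# Log-logistic hazard: h(x) = alp * lam * x^(alp - 1) / (1 + lam * x^alp)
loglogistic.hazard <- function(x, alp, lam) {
  h <- alp * lam * x^(alp - 1) / (1 + lam * x^alp)
  return(h)
}
x <- seq(0, 6, 0.05)
# Increasing hazard (Weibull, alp > 1)
h1 <- weibull.hazard(x, 1.5, 0.25)
plot(x, h1, type="l", lwd=2, ylab="Hazard Function", xlab="Time", ylim=c(0,1))
# Decreasing hazard (log-logistic, alp < 1)
h2 <- loglogistic.hazard(x, 0.5, 0.25)
lines(x, h2, lwd=2, col=2)
# Hump-shaped hazard (log-logistic, alp > 1)
h3 <- loglogistic.hazard(x, 2, 1)
lines(x, h3, lwd=2, col=3)
# Bathtub-shaped hazard (quartic centered at x = 3)
h4 <- 0.01*(x-3)^4
lines(x, h4, lwd=2, col=4)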

Cumulative Hazard Function

• Often used instead of the hazard function

$H(x) = \int_0^x h(u)\,du$

– Relationship between H(x) and S(x): $S(x) = \exp[-H(x)]$

• More on this later for model checking…
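The link between the cumulative hazard and the survival function, S(x) = exp(-H(x)), can be verified numerically. The sketch below assumes a Weibull-type hazard of the same form as the one in the hazard-shape plotting code, with illustrative parameter values; `H`, `S.from.H`, and `S.closed` are names chosen here for the example.

```r
# Weibull-type hazard h(x) = alp * lam * x^(alp - 1) (illustrative parameters)
alp <- 1.5
lam <- 0.25
h <- function(u) alp * lam * u^(alp - 1)

# Cumulative hazard H(x): integrate the hazard from 0 to x
H <- function(x) integrate(h, lower = 0, upper = x)$value

x0 <- 2
S.from.H <- exp(-H(x0))         # survival obtained through S(x) = exp(-H(x))
S.closed <- exp(-lam * x0^alp)  # closed-form Weibull survival for comparison

print(c(S.from.H, S.closed))
```

Both routes should agree up to numerical integration error.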

What if T is discrete?

• So far we’ve focused on T as a continuous r.v.
• Discrete x
– Interval censoring
– Grouping into intervals
• Depending on level of discreteness, use discrete data approach

$S(x) = P(X > x) = \sum_{x_j > x} p(x_j)$

where $p(x_j)$ is a pmf, $P(X = x_j)$.

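Complications

• How can we use this to define our “discrete” hazard function?

Consider: $h(x_j) = P(X = x_j \mid X \ge x_j) = \frac{p(x_j)}{S(x_{j-1})}, \quad j = 1, 2, \ldots$, with $S(x_0) = 1$

Note: $p(x_j) = S(x_{j-1}) - S(x_j)$, so $h(x_j) = \frac{S(x_{j-1}) - S(x_j)}{S(x_{j-1})} = 1 - \frac{S(x_j)}{S(x_{j-1})}$

Implying: $S(x) = \prod_{x_j \le x} \frac{S(x_j)}{S(x_{j-1})} = \prod_{x_j \le x} [1 - h(x_j)]$

And: $H(x) = \sum_{x_j \le x} h(x_j)$, but $S(x) \ne e^{-H(x)}$

So redefine as: $H(x) = -\sum_{x_j \le x} \ln[1 - h(x_j)]$, so $S(x) = e^{-H(x)}$ holds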
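To see why the cumulative hazard has to be redefined in the discrete case, here is a small R sketch that reuses the discrete uniform pmf on times 1, 2, 3 (an assumption carried over from the earlier example). It compares the product formula, the naive sum of hazards, and the log-based definition; all variable names are illustrative.

```r
# Discrete uniform pmf on the time points 1, 2, 3
x.j <- 1:3
p.j <- rep(1/3, 3)

S.j    <- 1 - cumsum(p.j)          # S(x_j) = P(X > x_j)
S.prev <- c(1, head(S.j, -1))      # S(x_{j-1}), with S(x_0) = 1
h.j    <- p.j / S.prev             # discrete hazard h(x_j) = p(x_j) / S(x_{j-1})

S.prod <- cumprod(1 - h.j)         # product formula for S(x_j)
H.sum  <- cumsum(h.j)              # naive cumulative hazard (sum of hazards)
H.log  <- -cumsum(log(1 - h.j))    # redefined cumulative hazard

print(rbind(S.j, S.prod, exp.neg.H.sum = exp(-H.sum), exp.neg.H.log = exp(-H.log)))
```

The product formula and the log-based cumulative hazard both recover S(x_j), while exp(-H) computed from the simple sum of hazards does not.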

Mean Residual Life

• Biomedical applications
– Median is very common
– MRL is not common
• MRL = the expected residual life

$\mathrm{mrl}(x) = E(X - x \mid X > x) = \frac{\int_x^{\infty} (t - x) f(t)\,dt}{S(x)} = \frac{\int_x^{\infty} S(t)\,dt}{S(x)}$

• Theoretically, could be useful to predict survival times given survival to a certain point in time.
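The mean residual life can be computed directly from the survival function by numerical integration. The sketch below assumes an exponential survival function with an illustrative rate of 0.1; the function name `mrl` is chosen here for the example.

```r
# Exponential survival function with an illustrative rate
lam <- 0.1
S <- function(t) exp(-lam * t)

# mrl(x) = integral of S(t) from x to infinity, divided by S(x)
mrl <- function(x) integrate(S, lower = x, upper = Inf)$value / S(x)

# Evaluate at several time points
mrl.vals <- sapply(c(0, 5, 20), mrl)
print(mrl.vals)
```

For the exponential the result is the same at every x (equal to 1/lam), which previews the lack-of-memory property; other distributions give an mrl that changes with x.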

Mean

• We do not see the mean quantified very often in biomedical applications

$E(X) = \int_0^{\infty} t f(t)\,dt = \int_0^{\infty} S(t)\,dt$

• Why?
– Recall our censoring issue
– Empirical means depend on parametric model
– Means can only be ‘model-based’
– Somewhat counterintuitive, especially when alternatives exist
• More common: median

Median

• Very/Most common way to express the ‘center’ of the distribution
• Rarely see another quantile expressed
• Find t such that $S(t) = 0.5$
• Complication: in some applications, median is not reached empirically
• Reported median based on model seems like an extrapolation
• Often just state ‘median not reached’ and give alternative point estimates
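When S(t) = 0.5 has no convenient closed form, the median can be found with a root finder. This is a minimal sketch using `uniroot`, assuming a Weibull-type survival function S(t) = exp(-lam * t^alp) with illustrative parameters; the closed-form median for this model is used only as a check.

```r
# Survival function (Weibull-type, illustrative parameters)
lam <- 0.05
alp <- 0.5
S <- function(t) exp(-lam * t^alp)

# Median: the root of S(t) - 0.5 on an interval wide enough to bracket it
med <- uniroot(function(t) S(t) - 0.5, interval = c(0, 1000), tol = 1e-8)$root

# Closed-form check for this particular model: (ln 2 / lam)^(1 / alp)
med.closed <- (log(2) / lam)^(1 / alp)
print(c(med, med.closed))
```

If S(t) never drops to 0.5 on the search interval, `uniroot` stops with an error, which is the numerical counterpart of ‘median not reached’.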

X-Year Survival Rate

• Many applications have ‘landmark’ times that have historically been used to quantify survival
• Examples:
– Breast cancer: 5 year relapse-free survival
– Pancreatic cancer: 6 month survival
– Acute myeloid leukemia (AML): 12 month relapse-free survival
• Solve for S(x) given x

Common Parametric Distributions

• Course will focus on non-parametric and semi-parametric methods

• But… some parametrics can be useful
• Especially for trial design
• Note that power and precision are improved under parametric approaches versus others

Example 1: Exponential

• Recall the exponential distribution
– $f(t) = \lambda e^{-\lambda t}$
– $F(t) = 1 - e^{-\lambda t}$
• What is S(t) based on F(t) and f(t)?
– $S(t) = 1 - F(t) = e^{-\lambda t}$
• λ represents the failure rate per unit of time
– Large λ, rapid decay
– Small λ, slow decay

Example 1: Exponential

[Figure: exponential survival functions for λ = 0.1, λ = 0.05, and λ = 0.01. x-axis: Time (0 to 60); y-axis: Survival Function (0 to 1)]

R Code for the Plot

time <- seq(0, 60, 0.1)
# Exponential survival S(t) = exp(-lambda * t) for three failure rates
S1 <- exp(-0.1*time)
S2 <- exp(-0.05*time)
S3 <- exp(-0.01*time)
plot(time, S1, xlab="Time", ylab="Survival Function", col=3, lwd=2, type="l")
lines(time, S2, col=2, lwd=2)
lines(time, S3, col=4, lwd=2)
# Legend labels using the Greek letter lambda
labs <- c(expression(paste(lambda, " = ", 0.1, sep="")),
          expression(paste(lambda, " = ", 0.05, sep="")),
          expression(paste(lambda, " = ", 0.01, sep="")))
legend(x=45, y=.95, labs, col=c(3,2,4), lty=c(1,1,1), lwd=c(2,2,2), cex=0.9)

Example: Kidney Infection after Catheterization

• Kidney infection after catheter insertion in patients using portable dialysis equipment

• Time to event was time to catheter removal, BUT it should be noted that the catheter can be removed for reasons other than infection (right censored)
• Only 76 observations (!)
• Time to infection is outcome of interest
• Question: can we describe it using a parametric approach?

Kidney Infection Example: Survival curve and 95% confidence intervals

Exponential

• Overused due to simplicity
• One parameter
• Recall: $S(t) = e^{-\lambda t}$
• Hazard function: $h(t) = f(t)/S(t) = \lambda$
• Note: constant hazard (huge assumption)

Exponential

• Mean = $1/\lambda$
• Median = $\ln(2)/\lambda$

Exponential

• MRL = $1/\lambda$
• “lack of memory”: $P(T > t + z \mid T > t) = P(T > z)$
• Realistic?
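The lack-of-memory property follows in one line from the exponential survival function. A short derivation, assuming $S(t) = e^{-\lambda t}$ as recalled earlier:

```latex
\begin{aligned}
P(T > t + z \mid T > t)
  &= \frac{P(T > t + z)}{P(T > t)}
   = \frac{S(t + z)}{S(t)} \\
  &= \frac{e^{-\lambda (t + z)}}{e^{-\lambda t}}
   = e^{-\lambda z}
   = P(T > z).
\end{aligned}
```

The time already survived, t, cancels out, so the remaining lifetime has the same distribution no matter how long the subject has been followed.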

Exponential

• Recall the cumulative hazard function H(t)
• For exponential: $H(t) = \lambda t$
• Plot of ln(H(t)) vs. ln(t) should be a straight line with:
– Slope = ?
– Intercept = ?
• Used for model checking with a non-parametric estimate of H(t)

Does Exponential Fit the Kidney Data?

R Code

### Kidney infection example
library(survival)
surv.kid <- Surv(kidney$time, kidney$status)
fit.kid <- survfit(surv.kid~1)
plot(fit.kid, xlab="Time", ylab="Survival Fraction")
# summarize KM estimator to get median survival
summary(fit.kid)
# define log cumulative hazard and log time
logHt <- log(-log(fit.kid$surv))
logt <- log(fit.kid$time)
# Plot log cumulative hazard vs. log time
plot(logt, logHt, lwd=2, type="l", xlab="log(t)", ylab="log(H(t))")
points(logt, logHt, pch=16)
# add a reference line with slope 1. If exponential fits, the plot should be parallel to it.
# Note intercepts may be different
abline(-4.89, 1, lwd=2, col="red")

Exponential

• Another alternative model check
• Note that H(t) = λt for exponential
• Can simply plot –ln(S(t)) versus t
• Should be a straight line with
– Slope = ?
– Intercept = ?
• Why would the previous be preferred?
• Because it can accommodate Weibull, as we will see….

Another Exponential Check

More Model Checking

• We will build likelihood later
• For now, accept that the MLE of λ is

$\hat{\lambda} = \frac{\sum_i \delta_i}{\sum_i t_i}$

• Where $\delta_i$ indicates whether the event is observed or censored for patient i, and $t_i$ is the event or censoring time
• Here: $\hat{\lambda} = \frac{58}{7724} = 0.0075$
• This implies a model such that $S(t) = e^{-0.0075t}$
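The MLE is the number of observed events divided by the total follow-up time. A minimal R sketch follows; `exp.mle` is a helper name introduced here, the small `time` and `status` vectors are made-up toy values, and the totals 58 and 7724 are the ones quoted on the slide.

```r
# MLE of the exponential rate: observed events / total time at risk
exp.mle <- function(time, status) sum(status) / sum(time)

# Toy illustration (hypothetical data): status 1 = event observed, 0 = censored
time   <- c(5, 12, 20, 33)
status <- c(1, 0, 1, 1)
lambda.toy <- exp.mle(time, status)   # 3 events over 70 time units

# Slide totals for the kidney data: 58 events, 7724 total days of follow-up
lambda.hat <- 58 / 7724
print(round(lambda.hat, 4))
```

With the kidney data loaded, the same estimate would presumably come from `exp.mle(kidney$time, kidney$status)`, assuming the data set matches the one used in the lecture.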

Compare Fitted and Observed S(t)

What about specific survival time? Median survival? Mean survival?

• Empirical:
– 200 day survival = 21.0%
– Median survival = 66 days
– Mean survival = ?
• Exponential Model:
– 200 day survival = S(200) = ?
– Median survival = ?
– Mean survival = ?
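The model-based quantities follow from the exponential formulas for S(t), the median, and the mean. A short R sketch, assuming the fitted rate of 0.0075 per day from the slide; the variable names are illustrative.

```r
# Fitted exponential rate from the slide (per day)
lambda.hat <- 0.0075

S200      <- exp(-lambda.hat * 200)  # 200 day survival, S(200) = exp(-1.5)
med.surv  <- log(2) / lambda.hat     # median survival, ln(2) / lambda
mean.surv <- 1 / lambda.hat          # mean survival, 1 / lambda

print(round(c(S200 = S200, median = med.surv, mean = mean.surv), 3))
```

Setting these next to the empirical values (21.0% at 200 days, median of 66 days) gives a rough sense of how well the exponential model describes the data.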

Weibull

• Generalization of the Exponential
• VERY common for survival, but not always perfect
• Shape and Scale parameters: α and λ
• Variable hazard
– Increasing
– Decreasing
– Constant (α = 1)

Weibull: Generalization of Exponential

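• Shape Parameter: α
• Scale Parameter: λ

$f(t; \alpha, \lambda) = \alpha \lambda t^{\alpha - 1} e^{-\lambda t^{\alpha}}$

$S(t) = e^{-\lambda t^{\alpha}}$

$h(t) = \alpha \lambda t^{\alpha - 1}$

• Equivalent to the exponential when α = 1
• Note: There are different parameterizations for the Weibull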
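Because parameterizations differ, it helps to map the lecture's form onto R's built-in Weibull functions. The sketch below assumes the lecture's form S(t) = exp(-λ t^α) and uses the fact that R's `pweibull` has survival exp(-(t/scale)^shape), so shape = α and scale = λ^(-1/α); the parameter values are illustrative.

```r
# Lecture parameterization: S(t) = exp(-lam * t^alp)
alp <- 0.5
lam <- 0.05
t <- c(1, 10, 30, 60)

S.lecture <- exp(-lam * t^alp)

# R's parameterization: S(t) = exp(-(t / scale)^shape)
S.builtin <- pweibull(t, shape = alp, scale = lam^(-1 / alp), lower.tail = FALSE)

# Hazard as density over survival, compared with alp * lam * t^(alp - 1)
h.builtin <- dweibull(t, shape = alp, scale = lam^(-1 / alp)) / S.builtin
h.lecture <- alp * lam * t^(alp - 1)

print(cbind(t, S.lecture, S.builtin, h.lecture, h.builtin))
```

The two columns for S(t) and the two for h(t) should match, which confirms the mapping between the two parameterizations.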

Weibull Example

[Figure: Weibull survival functions for (λ = 0.05, α = 0.5), (λ = 0.05, α = 1), (λ = 0.01, α = 0.5), and (λ = 0.01, α = 1). x-axis: Time (0 to 60); y-axis: Survival Function (0 to 1)]

R Code for the Weibull Plot

# Weibull survival S(t) = exp(-lambda * t^alpha)
time <- seq(0, 60, 0.1)
S1 <- exp(-0.05*time^.5)
S2 <- exp(-0.05*time^1)
S3 <- exp(-0.01*time^0.5)
S4 <- exp(-0.01*time^1)
plot(time, S1, xlab="Time", ylab="Survival Function", col=2, lwd=2, type="l", ylim=c(0,1))
lines(time, S2, col=1, lwd=2)
lines(time, S3, col=3, lwd=2)
lines(time, S4, col=4, lwd=2)
labs <- c(expression(paste(lambda, " = ", 0.05, ", ", alpha, " = ", 0.5, sep="")),
          expression(paste(lambda, " = ", 0.05, ", ", alpha, " = ", 1, sep="")),
          expression(paste(lambda, " = ", 0.01, ", ", alpha, " = ", 0.5, sep="")),
          expression(paste(lambda, " = ", 0.01, ", ", alpha, " = ", 1, sep="")))
legend(x=0, y=.25, labs, col=c(2,1,3,4), lty=1, lwd=2, cex=0.9)

Effect of Shape Parameter

Weibull

• Mean is ugly (involves the gamma function, see pg. 38)
• Median: $t_{0.5} = \left(\frac{\ln 2}{\lambda}\right)^{1/\alpha}$
• Model checking:
– Can do the same log(-log(S(t))) plot: log(-log(S(t))) = log(λ) + α log(t)
– Here, linearity is required, but the slope need not equal 1
• More later when we discuss likelihoods

Log-normal

• Just like it sounds
• If Y = log(T) ~ normal, then T ~ log-normal
• Two parameters: μ and σ
• Survival function: $S(t) = 1 - \Phi\left(\frac{\ln t - \mu}{\sigma}\right)$
• Median: $t_{0.50} = \exp(\mu)$

Log-normal

• Log-normal can work well in medical applications (e.g. age of disease onset)

• Hazard is hump-shaped

• Critics think that decreasing hazard at later times is unrealistic
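The log-normal formulas and the hump-shaped hazard can be checked with R's built-in log-normal functions. A minimal sketch, assuming the survival function S(t) = 1 - Φ((ln t - μ)/σ) and illustrative values μ = 0, σ = 1:

```r
# Illustrative log-normal parameters (mean and sd of log time)
mu <- 0
sigma <- 1
t <- seq(0.05, 10, by = 0.05)

# Survival function from the formula and from the built-in distribution function
S.formula <- 1 - pnorm((log(t) - mu) / sigma)
S.builtin <- plnorm(t, meanlog = mu, sdlog = sigma, lower.tail = FALSE)

# Median from the quantile function; should equal exp(mu)
med <- qlnorm(0.5, meanlog = mu, sdlog = sigma)

# Hazard = density / survival; locate its peak on the grid
h <- dlnorm(t, meanlog = mu, sdlog = sigma) / S.builtin
t.peak <- t[which.max(h)]
print(c(median = med, hazard.peak.time = t.peak))
```

The hazard rises to a peak at an interior time point and then declines, which is the late-time behavior that critics question.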

Log-logistic

• If Y = log(T) follows a logistic distribution, then T ~ log-logistic
• Logistic is similar to normal, but the survival function is easier to work with
• Hazard similar to Weibull, but more variable in shapes for hazard
– Monotone decreasing
– Hump-shaped

Log-logistic

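• Survival function: $S(t) = \frac{1}{1 + \lambda t^{\alpha}}$
• Hazard function: $h(t) = \frac{\alpha \lambda t^{\alpha - 1}}{1 + \lambda t^{\alpha}}$
• Median: $t_{0.50} = \left(\frac{1}{\lambda}\right)^{1/\alpha}$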
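The log-logistic formulas can be tied back to the logistic distribution of log time. The sketch below assumes S(t) = 1/(1 + λ t^α) and the hazard form used in the hazard-shape plotting code; `loglogistic.surv` is a helper introduced here, and the mapping to `plogis` (location = -log(λ)/α, scale = 1/α) is worked out from that form rather than taken from the lecture.

```r
# Log-logistic survival and hazard (lecture parameterization)
loglogistic.surv   <- function(t, alp, lam) 1 / (1 + lam * t^alp)
loglogistic.hazard <- function(t, alp, lam) alp * lam * t^(alp - 1) / (1 + lam * t^alp)

alp <- 2
lam <- 1
t <- seq(0.1, 6, by = 0.1)

# log(T) is logistic with location -log(lam)/alp and scale 1/alp
S.direct  <- loglogistic.surv(t, alp, lam)
S.via.log <- 1 - plogis(log(t), location = -log(lam) / alp, scale = 1 / alp)

# Median (1 / lam)^(1 / alp) should give survival 0.5
med <- (1 / lam)^(1 / alp)
S.at.med <- loglogistic.surv(med, alp, lam)

# Shape of the hazard: hump for alp = 2, monotone decreasing for alp = 0.5
h.hump <- loglogistic.hazard(t, 2, 1)
h.decr <- loglogistic.hazard(t, 0.5, 0.25)
print(c(median = med, S.at.median = S.at.med, hump.peak.time = t[which.max(h.hump)]))
```

A shape parameter above 1 gives the hump, while a value at or below 1 gives a hazard that only decreases.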

Gamma

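• Generalization of exponential
• Not easy to work with

$f(t) = \frac{\lambda^k t^{k-1} e^{-\lambda t}}{\Gamma(k)}, \quad k, \lambda, t > 0$

$I(s, k) = \frac{1}{\Gamma(k)} \int_0^s x^{k-1} e^{-x}\,dx$

$S(t) = 1 - I(\lambda t, k)$

$h(t) = \frac{f(t)}{S(t)} = \frac{\lambda^k t^{k-1} e^{-\lambda t}}{\Gamma(k)\,[1 - I(\lambda t, k)]}$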
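Although the gamma survival function has no closed form, R's `pgamma` evaluates the incomplete gamma function directly. This sketch assumes the density f(t) = λ^k t^(k-1) e^(-λt) / Γ(k), with shape k and rate λ set to illustrative values.

```r
# Illustrative gamma parameters: shape k and rate lam
k <- 2
lam <- 0.1
t <- c(1, 5, 20, 50)

# Density from the formula and from the built-in function
f.formula <- lam^k * t^(k - 1) * exp(-lam * t) / gamma(k)
f.builtin <- dgamma(t, shape = k, rate = lam)

# Regularized incomplete gamma I(lam * t, k), then S(t) = 1 - I(lam * t, k)
I.val <- pgamma(lam * t, shape = k)
S.t   <- pgamma(t, shape = k, rate = lam, lower.tail = FALSE)

# Hazard as density over survival
h.t <- f.formula / S.t

# With k = 1 the gamma reduces to the exponential
S.exp <- pgamma(t, shape = 1, rate = lam, lower.tail = FALSE)
print(cbind(t, S.t, h.t, S.exp))
```

For k = 2 the hazard increases with t, and setting k = 1 recovers the constant-hazard exponential.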

Cure Rate Distribution

• Not in K & M
• Assumption: fraction of individuals never fail
• Violates assumption that S(∞) = 0
• Useful for clinical trials in which
– A fraction of the patients are cured
– Event may never occur (e.g. cancer relapse)

Cure Rate Example

• 75% of women with early stage breast cancer are cured by treatment

• Remaining 25% of women relapse
– Assume exponential
– λ = 0.05

Cure Rate Distribution

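• Mixture model: $S(t) = p\,S^*(t) + (1 - p)$
• S(t) = survival function for the whole population
• p = proportion of individuals who are not cured
• S*(t) = survival function for individuals who are not cured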

Cure Rate: Breast cancer example

R Code

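# Cure-rate survival: 25% not cured with exponential(0.05) relapse times, 75% cured
t <- seq(0, 1000, 0.1)
St <- 0.25*exp(-0.05*t) + 0.75
# Two panels: the first 60 months, then the long-run plateau at 0.75
par(mfrow=c(1,2))
plot(t, St, xlim=c(0,60), ylim=c(0,1), type="l", lwd=2, xlab="Time(months)", ylab="Survival Fraction")
plot(t, St, xlim=c(0,1000), ylim=c(0,1), type="l", lwd=2, xlab="Time(months)", ylab="Survival Fraction")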

Competing Risks

• Used to be somewhat ignored
• Not so much anymore
• Idea:
– Each subject can fail due to one of K causes (K > 1)
– Occurrence of one event precludes us from observing the other event
– Usually, quantity of interest is the cause specific hazard
• Overall hazard equals sum of each hazard

$h_T(t) = \sum_{k=1}^{K} h_k(t)$

Example

• An investigator is looking at graft rejection in kidney transplant patients

• However… patients can also experience graft failure and death

• Treat graft failure and death events as censored observations

• Why is this a problem?

Assumptions

• Dependence structure between the ‘potential’ failure times

• Identifiability dilemma: Can only observe one time per person so not testable

• We cannot distinguish between independent and dependent competing risks

Useful Approaches

• Want to account for other causes
– Adjust the denominator
• Compare rates of events
– Use measures of probabilities
• Crude: probability of event k allowing for all other risks
• Net: probability of event k if it is the ONLY risk
• Partial: probability of event k when it is one of a subset of risks acting in the population

• See K & M for more details
