Statistics for Survival Data
Day 2, WBL 17–19
Alain Hauser, [email protected]
2018-08-27

Part IV: Regression Models

Learning objectives:
- Explain the model assumptions behind parametric regression models
- Fit a regression model in R
- Indicate the fitted model from an R output, and interpret it
- Assess whether a fitted model is appropriate (model validation)
- Perform model or variable selection using forward or backward search

Section 1: Weibull regression
Repetition: Weibull model in logarithmic time scale
Recall the Weibull model in logarithmic time scale:
- Let T be Weibull distributed, and set Y := log T.
- We have seen that Y belongs to a location-scale family.
- More precisely, Y has probability density

  f_Y(y) = α exp(α(y − log λ)) exp(−exp(α(y − log λ)))
         = (1/σ) exp((y − µ)/σ − exp((y − µ)/σ))

  with σ := 1/α, µ := log λ.
- Hence we can write Y = µ + σZ, where Z has standard extreme value distribution: f_Z(z) = exp(z − e^z).
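This location-scale relationship can be checked with a small simulation; the following is a sketch with illustrative parameter values α = 2, λ = 3 (not taken from the lecture data):

```r
# Simulate Weibull event times; R's rweibull uses S(t) = exp(-(t/scale)^shape),
# so shape = alpha and scale = lambda in the notation above.
set.seed(1)
alpha <- 2
lambda <- 3
t.sim <- rweibull(1e6, shape = alpha, scale = lambda)
y <- log(t.sim)

# Theory: Y = mu + sigma * Z with mu = log(lambda), sigma = 1/alpha, where
# Z has density exp(z - e^z), hence E[Z] = -gamma (Euler-Mascheroni constant)
# and Var(Z) = pi^2 / 6.
gamma.em <- 0.5772157
c(empirical = mean(y), theoretical = log(lambda) - gamma.em / alpha)
c(empirical = sd(y), theoretical = (1 / alpha) * pi / sqrt(6))
```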
Recall the two-sample problem with the heroin data set:
- We fitted a Weibull model to both clinics, using the same shape, but different scale parameters.
- On the logarithmic time scale, this means we fitted individual µ's, but a common σ:
  Clinic 1: Y1 = µ1 + σZ
  Clinic 2: Y2 = µ2 + σZ
Introduce a binary explanatory variable X, setting X = 0 for addicts in clinic 1, and X = 1 for addicts in clinic 2 (indicator variable for clinic 2). The model can then be rewritten as Y = β0 + β1·X + σZ, with β0 = µ1 and β1 = µ2 − µ1.
Definition (Weibull regression model)
Let T be an event time, and X1, ..., Xp explanatory variables. The general Weibull regression model looks as follows:

  Y := log T,   Y = β0 + β1X1 + ... + βpXp + σZ,

where Z has standard extreme value distribution: f_Z(z) = exp(z − e^z).
- µ = β0 + β1X1 + ... + βpXp is called the linear predictor.
- Estimation of the model parameters β0, ..., βp and σ can be done with maximum likelihood.
- The MLE can account for censored data; therefore survival data should always be fitted with survreg, and not with, e.g., glm.
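As a sketch of why survreg (and not lm or glm) is the right tool: it maximizes a likelihood that treats censored observations correctly. The data below are simulated with hypothetical parameter values, not the heroin data:

```r
library(survival)

# Simulate Weibull event times depending on a binary covariate:
# log T = 1 + 0.5 * x + 0.5 * Z on the log scale.
set.seed(42)
n <- 2000
x <- rbinom(n, 1, 0.5)
t.event <- rweibull(n, shape = 2, scale = exp(1 + 0.5 * x))
t.cens <- rweibull(n, shape = 2, scale = 6)    # independent censoring
time <- pmin(t.event, t.cens)
status <- as.numeric(t.event <= t.cens)        # 1 = event observed

# survreg recovers beta0 = 1, beta1 = 0.5 and sigma = 0.5 despite censoring:
fit <- survreg(Surv(time, status) ~ x, dist = "weibull")
coef(fit)
fit$scale
```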
- The Weibull regression model is normally written in logarithmic time scale (as before).
- Translated back to the original time scale, the survivor function reads

  S(t; x1, ..., xp) = exp(−(t·e^(−µ))^(1/σ)),   µ = β0 + β1x1 + ... + βpxp.
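The back-transformation follows directly from Y = log T and the extreme value distribution of Z (µ denotes the linear predictor):

```latex
S(t; x_1, \ldots, x_p) = P(T > t) = P(Y > \log t)
  = P\!\left(Z > \frac{\log t - \mu}{\sigma}\right)
  = \exp\!\left(-e^{(\log t - \mu)/\sigma}\right)
  = \exp\!\left(-\left(t\, e^{-\mu}\right)^{1/\sigma}\right),
\qquad \mu = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p
```

Here P(Z > z) = exp(−e^z) is the survivor function of the standard extreme value distribution.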
- With argument se.fit = TRUE, predict also returns the estimated standard errors for the quantiles.
- To calculate confidence intervals for the quantiles, it's better to get quantiles and standard errors on the logarithmic time scale; use type = "uquantile" then:

uquant <- predict(addicts.weib.full, type = "uquantile",
                  p = 0.5, se.fit = TRUE)  # p = 0.5: median
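A sketch of the full confidence interval construction; since the heroin data are not publicly bundled, it uses the lung data shipped with the survival package (variables time, status, age, sex):

```r
library(survival)

# Weibull regression for the NCCTG lung cancer data:
fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

# Estimated median survival on the log time scale, with standard errors:
uquant <- predict(fit, type = "uquantile", p = 0.5, se.fit = TRUE)

# 95% confidence intervals for the median, back-transformed to days:
ci <- cbind(lower = exp(uquant$fit - 1.96 * uquant$se.fit),
            estimate = exp(uquant$fit),
            upper = exp(uquant$fit + 1.96 * uquant$se.fit))
head(ci)
```

Exponentiating the interval endpoints preserves coverage because exp is monotone; this is why the interval is built on the log scale first.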
Tukey-Anscombe plot:
- Plot of residuals vs. fitted values.
- With survival data, we only know the residuals of non-censored individuals.
- It usually makes sense to plot residuals and fitted values in logarithmic time scale.
- The plot should show residuals that have a similar distribution over all fitted values (no trend, no cone, etc.).
Q-Q plot of residuals:
- Plot of the empirical quantiles of the residuals vs. the theoretical quantiles expected under the error distribution (in our case Weibull, or extreme value if on logarithmic time scale).
- The Q-Q plot should show a straight line.
Let's include only clinic and prison as explanatory variables, so the Kaplan-Meier estimator in all groups is still reliable:

addicts.km.strat <- survfit(Surv(survt, status) ~ clinic + prison,
                            data = addicts)
addicts.table <- summary(addicts.km.strat)
plot(log(addicts.table$time), log(-log(addicts.table$surv)))
From the summary of the full Weibull model for the heroin data set, we see that prison is not significant (on the 5% level). Can we remove the variable from the model? Compare the full and the reduced model with a likelihood ratio test:

addicts.weib.red <- survreg(Surv(survt, status) ~ clinic + dose,
                            data = addicts, dist = "weibull")
anova(addicts.weib.red, addicts.weib.full, test = "Chisq")
##                    Terms Resid. Df    -2*LL Test Df Deviance   Pr(>Chi)
## 1          clinic + dose       234 2172.503        NA       NA         NA
## 2 clinic + prison + dose       233 2168.953    =  1 3.549613 0.05955934
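The test statistic and p-value can be reproduced by hand from the −2·log-likelihood values in the output:

```r
# Difference of -2*log-likelihoods, asymptotically chi-squared with
# df = 1 (one parameter difference between the two models):
dev <- 2172.503 - 2168.953          # 3.55
p.value <- 1 - pchisq(dev, df = 1)  # approx. 0.0595, as in the anova output
p.value
```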
Alternative to the likelihood ratio test: the Akaike information criterion (AIC).
- AIC assigns a score to every model: AIC = 2k − 2·log(likelihood), where k is the number of parameters of the model.
- AIC "penalizes complexity".
- Model selection procedure: from a given set of candidate models, take the one that minimizes the AIC.

Heroin example:

AIC(addicts.weib.full)
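A by-hand sketch of the AIC comparison, using the −2·log-likelihood values from the likelihood ratio test output; the parameter counts below assume that k includes the intercept, the regression coefficients, and the scale parameter σ, which is how AIC() counts parameters for survreg fits:

```r
# Full model clinic + prison + dose: intercept + 3 coefficients + scale, k = 5
# Reduced model clinic + dose:       intercept + 2 coefficients + scale, k = 4
aic.full <- 2168.953 + 2 * 5   # 2178.953
aic.red  <- 2172.503 + 2 * 4   # 2180.503
aic.full < aic.red             # TRUE: AIC keeps prison in the model
```

Note that AIC favours keeping prison, although the likelihood ratio test is not significant at the 5% level (p ≈ 0.0596): the two selection criteria need not agree.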
When fitting a Weibull regression model with many explanatory variables, in general not all of them are significant (heroin example: prison). As with other regression techniques, we should perform model or variable selection to get rid of non-significant variables. Reasons:
- avoid overfitting
- improve the interpretability of the model
- improve the predictive power of the model
Theoretically best approach for model selection: exhaustive search:
1. fit every possible model
2. keep the "best" one according to some criterion (e.g., the one that minimizes the AIC, or the BIC, or similar)

With p explanatory variables, there are 2^p possible models ⇒ exhaustive search is infeasible even for moderate p.
Computationally feasible alternative: greedy or stepwise search,either using a likelihood ratio test or a model selection criterion suchas AIC.
General idea of greedy search: instead of exhaustively searching the fullmodel space, traverse it in small steps, adding or removing one explanatoryvariable at a time. Two approaches:
Backward selection:
- start with the full model
- sequentially drop the variable that maximally reduces the AIC
- stop when the AIC cannot be reduced further

Forward selection:
- start with the empty model
- sequentially add the variable that maximally reduces the AIC
- stop when the AIC cannot be reduced further
Do these methods necessarily find the model with the lowest AIC? (No: a greedy search can get stuck in a local optimum of the model space.)
Recall the Weibull regression model in logarithmic time scale:

  Y = log T = β0 + β1X1 + ... + βpXp + σZ,

where Z has standard extreme value distribution.
This representation is valid for all location-scale families: if Y has density f(y; µ, σ) from a location-scale family, then Y = µ + σZ with Z having "standard" distribution f(z; 0, 1).
Hence we use the same ansatz of regression models for log-normal and log-logistic distributions:

  Y = log T = β0 + β1X1 + ... + βpXp + σZ,

where Z has a standard normal (log-normal case) or standard logistic (log-logistic case) distribution.
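In survreg, only the dist argument changes. A sketch with the lung data shipped with the survival package (the heroin data are not bundled):

```r
library(survival)

fits <- list(
  weibull     = survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull"),
  lognormal   = survreg(Surv(time, status) ~ age + sex, data = lung, dist = "lognormal"),
  loglogistic = survreg(Surv(time, status) ~ age + sex, data = lung, dist = "loglogistic")
)

# All three models have the same number of parameters and are fitted to
# the same data, so they can be compared directly by AIC:
aics <- sapply(fits, AIC)
aics
```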
Model validation is analogous to the Weibull case; it makes sense to make the Q-Q plot on the logarithmic time scale. R code for the Q-Q plot (rest as for Weibull regression):

qqPlot(log.resid, dist = "norm")
Fitting a log-logistic model is completely analogous to fitting a Weibull model:

addicts.llogis.full <- survreg(Surv(survt, status) ~ clinic + prison + dose,
                               data = addicts, dist = "loglogistic")
summary(addicts.llogis.full)
Model validation is again analogous to the Weibull case; it makes sense to make the Q-Q plot on the logarithmic time scale. For the log-logistic model, the residuals on the logarithmic time scale follow a logistic distribution:

qqPlot(log.resid, dist = "logis")
Learning objectives (Cox proportional hazards model):
- Write the general form of a Cox proportional hazards model
- Explain the difference to parametric regression models
- Fit a Cox PH model in R
- Interpret the R output of a Cox PH fit, especially the hazard ratios
- Perform model validation with graphical methods and tests
- Considered so far: parametric regression models for survival data.
- In the absence of explanatory variables, we have seen the powerful, non-parametric Kaplan-Meier estimator.
- The Cox proportional hazards (PH) model combines the flexibility of non-parametric models with the interpretability of regression models.
- The Cox PH model is one of the most popular models in survival analysis, especially in the analysis of medical data.
Recall the form of the Cox PH model: h(t; x1, ..., xp) = h0(t)·exp(β1x1 + ... + βpxp), with baseline hazard h0.
- The baseline hazard is not specified more precisely, i.e. has no parametric form ⇒ the Cox PH model is called a semiparametric model.
- The baseline hazard is the same for all subjects.
- The explanatory variables are assumed to be time-independent.
- The hazard function must, by definition, always be positive; the exponential function in the Cox PH model ensures this (a comparable approach to logistic regression).
Why don't we need a parameter β0, contrary to regression models? (A constant factor e^(β0) would simply be absorbed into the unspecified baseline hazard h0(t).)
- The hazard of the Cox PH model is not fully parametric ⇒ no MLE possible.
- Therefore, consider the partial likelihood Lc instead of the likelihood:

  Lc(β) = ∏_{i: δi = 1} Li(β),

  where Li(β) = probability that an individual with the covariates of subject i fails at ti, given that there is one failure in the risk set of ti.
- In the "normal" likelihood, there is no conditioning involved.
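Writing x_i for the covariate vector of subject i and R(t_i) for the risk set at time t_i, this conditional probability has an explicit form in which the baseline hazard cancels:

```latex
L_i(\beta)
  = \frac{h(t_i; x_i)}{\sum_{j \in R(t_i)} h(t_i; x_j)}
  = \frac{h_0(t_i)\, e^{x_i^\top \beta}}{\sum_{j \in R(t_i)} h_0(t_i)\, e^{x_j^\top \beta}}
  = \frac{e^{x_i^\top \beta}}{\sum_{j \in R(t_i)} e^{x_j^\top \beta}}
```

This is why the partial likelihood can be maximized without ever specifying h_0.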
- Usual approach of estimating the model (Cox, 1972), implemented in the R function coxph: maximize the partial likelihood.
- This approach makes the baseline hazard a nuisance parameter.
- Hence, in a first step, we only get an estimate of the coefficients β.
- This is normally sufficient, since we are usually only interested in the coefficients: they define hazard ratios (see later).
- Consequence: if we want to plot the fitted survivor function, we must use a Kaplan-Meier estimator in addition to coxph.
Consider two subjects i and j with explanatory variables xi1, ..., xip and xj1, ..., xjp, respectively. Their hazard ratio is

  HR = h(t; xi1, ..., xip) / h(t; xj1, ..., xjp)
- The HR says how much more likely it is that subject i has an event in the next time unit than subject j.
- Heroin example: the HR between patients i and j says how much more likely it is that patient i is released the next day than patient j.
- If the event refers to death, the HR expresses how much higher the instantaneous death probability is for subject i than for subject j.
Why is the hazard ratio interesting?
Heroin example: suppose subjects i and j both have no prison record (xi2 = xj2 = 0) and got the same methadone dose (xi3 = xj3), but subject i was treated in clinic 2 (xi1 = 1) and subject j in clinic 1 (xj1 = 0). Then the hazard ratio of subjects i and j is HR = e^(β1): the baseline hazard and all common covariate terms cancel.
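Spelling out the cancellation for these two subjects (the baseline hazard and the common terms for prison and dose drop out):

```latex
HR = \frac{h_0(t)\, \exp(\beta_1 \cdot 1 + \beta_2 x_{i2} + \beta_3 x_{i3})}
          {h_0(t)\, \exp(\beta_1 \cdot 0 + \beta_2 x_{j2} + \beta_3 x_{j3})}
   = \exp\bigl(\beta_1 (1 - 0)\bigr) = e^{\beta_1}
```

In particular, the hazard ratio does not depend on t: this is the proportional hazards property.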
- The estimated HR of the variable clinic is HR1 = 0.3643. What does this mean?
- Suppose that, as a politician, you have to decide whether clinic 1 or clinic 2 does a better job (i.e. is releasing patients earlier).
- Do you take the HR from a Cox PH model, or the log-rank test from before? Why?
The output of coxph cannot be plotted directly:

plot(addicts.cox)

gives an error! (Why?)
Example: estimate the survivor function for two patients with no prison record and a mean methadone dose, one in clinic 1 and one in clinic 2:

sample.data <- data.frame(clinic = c(1, 2), prison = c(0, 0),
                          dose = mean(addicts$dose))
plot(survfit(addicts.cox, newdata = sample.data))
- Consequence: plots of log h(t; x1, ..., xp) for different groups (assuming discrete, or discretized, explanatory variables) should show parallel lines.
- In practice, we can use kernel estimates (see Part II) of the hazard functions.
- A more common approach for graphical model validation only involves the Kaplan-Meier estimator, and no estimate of the hazard functions.
- The cumulative hazard function H(t; x1, ..., xp) = ∫_0^t h(u; x1, ..., xp) du can be decomposed as

  H(t; x1, ..., xp) = H0(t)·exp(β1x1 + ... + βpxp),   H0(t) = ∫_0^t h0(u) du.
Graphical test for PH assumption:
If the proportional hazards assumption holds, a plot of log(−log(S)) vs. t for different groups of subjects shows parallel lines.
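This follows from S = exp(−H) together with the decomposition of the cumulative hazard:

```latex
\log\bigl(-\log S(t; x_1, \ldots, x_p)\bigr)
  = \log H(t; x_1, \ldots, x_p)
  = \log H_0(t) + \beta_1 x_1 + \ldots + \beta_p x_p
```

The term log H_0(t) is common to all subjects; different groups differ only by the constant vertical shift β_1 x_1 + ... + β_p x_p, which yields parallel curves.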
The PH assumption can be tested variable by variable.
Simpler way: use the function survplot from the rms package. As survplot is not compatible with survfit output, we have to fit the KM estimator using npsurv (which does exactly the same thing as survfit ...):

library(rms)
survplot(npsurv(Surv(survt, status) ~ clinic, data = addicts),
         loglog = TRUE)
- The goodness-of-fit testing approach is an alternative to graphical model validation.
- Pro: provides a single p-value (⇒ more objective).
- Contra: tests for violations of the assumption ⇒ the test must be done the "wrong way", with an unknown type II error rate.
- Rough idea of the goodness-of-fit test:
  - calculate residuals for each of the explanatory variables
  - check whether the residuals are uncorrelated with survival time
- Consider the heroin data set again. Suppose subject i has an event at time t(j). Then his or her Schoenfeld residual for the variable dose is the difference between his or her methadone dose and a weighted mean methadone dose of all individuals still at risk at time t(j).
- The mean is weighted by the hazard of the patients in the risk set.
- The goodness-of-fit test for the variable dose now tests the null hypothesis that the correlation coefficient between the Schoenfeld residuals for dose and the survival time is 0.
- For categorical explanatory variables, the calculation of the Schoenfeld residual is a bit different, but similar.
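In R, this test is implemented in cox.zph from the survival package. A sketch with the bundled lung data (not the heroin data set):

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# cox.zph tests, per variable and globally, whether the Schoenfeld
# residuals are uncorrelated with (transformed) survival time;
# small p-values indicate a violated PH assumption.
zp <- cox.zph(fit)
zp

# Plot of the scaled Schoenfeld residuals against time, one panel per
# variable; a horizontal band supports the PH assumption.
plot(zp)
```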
- Data set from the Insel hospital: overall survival (variable os) was measured for 67 patients with non-small cell lung cancer (NSCLC).
- Additional variables:
- Question: which genes or clinical variables are good predictors for the overall survival of patients?
- We clearly have too many explanatory variables to fit a good model ⇒ variable selection.
Variable selection for Cox models works exactly as for parametric regression models:
- Manual approach: iteratively remove the least significant variable; compare the larger and the smaller model with a likelihood ratio test.
- Automatic approach: use forward or backward selection based on the AIC.
We demonstrate the first step of manual variable selection for the lung cancer data set:

nsclc.cox.full <- coxph(Surv(os, status) ~ ., data = nsclc)
summary(nsclc.cox.full)
The long output is omitted here; gldc is the least significant variable, hence we remove it:

nsclc.cox.red <- update(nsclc.cox.full, . ~ . - gldc)
anova(nsclc.cox.red, nsclc.cox.full, test = "Chisq")
## Analysis of Deviance Table
##  Cox model: response is Surv(os, status)
## Model 1: ~ grade + stage + age + preop + other + rt + drug + cda + rrm2 + tk1 + tyms
## Model 2: ~ grade + stage + age + preop + other + rt + drug + cda + gldc + rrm2 + tk1 + tyms
##    loglik  Chisq Df P(>|Chi|)
## 1 -37.681
## 2 -37.677 0.0078  1    0.9295
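The test statistic and p-value can again be reproduced by hand from the log-likelihoods in the table:

```r
# Twice the difference of log-likelihoods, chi-squared with df = 1:
chisq <- 2 * (-37.677 - (-37.681))    # 0.008
p.value <- 1 - pchisq(chisq, df = 1)  # approx. 0.929: gldc can be dropped
p.value
```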
Lung cancer data set: automatic variable selection
The well-known function stepAIC is a more convenient way of doing variable selection, e.g. with backward selection:

library(MASS)
nsclc.cox.red <- stepAIC(nsclc.cox.full, direction = "backward", trace = 0)
summary(nsclc.cox.red)