
R for Survival Analysis - UMass Amherst (courses.umass.edu/biep640w/pdf/R for Survival Analysis.pdf)

May 16, 2018


BIOSTATS 640 – Spring 2018 6. Survival Analysis R Illustration

….R\00. R Handouts 2017-18\R for Survival Analysis.docx Page 1 of 16

6. Introduction to Survival Analysis Illustration – R Users

April 2018

1. Illustration: DPCA Study of Primary Biliary Cirrhosis
2. Prepare Data for Survival Analysis
3. Model Free Approaches
   a. Descriptives
   b. Kaplan-Meier Curve Estimation
   c. Kaplan-Meier Curve Plot
   d. Log Rank Test for Equality of Survival Distributions
4. Cox PH Model Regression
   a. Fit Cox PH Model
   b. Multivariable Model Development
   c. Side-by-side Comparison of Models
5. Regression Diagnostics for Cox PH Model
   a. Test of Proportional Hazards
   b. Graphical Assessment of Proportional Hazards


1. Illustration: DPCA Study of Primary Biliary Cirrhosis

Preliminary – Download the R data set pbc.Rdata

Download from the course website to your desktop. The dataset can be found in two places:
http://people.umass.edu/biep640w/webpages/demonstrations.html
http://people.umass.edu/biep640w/webpages/survival.htm

DPCA Study of Primary Biliary Cirrhosis
Source: Dickson ER, Grambsch PM and Fleming TR (1989) Prognosis in primary biliary-cirrhosis - model for decision making. Hepatology, 10, 1-7.

Introduction. Bile is a fluid produced in your liver that functions in the digestion of food and aids in ridding your body of worn-out red blood cells, cholesterol and toxins. The disease primary biliary cirrhosis is an autoimmune disease in which the body turns against its own cells, in this case the bile ducts. As the bile ducts are increasingly damaged, harmful substances can accumulate. This can lead to irreversible scarring of liver tissue (this is cirrhosis). Among other things, the sufferer can experience abdominal pain, internal bleeding and, ultimately, liver failure. Primary biliary cirrhosis is also a risk factor for liver cancer.

This illustration utilizes data from a randomized controlled trial of D-penicillamine (DPCA) for the treatment of primary biliary cirrhosis. A total of n=312 consenting subjects were enrolled and randomized to either active treatment or placebo-control (presumably this group received standard care). Time zero is the date of diagnosis and initiation of treatment. Study participants were followed to event of end-stage liver disease or censoring. Thus, these data are an example of “right” censored data. Over the approximately 10 years of follow-up, 125 events of death (40%) were observed. The goal of these analyses was to assess the benefit of randomization to DPCA on survival, overall and after adjustment for selected, important covariates.

Data Dictionary/Coding Manual. This illustration utilizes the following variables in the pbc data set.

Variable    Codings                            Label
years       Continuous (range: 0.11 – 12.47)   Time to death (in years)
status      1 = dead, 0 = censored             Event/censoring indicator
rx          1 = DPCA, 0 = Control              Treatment/randomization
histol      1 = lowest, 2, 3, 4 = highest      Severity of liver damage at dx
bilirubin   Continuous, mg/dl                  Serum bilirubin


2. Prepare Data for Survival Analysis

Attach libraries (This assumes that you have installed these packages using the command install.packages("NAMEOFPACKAGE"))

NOTE: An alternative method for installing packages is to do the following in your R session:
1) At right, click on the tab PACKAGES
2) Next, click on the INSTALL icon
A few have told me that this method is less error prone ... that would be nice!

library(psych)
library(survival)
library(gmodels)
library(stargazer)

Load R data. Create a “survival” object. I named mine pbc.survival

setwd("/Users/cbigelow/Desktop")
load(file="pbc.Rdata")
# survivalobjectname <- with(dataframename, Surv(timetoeventvariable,censoringvariable))
pbc.survival <- with(pbc, Surv(years,status))
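To see what a “survival” object holds before working with the real data, here is a minimal sketch on a tiny made-up data frame. The data frame `toy` and its four rows are hypothetical and are not taken from pbc; only `Surv()` from the survival package is used.

```r
library(survival)  # provides Surv()

# Hypothetical toy data, NOT taken from the pbc data set
toy <- data.frame(
  years  = c(2.5, 4.0, 1.2, 6.3),  # follow-up time in years
  status = c(1,   0,   1,   0)     # 1 = dead, 0 = censored
)

# Same pattern as the handout: with(dataframe, Surv(time, event))
toy.survival <- with(toy, Surv(years, status))

# Printing marks censored times with a trailing "+"
print(toy.survival)

# A right-censored Surv object is stored as a two-column matrix (time, status)
n.events <- sum(toy.survival[, "status"])
n.events
```

The count of events is simply the sum of the status column, which is a quick check that the event indicator was coded as intended.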


3. Model Free Approaches

a. Descriptives

Single Variable Descriptives

# Descriptives - One way
# Continuous variables. Command is describe(df$variable). Package is psych
describe(pbc$years)
describe(pbc$bilirubin)

# Discrete variables. Brute force
# Note: the columns labeled "%" and "Cum %" hold proportions (0 to 1)
# rx
freq <- table(pbc$rx)
n <- length(pbc$rx)
relfreq <- round(freq/n, digits=4)
cumfreq <- cumsum(freq)
cumrelfreq <- round(cumsum(relfreq), digits=4)
tablerx <- cbind(freq,relfreq, cumfreq, cumrelfreq)
colnames(tablerx) <- c("Freq", "%", "Cum. Freq", "Cum %")
paste("rx = Treatment/Randomization")
tablerx

# histol
freq <- table(pbc$histol)
n <- length(pbc$histol)
relfreq <- round(freq/n, digits=4)
cumfreq <- cumsum(freq)
cumrelfreq <- round(cumsum(relfreq), digits=4)
tablehistol <- cbind(freq,relfreq, cumfreq, cumrelfreq)
colnames(tablehistol) <- c("Freq", "%", "Cum. Freq", "Cum %")
paste("histol = Severity of Liver Damage at Diagnosis")
tablehistol

describe(pbc$years)

   vars   n mean   sd median trimmed  mad  min   max range skew kurtosis   se
X1    1 312 5.49 3.08   5.04    5.35 3.08 0.11 12.47 12.36 0.37    -0.62 0.17

describe(pbc$bilirubin)

   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 312 3.26 4.53   1.35    2.17 1.11 0.3  28  27.7 2.82     8.65 0.26

[1] "rx = Treatment/Randomization"
tablerx

  Freq      % Cum. Freq  Cum %
0  158 0.5064       158 0.5064
1  154 0.4936       312 1.0000


[1] "histol = Severity of Liver Damage at Diagnosis"
tablehistol

  Freq      % Cum. Freq  Cum %
1   16 0.0513        16 0.0513
2   67 0.2147        83 0.2660
3  120 0.3846       203 0.6506
4  109 0.3494       312 1.0000
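The brute force steps above are the same for rx and histol, so they can be wrapped in one small function. This is only a sketch: `freq_table()` is a hypothetical helper name (not part of any package), and the toy vector is rebuilt from the rx counts reported above (158 control, 154 DPCA) rather than read from pbc.Rdata.

```r
# Build a frequency table for one discrete variable.
# freq_table() is a hypothetical helper name, not part of any package.
freq_table <- function(x) {
  freq       <- table(x)
  relfreq    <- round(freq / length(x), digits = 4)
  cumfreq    <- cumsum(freq)
  cumrelfreq <- round(cumsum(freq) / length(x), digits = 4)
  out <- cbind(freq, relfreq, cumfreq, cumrelfreq)
  colnames(out) <- c("Freq", "Rel. Freq", "Cum. Freq", "Cum. Rel. Freq")
  out
}

# Toy vector rebuilt from the rx counts reported above
toy.rx <- rep(c(0, 1), times = c(158, 154))
tablerx.toy <- freq_table(toy.rx)
tablerx.toy
```

With the real data the calls would be freq_table(pbc$rx) and freq_table(pbc$histol). The column names here say “Rel. Freq” because the values are proportions, not percentages.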

Crosstab of Treatment x Event

# 2 way crosstab. Command is CrossTable(df$row, df$column). Package is gmodels
CrossTable(pbc$rx, pbc$status, digits=2, prop.c=FALSE, prop.t=FALSE, prop.chisq=FALSE, expected=FALSE, dnn=c("Randomization/Treatment", "Event"))

##
##
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |-------------------------|
##
##
## Total Observations in Table:  312
##
##
##                         | Event
## Randomization/Treatment |         0 |         1 | Row Total |
## ------------------------|-----------|-----------|-----------|
##                       0 |        93 |        65 |       158 |
##                         |      0.59 |      0.41 |      0.51 |
## ------------------------|-----------|-----------|-----------|
##                       1 |        94 |        60 |       154 |
##                         |      0.61 |      0.39 |      0.49 |
## ------------------------|-----------|-----------|-----------|
##            Column Total |       187 |       125 |       312 |
## ------------------------|-----------|-----------|-----------|
##
##

Interpretation:
Among n=158 randomized to CONTROL, there were 65 deaths (41%)
Among n=154 randomized to active treatment DPCA, there were 60 deaths (39%)
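The row proportions in the crosstab can be checked directly from the four cell counts. A minimal sketch in base R, with the counts typed in from the CrossTable output above (no data file is needed):

```r
# Counts copied from the CrossTable output above
counts <- matrix(c(93, 65,
                   94, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(rx = c("0", "1"), status = c("0", "1")))

# Row proportions: each cell divided by its row total
row.props <- prop.table(counts, margin = 1)
round(row.props, 2)

# Proportion of deaths (status = 1) within each randomization group
death.props <- round(row.props[, "1"], 2)
death.props
```

The second element is 60/154, which rounds to 0.39, in agreement with the crosstab.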


b. Kaplan-Meier Curve Estimation

Kaplan-Meier Estimates of Survival - Overall

# Kaplan-Meier Curve Estimation - Overall
# Command is survfit(survivalobject~1, data=DATAFRAME). Package is survival
kmall <- survfit(pbc.survival~1,data=pbc)
summary(kmall)

## Call: survfit(formula = pbc.survival ~ 1, data = pbc)
##
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.112    312       1    0.997 0.00320        0.991        1.000
##   0.140    311       1    0.994 0.00452        0.985        1.000

--- output omitted ---

##  11.168     17       1    0.369 0.04895        0.285        0.479
##  11.474     13       1    0.341 0.05278        0.251        0.461
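The first two rows of this table can be reproduced by hand, which shows what the Kaplan-Meier (product-limit) estimate and its standard error are doing. A sketch in base R, using only the n.risk and n.event values printed above and Greenwood's formula for the standard error:

```r
# First two rows of the overall Kaplan-Meier table above
n.risk  <- c(312, 311)  # number at risk just before each event time
n.event <- c(1, 1)      # deaths at each event time

# Product-limit estimate: S(t) = product over event times of (1 - d/n)
survival <- cumprod(1 - n.event / n.risk)

# Greenwood's formula for the standard error of S(t)
std.err <- survival * sqrt(cumsum(n.event / (n.risk * (n.risk - n.event))))

round(survival, 3)
round(std.err, 5)
```

The rounded values agree with the survival and std.err columns for times 0.112 and 0.140.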

Kaplan-Meier Estimates of Survival - by Randomization/Treatment

# Kaplan-Meier Curve Estimation – By group (in this case rx)
# Command is survfit(survivalobject~GROUPVARIABLE, data=DATAFRAME). Package is survival
kmrx <- survfit(pbc.survival~rx,data=pbc)
summary(kmrx)

## Call: survfit(formula = pbc.survival ~ rx, data = pbc)
##
##                 rx=0
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.112    158       1    0.994 0.00631        0.981        1.000
##   0.194    157       1    0.987 0.00889        0.970        1.000

--- output omitted ---

##  11.474      7       1    0.319 0.07922        0.196        0.519
##
##                 rx=1
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.140    154       1    0.994 0.00647        0.981        1.000
##   0.211    153       1    0.987 0.00912        0.969        1.000
##   0.301    152       1    0.981 0.01114        0.959        1.000

--- output omitted ---

##  10.511     13       1    0.394 0.06719        0.282        0.551
##  10.549     12       1    0.361 0.06916        0.248        0.526


c. Kaplan-Meier Curve Plot

Kaplan-Meier Plot - Overall

plot(kmall, xlab="Years", ylab="% Surviving", yscale=100, main="Survival Distribution (Overall)")

Kaplan-Meier Plot - by rx

plot(kmrx, xlab="Years", ylab="% Surviving", yscale=100, col=c("red","blue"), main="Survival Distributions by RX")
legend("topright", title="RX", c("Control", "DPCA"), fill=c("red", "blue"))


d. Log Rank Test of Equality of Survival Distributions

Log Rank Test

# Log Rank Test of Equality of Survival Distributions over groups
# Command is survdiff(survivalobject~GROUPVARIABLE, data=DATAFRAME). Package is survival
survdiff(pbc.survival~rx, data=pbc)

## Call:
## survdiff(formula = pbc.survival ~ rx, data = pbc)
##
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## rx=0 158       65     63.2    0.0502     0.102
## rx=1 154       60     61.8    0.0513     0.102
##
##  Chisq= 0.1  on 1 degrees of freedom, p= 0.75

Interpretation: Do NOT reject. Assumption of the null hypothesis has NOT led to an unlikely result (p-value = .75). We have no statistically significant evidence that the survival distributions of the two groups differ.
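The p-value reported by survdiff() is the upper-tail area of a chi-square distribution. As a rough check, it can be recovered in base R from the numbers printed above; all values below are copied from that output, and the (O-E)^2/V statistic is used as printed (0.102) rather than recomputed.

```r
# Values copied from the survdiff() output above
observed <- c(control = 65, dpca = 60)
expected <- c(control = 63.2, dpca = 61.8)
chisq    <- 0.102   # (O-E)^2/V column
df       <- 1       # number of groups minus 1

# Observed and expected deaths add up to the same total
sum(observed)
sum(expected)

# Upper-tail chi-square probability gives the log rank p-value
p.value <- pchisq(chisq, df = df, lower.tail = FALSE)
round(p.value, 2)
```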


4. Cox PH Model Regression

Recall. The Cox PH model models the hazard of event (in this case death) at time “t” as the product of a baseline hazard times exp(linear model in the predictors X1, X2, …. Xp). Here, p=3 because we have 3 predictors of interest:

X1 = rx, 0/1 indicator of randomization
X2 = histol, ordinal measure of degree of tissue damage at diagnosis
X3 = bilirubin, continuous (mg/dl)

Note. The predictor histol is an ordinal predictor. So, in R, we will need to declare it as a factor variable prior to modeling.

a. Fit Cox PH Model

Fit ONE predictor COX PH Models

# Fit Cox PH Model to a 0/1 predictor
# Command is coxph(survivalobject~01PREDICTOR, data=DATAFRAME). Package is survival
modelrx <- coxph(pbc.survival~rx, data=pbc)
modelrx

## Call:
## coxph(formula = pbc.survival ~ rx, data = pbc)
##
##       coef exp(coef) se(coef)     z    p
## rx -0.0572    0.9444   0.1792 -0.32 0.75
##
## Likelihood ratio test=0.1  on 1 df, p=0.749
## n= 312, number of events= 125

Interpretation: Relative to control patients, patients randomized to DPCA have an estimated lower hazard of death (HR = exp[coef] = .94) at all times of follow-up. This very small benefit is not statistically significant (p-value = .75).

# Fit Cox PH Model to a categorical predictor. It must be declared as a factor
# Command is coxph(survivalobject~factor(PREDICTOR), data=DATAFRAME). Package is survival
modelhistol <- coxph(pbc.survival~factor(histol), data=pbc)
modelhistol

## Call:
## coxph(formula = pbc.survival ~ factor(histol), data = pbc)
##
##                 coef exp(coef) se(coef)    z      p
## factor(histol)2 1.61      4.99     1.03 1.56 0.1191
## factor(histol)3 2.15      8.58     1.01 2.12 0.0337
## factor(histol)4 3.06     21.39     1.01 3.04 0.0024
##
## Likelihood ratio test=52.7  on 3 df, p=2.08e-11
## n= 312, number of events= 125

Interpretation: Recall. Higher scores on histol (valid scores = 1, 2, 3, 4) represent greater levels of liver tissue damage present at diagnosis. This model shows that higher (“worse”) values of histol at diagnosis are associated with poorer prognosis (hazard ratio estimates increase from 1 to 4.99 to 8.58 to 21.4, relative to the referent group histol=1). This is highly statistically significant (likelihood ratio test p-value << .0001).

Cox PH model: $h(t; X_1, \ldots, X_p) = h_0(t)\,\exp[\beta_1 X_1 + \ldots + \beta_p X_p]$
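The model equation implies that the hazard ratio between two groups does not depend on time, because the baseline hazard cancels. The sketch below illustrates this with a made-up baseline hazard; the function `h0()` and the follow-up times are hypothetical, and only the rx coefficient (-0.0572) comes from the fitted model above.

```r
# Hypothetical baseline hazard h0(t); any positive function of t would do
h0 <- function(t) 0.05 + 0.01 * t

# Cox PH hazard with one predictor: h(t; x) = h0(t) * exp(beta * x)
hazard <- function(t, x, beta) h0(t) * exp(beta * x)

beta  <- -0.0572            # coefficient for rx from the fitted model above
times <- c(1, 2, 5, 10)     # arbitrary follow-up times in years

# Hazard ratio, DPCA (x = 1) versus control (x = 0), at each time
hr <- hazard(times, x = 1, beta) / hazard(times, x = 0, beta)
round(hr, 4)

# The ratio equals exp(beta) at every time: the baseline hazard cancels
round(exp(beta), 4)
```

This is why the Cox model can estimate hazard ratios without ever specifying the baseline hazard.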


# Fit Cox PH Model to a continuous predictor
# Command is coxph(survivalobject~PREDICTOR, data=DATAFRAME). Package is survival
modelbili <- coxph(pbc.survival~bilirubin, data=pbc)
modelbili

## Call:
## coxph(formula = pbc.survival ~ bilirubin, data = pbc)
##
##            coef exp(coef) se(coef)    z      p
## bilirubin 0.149     1.161    0.013 11.4 <2e-16
##
## Likelihood ratio test=84.7  on 1 df, p=0
## n= 312, number of events= 125

Interpretation: Each 1 unit (1 mg/dl) increase in bilirubin is associated with an increased hazard of death at all times of follow-up (HR = 1.16). This is highly statistically significant (p-value << .0001).
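For a continuous predictor, the hazard ratio for an increase of k units is exp(k × coef), not k × exp(coef). A short sketch using the bilirubin coefficient printed above; the choice k = 5 mg/dl is arbitrary and only for illustration.

```r
beta.bili <- 0.149   # coef for bilirubin from the output above

# Hazard ratio for a 1 mg/dl increase
hr.1 <- exp(beta.bili)

# Hazard ratio for a k mg/dl increase is exp(k * beta), i.e. hr.1^k
k    <- 5
hr.k <- exp(k * beta.bili)

round(c(per.1.unit = hr.1, per.5.units = hr.k), 2)
```

Under this model, a 5 mg/dl difference in bilirubin corresponds to roughly a doubling of the hazard.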

# Display models side by side
# Command is stargazer(MODEL, MODEL, MODEL, type="text"). Package is stargazer

stargazer(modelrx, modelhistol, modelbili, type="text")

##
## ==========================================================================
##                                       Dependent variable:
##                      -----------------------------------------------------
##                                          pbc.survival
##                           (1)              (2)                 (3)
## --------------------------------------------------------------------------
## rx                       -0.057
##                         (0.179)
##
## factor(histol)2                           1.607
##                                          (1.031)
##
## factor(histol)3                          2.150**
##                                          (1.012)
##
## factor(histol)4                          3.063***
##                                          (1.009)
##
## bilirubin                                                   0.149***
##                                                              (0.013)
##
## --------------------------------------------------------------------------
## Observations              312              312                 312
## R2                       0.0003           0.156               0.238
## Max. Possible R2         0.983            0.983               0.983
## Log Likelihood          -639.915         -613.597            -597.641
## Wald Test            0.100 (df = 1) 43.920*** (df = 3) 130.920*** (df = 1)
## LR Test              0.102 (df = 1) 52.738*** (df = 3) 84.651*** (df = 1)
## Score (Logrank) Test 0.102 (df = 1) 53.853*** (df = 3) 190.869*** (df = 1)
## ==========================================================================
## Note:                                          *p<0.1; **p<0.05; ***p<0.01


b. Multivariable Model Development

Fit Series of Models – Every model will include rx, as the focus of the analysis is on randomization/treatment

# Predictor = rx
modelrx <- coxph(pbc.survival~rx, data=pbc)
summary(modelrx)

## Call:
## coxph(formula = pbc.survival ~ rx, data = pbc)
##
##   n= 312, number of events= 125
##
##        coef exp(coef) se(coef)      z Pr(>|z|)
## rx -0.05722   0.94438  0.17916 -0.319    0.749
##
##    exp(coef) exp(-coef) lower .95 upper .95
## rx    0.9444      1.059    0.6647     1.342
##
## Concordance= 0.499  (se = 0.025 )
## Rsquare= 0   (max possible= 0.983 )
## Likelihood ratio test= 0.1  on 1 df,   p=0.7494
## Wald test            = 0.1  on 1 df,   p=0.7494
## Score (logrank) test = 0.1  on 1 df,   p=0.7494
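The “lower .95” and “upper .95” columns are a Wald confidence interval computed on the log-hazard scale and then exponentiated. A sketch in base R that rebuilds the interval, the z statistic and the p-value from the coef and se(coef) printed above:

```r
# Values copied from the summary(modelrx) output above
coef.rx <- -0.05722
se.rx   <-  0.17916

# 95% Wald interval on the log-hazard scale, then exponentiate
z     <- qnorm(0.975)
ci.hr <- exp(coef.rx + c(-1, 1) * z * se.rx)

round(c(HR = exp(coef.rx), lower = ci.hr[1], upper = ci.hr[2]), 4)

# Wald z statistic and two-sided p-value
z.stat  <- coef.rx / se.rx
p.value <- 2 * pnorm(-abs(z.stat))
round(c(z = z.stat, p = p.value), 3)
```

Because the interval for the hazard ratio includes 1, it tells the same story as the p-value of .749.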

# Predictor = rx + histol
modelhistolrx <- coxph(pbc.survival ~factor(histol)+rx, data=pbc)
summary(modelhistolrx)

## Call:
## coxph(formula = pbc.survival ~ factor(histol) + rx, data = pbc)
##
##   n= 312, number of events= 125
##
##                    coef exp(coef) se(coef)      z Pr(>|z|)
## factor(histol)2  1.6285    5.0964   1.0314  1.579  0.11435
## factor(histol)3  2.1768    8.8181   1.0127  2.149  0.03160 *
## factor(histol)4  3.0929   22.0417   1.0096  3.064  0.00219 **
## rx              -0.1471    0.8632   0.1799 -0.818  0.41342
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                 exp(coef) exp(-coef) lower .95 upper .95
## factor(histol)2    5.0964    0.19622    0.6750    38.477
## factor(histol)3    8.8181    0.11340    1.2116    64.179
## factor(histol)4   22.0417    0.04537    3.0473   159.433
## rx                 0.8632    1.15850    0.6067     1.228
##
## Concordance= 0.701  (se = 0.028 )
## Rsquare= 0.157   (max possible= 0.983 )
## Likelihood ratio test= 53.41  on 4 df,   p=7e-11
## Wald test            = 44.53  on 4 df,   p=4.988e-09
## Score (logrank) test = 54.48  on 4 df,   p=4.179e-11


# Predictor = rx + bilirubin
modelbilirx <- coxph(pbc.survival ~ bilirubin+rx, data=pbc)
summary(modelbilirx)

## Call:
## coxph(formula = pbc.survival ~ bilirubin + rx, data = pbc)
##
##   n= 312, number of events= 125
##
##               coef exp(coef) se(coef)      z Pr(>|z|)
## bilirubin  0.15147   1.16354  0.01329 11.400   <2e-16 ***
## rx        -0.20118   0.81776  0.18342 -1.097    0.273
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##           exp(coef) exp(-coef) lower .95 upper .95
## bilirubin    1.1635     0.8594    1.1336     1.194
## rx           0.8178     1.2228    0.5708     1.172
##
## Concordance= 0.78  (se = 0.029 )
## Rsquare= 0.241   (max possible= 0.983 )
## Likelihood ratio test= 85.86  on 2 df,   p=0
## Wald test            = 130.9  on 2 df,   p=0
## Score (logrank) test = 191.2  on 2 df,   p=0

# Predictor = rx + histol + bilirubin
modelall <- coxph(pbc.survival~factor(histol)+bilirubin+rx, data=pbc)
summary(modelall)

## Call:
## coxph(formula = pbc.survival ~ factor(histol) + bilirubin + rx,
##     data = pbc)
##
##   n= 312, number of events= 125
##
##                     coef exp(coef) se(coef)      z Pr(>|z|)
## factor(histol)2  1.52551   4.59750  1.03213  1.478  0.13940
## factor(histol)3  1.92272   6.83953  1.01495  1.894  0.05817 .
## factor(histol)4  2.79685  16.39296  1.01070  2.767  0.00565 **
## bilirubin        0.14764   1.15910  0.01407 10.495  < 2e-16 ***
## rx              -0.15849   0.85343  0.18152 -0.873  0.38258
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##                 exp(coef) exp(-coef) lower .95 upper .95
## factor(histol)2    4.5975     0.2175    0.6081    34.759
## factor(histol)3    6.8395     0.1462    0.9356    49.998
## factor(histol)4   16.3930     0.0610    2.2613   118.840
## bilirubin          1.1591     0.8627    1.1276     1.192
## rx                 0.8534     1.1717    0.5979     1.218
##
## Concordance= 0.807  (se = 0.029 )
## Rsquare= 0.336   (max possible= 0.983 )
## Likelihood ratio test= 127.6  on 5 df,   p=0
## Wald test            = 148.4  on 5 df,   p=0
## Score (logrank) test = 217.1  on 5 df,   p=0


stargazer(modelrx, modelhistolrx, modelbilirx, modelall, type="text")

##
## ==============================================================================================
##                                                 Dependent variable:
##                      -------------------------------------------------------------------------
##                                                    pbc.survival
##                           (1)              (2)                 (3)                 (4)
## ----------------------------------------------------------------------------------------------
## factor(histol)2                           1.629                                   1.526
##                                          (1.031)                                 (1.032)
##
## factor(histol)3                          2.177**                                 1.923*
##                                          (1.013)                                 (1.015)
##
## factor(histol)4                          3.093***                               2.797***
##                                          (1.010)                                 (1.011)
##
## bilirubin                                                   0.151***            0.148***
##                                                              (0.013)             (0.014)
##
## rx                       -0.057           -0.147             -0.201              -0.158
##                         (0.179)          (0.180)             (0.183)             (0.182)
##
## ----------------------------------------------------------------------------------------------
## Observations              312              312                 312                 312
## R2                       0.0003           0.157               0.241               0.336
## Max. Possible R2         0.983            0.983               0.983               0.983
## Log Likelihood          -639.915         -613.262            -597.038            -576.150
## Wald Test            0.100 (df = 1) 44.530*** (df = 4) 130.940*** (df = 2) 148.440*** (df = 5)
## LR Test              0.102 (df = 1) 53.408*** (df = 4) 85.858*** (df = 2)  127.632*** (df = 5)
## Score (Logrank) Test 0.102 (df = 1) 54.478*** (df = 4) 191.230*** (df = 2) 217.136*** (df = 5)
## ==============================================================================================
## Note:                                                              *p<0.1; **p<0.05; ***p<0.01

LR Tests

# Likelihood Ratio Test Comparison of "Reduced" v "Full" Models
# Requires fit of both models first
# Command is anova(REDUCED, FULL). Package is survival

# Reduced = histol
# Full = histol + rx
anova(modelhistol,modelhistolrx)

## Analysis of Deviance Table
##  Cox model: response is pbc.survival
##  Model 1: ~ factor(histol)
##  Model 2: ~ factor(histol) + rx
##    loglik  Chisq Df P(>|Chi|)
## 1 -613.60
## 2 -613.26 0.6697  1    0.4132

Interpretation: Do not reject. After adjustment for histol, randomization to DPCA is NOT significantly associated with survival (LR Test p-value = .41)
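The Chisq value reported by anova() is twice the difference between the log likelihoods of the full and reduced models. A sketch in base R that recomputes it from the log likelihoods shown in the stargazer tables above (-613.597 for histol alone, -613.262 for histol + rx); small differences from the printed 0.6697 are due to rounding of the inputs.

```r
# Log likelihoods of the two nested models, from the stargazer tables above
loglik.reduced <- -613.597   # histol only
loglik.full    <- -613.262   # histol + rx

# LR statistic: twice the gain in log likelihood
lr.chisq <- 2 * (loglik.full - loglik.reduced)

# Degrees of freedom = number of parameters added (here just rx)
lr.df <- 1

lr.p <- pchisq(lr.chisq, df = lr.df, lower.tail = FALSE)
round(c(Chisq = lr.chisq, Df = lr.df, p = lr.p), 4)
```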


# Reduced = bilirubin
# Full = bilirubin + rx
anova(modelbili,modelbilirx)

## Analysis of Deviance Table
##  Cox model: response is pbc.survival
##  Model 1: ~ bilirubin
##  Model 2: ~ bilirubin + rx
##    loglik  Chisq Df P(>|Chi|)
## 1 -597.64
## 2 -597.04 1.2065  1     0.272

Interpretation: Do not reject. After adjustment for bilirubin, randomization to DPCA is NOT significantly associated with survival (LR Test p-value = .27)

# Reduced = histol + bilirubin
# Full = histol + bilirubin + rx
# oops – need to fit this particular reduced model first
modelboth <- coxph(pbc.survival ~ factor(histol)+bilirubin, data=pbc)
anova(modelboth,modelall)

## Analysis of Deviance Table
##  Cox model: response is pbc.survival
##  Model 1: ~ factor(histol) + bilirubin
##  Model 2: ~ factor(histol) + bilirubin + rx
##    loglik  Chisq Df P(>|Chi|)
## 1 -576.53
## 2 -576.15 0.7634  1    0.3823

Interpretation: Do not reject. After adjustment for both histol and bilirubin, randomization to DPCA is NOT significantly associated with survival (LR Test p-value = .38)

c. Side-by-side Comparison of Models

stargazer(modelrx, modelhistolrx, modelbilirx, modelall, type="text")

##
## ==============================================================================================
##                                                 Dependent variable:
##                      -------------------------------------------------------------------------
##                                                    pbc.survival
##                           (1)              (2)                 (3)                 (4)
## ----------------------------------------------------------------------------------------------
## factor(histol)2                           1.629                                   1.526
##                                          (1.031)                                 (1.032)
##
## factor(histol)3                          2.177**                                 1.923*
##                                          (1.013)                                 (1.015)
##
## factor(histol)4                          3.093***                               2.797***
##                                          (1.010)                                 (1.011)
##
## bilirubin                                                   0.151***            0.148***
##                                                              (0.013)             (0.014)
##
## rx                       -0.057           -0.147             -0.201              -0.158
##                         (0.179)          (0.180)             (0.183)             (0.182)
##
## ----------------------------------------------------------------------------------------------
## Observations              312              312                 312                 312
## R2                       0.0003           0.157               0.241               0.336
## Max. Possible R2         0.983            0.983               0.983               0.983
## Log Likelihood          -639.915         -613.262            -597.038            -576.150
## Wald Test            0.100 (df = 1) 44.530*** (df = 4) 130.940*** (df = 2) 148.440*** (df = 5)
## LR Test              0.102 (df = 1) 53.408*** (df = 4) 85.858*** (df = 2)  127.632*** (df = 5)
## Score (Logrank) Test 0.102 (df = 1) 54.478*** (df = 4) 191.230*** (df = 2) 217.136*** (df = 5)
## ==============================================================================================
## Note:                                                              *p<0.1; **p<0.05; ***p<0.01


5. Regression Diagnostics for Cox PH Model

a. Test of Proportional Hazards

Goal. For rare events, a hazard ratio may be interpreted as a relative risk. The assumption of proportional hazards says that the hazard ratio, comparing “exposed” versus “not exposed” or the effect of a unit increase in exposure, is constant over time.

Test. The test of the assumption of proportional hazards has NULL: proportional hazards assumption is true. Thus, we hope to retain the null!

Test of Proportional Hazards

cox.zph(modelall, transform="km", global=TRUE)

##                     rho   chisq      p
## factor(histol)2  0.0255  0.0813 0.7755
## factor(histol)3  0.0110  0.0150 0.9026
## factor(histol)4 -0.0402  0.2010 0.6539
## bilirubin        0.1091  1.1172 0.2905
## rx              -0.0932  1.0539 0.3046
## GLOBAL               NA 13.7628 0.0172
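The p column can be recovered from the chisq column, which makes clear how many degrees of freedom each test uses. The sketch below assumes 1 degree of freedom for each single-coefficient test and 5 for the global test (one per coefficient in modelall); both chisq values are copied from the output above.

```r
# chisq values copied from the cox.zph() output above
chisq.rx     <- 1.0539    # test for rx, 1 degree of freedom
chisq.global <- 13.7628   # global test, assumed 5 degrees of freedom

p.rx     <- pchisq(chisq.rx,     df = 1, lower.tail = FALSE)
p.global <- pchisq(chisq.global, df = 5, lower.tail = FALSE)

round(c(rx = p.rx, GLOBAL = p.global), 4)
```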

Interpretation: The global test is significant (p-value = .0172) … but … for each predictor individually, do not reject the assumption of proportional hazards.

b. Graphical Assessment of Proportional Hazards

Graphical Assessments. When the assumption of proportional hazards is true, you should see the following in your graphical assessments:

Plot of Y=survival function versus X=Time over Groups=Exposure (1=yes, 0=no)
Look for: roughly parallel curves that do not cross

Plot of Y=ln(-ln(survival function)) versus X=ln(Time) over Groups=Exposure (1=yes, 0=no)
Look for: parallel lines

Plot of Y=scaled Schoenfeld residuals associated with predictor versus X=Time
For each predictor, an enhanced model that allows for the inclusion of time dependency is considered. A violation of the assumption of proportional hazards will be indicated by a NON-zero slope for the added time-dependent predictor. From this model, one obtains the Schoenfeld residuals which are then plotted.
Look for: Even band of scaled residuals centered at 0 on the y-axis
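The second plot in the list, ln(-ln(survival)) versus ln(time), can be drawn with the fun="cloglog" option of plot() for a survfit object. The sketch below uses simulated data rather than pbc: the group sizes, hazard rates and censoring time are all hypothetical, chosen so that proportional hazards holds by construction and the two curves should look roughly parallel.

```r
library(survival)

# Simulated data (hypothetical, NOT the pbc data): exponential survival times
# with a constant hazard ratio of 2 between groups, so PH holds by construction
set.seed(640)
n     <- 200
group <- rep(c(0, 1), each = n / 2)
time  <- rexp(n, rate = 0.1 * ifelse(group == 1, 2, 1))

# Administrative censoring at 15 years
status <- as.numeric(time <= 15)
time   <- pmin(time, 15)

km.sim <- survfit(Surv(time, status) ~ group)

# fun="cloglog" plots ln(-ln(S(t))) against time on a log scale
plot(km.sim, fun = "cloglog", col = c("red", "blue"),
     xlab = "Years (log scale)", ylab = "ln(-ln(Survival))",
     main = "Log-log plot (simulated data)")
legend("topleft", title = "Group", c("0", "1"), fill = c("red", "blue"))
```

With the handout's data, the analogous call would be plot(kmrx, fun="cloglog", col=c("red","blue")), using the kmrx object fit earlier.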

# Preliminary - Test the proportional hazards assumption
test.ph <- cox.zph(modelall, transform="km", global=TRUE)
test.ph

# Graphical Assessment of proportional hazards assumption
# Plot Scaled Schoenfeld residuals v time (Look for even band at 0)
# Histology severity=2
plot(test.ph[1], main="Histology severity=2")
abline(h=0)


# Histology severity=3
plot(test.ph[2], main="Histology severity=3")
abline(h=0)

# Histology severity=4
plot(test.ph[3], main="Histology severity=4")
abline(h=0)

# Bilirubin
plot(test.ph[4], main="Bilirubin")
abline(h=0)

# Randomization
plot(test.ph[5], main="Randomization")
abline(h=0)