R for Survival Analysis 2020
Source: people.umass.edu/biep640w/pdf/R for Survival Analysis 2020.pdf

Jul 13, 2020

BIOSTATS 640 – Spring 2020 8. Survival Analysis R Illustration

….R\00. R Handouts 2019-20\R for Survival Analysis 2020.docx Page 1 of 21

8. Introduction to Survival Analysis Illustration – R Users

Spring 2020

1. Illustration: DPCA Study of Primary Biliary Cirrhosis
2. Prepare Data for Survival Analysis
3. Model Free Approaches
   a. Descriptives
   b. Kaplan-Meier Curve Estimation
   c. Kaplan-Meier Curve Plot
   d. Log Rank Test for Equality of Survival Distributions
   e. Assessment of PH Using -log-log Plot
4. Cox PH Model Regression
   a. Fit Cox PH Model
   b. Multivariable Model Development
5. Regression Diagnostics for Cox PH Model
   a. Assessment of Proportional Hazards
   b. Assessment of GOF: Kaplan-Meier v Cox PH

1. Illustration DPCA Study of Primary Biliary Cirrhosis

Dataset Used in this Illustration: pbc.Rdata

DPCA Study of Primary Biliary Cirrhosis
Source: Dickson ER, Grambsch PM and Fleming TR (1989) Prognosis in primary biliary-cirrhosis - model for decision making. Hepatology, 10, 1-7.

Introduction. Bile is a fluid produced in your liver. It functions in the digestion of food and aids in ridding your body of worn-out red blood cells, cholesterol and toxins. Primary biliary cirrhosis is an autoimmune disease in which the body turns against its own cells, in this case the bile ducts. As the bile ducts are increasingly damaged, harmful substances can accumulate. This can lead to irreversible scarring of liver tissue (this is cirrhosis). Among other things, the sufferer can experience abdominal pain, internal bleeding and, ultimately, liver failure. Primary biliary cirrhosis is also a risk factor for liver cancer.

This illustration utilizes data from a randomized controlled trial of D-penicillamine (DPCA) for the treatment of primary biliary cirrhosis. A total of n=312 consenting subjects were enrolled and randomized to either active treatment or placebo-control (presumably this group received standard care). Time zero is the date of diagnosis and initiation of treatment. Study participants were followed to event of end-stage liver disease or censoring. Thus, these are an example of “right” censored data. Over approximately 10 years of follow-up, 125 deaths (40%) were observed. The goal of these analyses was to assess the benefit of randomization to DPCA on survival, overall and after adjustment for selected, important covariates.

Data Dictionary/Coding Manual. This illustration utilizes the following variables in pbc.Rdata.

Variable    Codings                            Label
years       Continuous (range: 0.11 – 12.47)   Time to death (in years)
status      1 = dead, 0 = censored             Event/censoring indicator
rx          1 = DPCA, 0 = Control              Treatment/randomization
histol      1=lowest, 2, 3, 4=highest          Severity of liver damage at dx
bilirubin   Continuous, mg/dl                  Serum bilirubin

2. Prepare Data for Survival Analysis

options(scipen=1000)
setwd("/Users/cbigelow/Desktop")
load(file="pbc.Rdata")
keepvars <- c("years","status","rx","histol", "bilirubin")
temp <- pbc[keepvars]
str(temp)

## 'data.frame':    312 obs. of  5 variables:
##  $ years    : num  1.1 12.32 2.77 5.27 4.12 ...
##  $ status   : int  1 0 1 1 0 1 0 1 1 1 ...
##  $ rx       : int  0 0 0 0 1 1 1 1 0 1 ...
##  $ histol   : int  4 3 4 4 3 3 3 3 2 4 ...
##  $ bilirubin: num  14.5 1.1 1.4 1.8 3.4 ...

head(temp)

##       years status rx histol bilirubin
## 1  1.095140      1  0      4      14.5
## 2 12.320329      0  0      3       1.1
## 3  2.770705      1  0      4       1.4
## 4  5.270363      1  0      4       1.8
## 5  4.117728      0  1      3       3.4
## 6  6.852840      1  1      3       0.8

# SURVIVALOBJECTNAME <- with(DATAFRAMENAME, survival::Surv(TIMEVAR,CENSORVAR))
pbc.survival <- with(temp,survival::Surv(years,status))
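It can help to look at what a survival object actually contains. The sketch below is illustrative only: it builds a three-row object from the first three rows printed by head(temp), and the names years3, status3 and s are assumptions introduced here, not part of the handout.

```r
library(survival)

# First three rows of temp, typed in by hand (illustration only)
years3  <- c(1.095140, 12.320329, 2.770705)
status3 <- c(1, 0, 1)            # 1 = dead, 0 = censored

# Surv() pairs each follow-up time with its event indicator
s <- Surv(years3, status3)

print(s)      # censored times are printed with a trailing "+"
is.Surv(s)    # confirms that s is a survival object
```

The second subject was censored, so that time prints with a "+". This is a quick check that the 0/1 coding of status was read as intended.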

3. Model Free Approaches

a. Descriptives

library(stargazer)
library(gmodels)

cat(paste(" ", "Illustration of Survival Analysis (pbc.Rdata)","Preliminary Descriptives"," ",sep="\n"))
##
## Illustration of Survival Analysis (pbc.Rdata)
## Preliminary Descriptives
##

stargazer::stargazer(temp, type="text", median=TRUE)
##
## ==================================================================
## Statistic  N  Mean  St. Dev.  Min  Pctl(25) Median Pctl(75)  Max
## ------------------------------------------------------------------
## years     312 5.493  3.075   0.112  3.261   5.036   7.385   12.474
## status    312 0.401  0.491     0      0       0       1       1
## rx        312 0.494  0.501     0      0       0       1       1
## histol    312 3.032  0.878     1      2       3       4       4
## bilirubin 312 3.256  4.530   0.300  0.800   1.350   3.425   28.000
## ------------------------------------------------------------------

gmodels::CrossTable(temp$rx,temp$status,digits=2,prop.r=TRUE,prop.c=FALSE,prop.t=FALSE,prop.chisq=FALSE,dnn=c("Randomization/Treatment","Event"))
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |-------------------------|
##
## Total Observations in Table:  312
##
##
##                         | Event
## Randomization/Treatment |         0 |         1 | Row Total |
## ------------------------|-----------|-----------|-----------|
##                       0 |        93 |        65 |       158 |
##                         |      0.59 |      0.41 |      0.51 |
## ------------------------|-----------|-----------|-----------|
##                       1 |        94 |        60 |       154 |
##                         |      0.61 |      0.39 |      0.49 |
## ------------------------|-----------|-----------|-----------|
##            Column Total |       187 |       125 |       312 |
## ------------------------|-----------|-----------|-----------|

Interpretation:
Among n=158 randomized to CONTROL, there were 65 deaths (41%).
Among n=154 randomized to active treatment DPCA, there were 60 deaths (39%).

b. Kaplan-Meier Curve Estimation

library(survival)
# Overall
kmall <- survfit(formula = pbc.survival ~ 1, type="kaplan-meier", conf.type="log", conf.int=.95, data=temp)
cat(paste(" ", "Kaplan-Meier Estimation - Overall"," ",sep="\n"))
##
## Kaplan-Meier Estimation - Overall
##

summary(kmall)
## Call: survfit(formula = pbc.survival ~ 1, data = temp, type = "kaplan-meier",
##     conf.type = "log", conf.int = 0.95)
##
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.112    312       1    0.997 0.00320        0.991        1.000
##   0.140    311       1    0.994 0.00452        0.985        1.000
## --------- Several rows omitted ---------
##  11.168     17       1    0.369 0.04895        0.285        0.479
##  11.474     13       1    0.341 0.05278        0.251        0.461

# Separately by Randomization/Treatment
kmrx <- survfit(formula = pbc.survival ~ rx, type="kaplan-meier", conf.type="log", conf.int=.95, data=temp)
cat(paste(" ", "Kaplan-Meier Estimation - by RX"," ",sep="\n"))
##
## Kaplan-Meier Estimation - by RX
##

summary(kmrx)
## Call: survfit(formula = pbc.survival ~ rx, data = temp, type = "kaplan-meier",
##     conf.type = "log", conf.int = 0.95)
##
##                 rx=0
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.112    158       1    0.994 0.00631        0.981        1.000
##   0.194    157       1    0.987 0.00889        0.970        1.000
## --------- Several rows omitted ---------
##  11.168      8       1    0.372 0.07247        0.254        0.545
##  11.474      7       1    0.319 0.07922        0.196        0.519
##
##                 rx=1
##    time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.140    154       1    0.994 0.00647        0.981        1.000
##   0.211    153       1    0.987 0.00912        0.969        1.000
## --------- Several rows omitted ---------
##  10.511     13       1    0.394 0.06719        0.282        0.551
##  10.549     12       1    0.361 0.06916        0.248        0.526
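The "survival" column above is the product-limit estimate: at each event time, the running survival is multiplied by (1 - n.event/n.risk). The following is a minimal sketch of that arithmetic on a small made-up data set (the data frame toy and all of its values are hypothetical, not taken from pbc.Rdata), compared against survfit().

```r
library(survival)

# Hypothetical toy data: time in years, status 1 = event, 0 = censored
toy <- data.frame(time   = c(1, 2, 2, 3, 4, 5),
                  status = c(1, 1, 0, 1, 0, 1))

# Distinct event times, number at risk and number of events at each
event_times <- sort(unique(toy$time[toy$status == 1]))
n_risk  <- sapply(event_times, function(t) sum(toy$time >= t))
n_event <- sapply(event_times, function(t) sum(toy$time == t & toy$status == 1))

# Product-limit (Kaplan-Meier) estimate computed by hand
km_hand <- cumprod(1 - n_event / n_risk)

# Same quantity from survfit(); summary() lists event times only
fit    <- survfit(Surv(time, status) ~ 1, data = toy)
km_fit <- summary(fit)$surv

data.frame(time = event_times, n_risk, n_event, km_hand, km_fit)
```

A subject censored at time 2 still counts in the risk set at time 2 but contributes no event, which is why censoring changes n.risk without producing a step in the curve.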

c. Kaplan-Meier Curve Plot

Overall

# Method I - No frills
p <- plot(kmall,xlab="Years on Study", ylab="% Surviving", yscale=100, main="I. Kaplan-Meier Survival (Overall)")

# Method II - Using plot() in base installation
# Note: cex.main=0.75 and cex.lab=0.75 changes sizes of labels to smaller.
plot(kmall, xlab="Years on Study", ylab="Percent Surviving (%)", yscale=100, cex.lab=0.75,
     main="II. DPCA Study of Primary Biliary Cirrhosis \nKaplan-Meier Survival (95% CI) - Overall",
     cex.main=0.75)

# Method III - Using ggsurv() in package: GGally
library(ggplot2)
library(GGally)
p <- ggsurv(kmall)
p <- p + labs(x="Years on Study", y="Percent Surviving")
p <- p + ggtitle("III. DPCA Study of Primary Biliary Cirrhosis \nKaplan-Meier Survival (95% CI) - Overall")
p <- p + theme_bw()
p

# Method IV - Using survplot() in package: rms.
# Notes: (1) must obtain fit using npsurv( ) using package rms
#        (2) option n.risk=TRUE produces table of "at risk"
library(rms)
kmall2 <- rms::npsurv(pbc.survival~1)
rms::survplot(kmall2, n.risk=TRUE,
     main="IV. DPCA Study of Primary Biliary Cirrhosis \nKaplan-Meier Survival (95% CI) - Overall",
     xlab="Years on Study", ylab="IV. Percent Surviving")

By Randomization (rx)

# Method I - No frills
plot(kmrx,xlab="Years on Study", ylab="% Surviving", yscale=100,col=c("red","blue"), main="I. Kaplan-Meier Survival, by RX")
legend("topright",title="RX",c("Control","DPCA"), fill=c("red","blue"))

# Method II - Using plot() in base installation
plot(kmrx,xlab="Years on Study",
     main="II. DPCA Study of Primary Biliary Cirrhosis \nKaplan-Meier Survival (95% CI) - By Group",
     ylab="Percent Surviving",yscale=100,col=c("red","blue"), cex.main=0.75,cex.lab=0.75)
legend("bottomleft",title="Treatment",c("Control","DPCA"),fill=c("red","blue"),cex=0.75,box.lty=0)

# Method III - Using ggsurv() in package GGally
library(ggplot2)
library(GGally)
p <- ggsurv(kmrx)
p <- p + ggtitle("III. DPCA Study of Primary Biliary Cirrhosis \nKaplan-Meier Survival (95% CI) - by RX")
# Note: "none" suppresses the linetype legend; recent ggplot2 versions deprecate linetype = FALSE
p <- p + ggplot2::guides(linetype = "none")
p <- p + ggplot2::scale_colour_discrete( name = 'rx', breaks = c(0,1), labels = c('Control', 'DPCA'))
p

d. Log Rank Test of Equality of Survival Distributions

library(survival)
cat(paste(" ", "Log Rank Test of Equality of Survival Distributions - by RX"," "," ",sep="\n"))

##
## Log Rank Test of Equality of Survival Distributions - by RX
##
##

survival::survdiff(pbc.survival~rx,data=pbc)

## Call:
## survival::survdiff(formula = pbc.survival ~ rx, data = pbc)
##
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## rx=0 158       65     63.2    0.0502     0.102
## rx=1 154       60     61.8    0.0513     0.102
##
##  Chisq= 0.1  on 1 degrees of freedom, p= 0.7
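The last line of the output rounds both the statistic and the p-value to one digit. As a small sketch, the p-value can be recovered to more decimals from the chi-square statistic shown in the (O-E)^2/V column. The object names chisq, df and p_value are introduced here for illustration, and the input is the rounded value 0.102 copied from the output.

```r
# Chi-square statistic and degrees of freedom copied from the survdiff() output
chisq <- 0.102
df    <- 1      # number of groups minus 1

# Log rank p-value = upper-tail probability of the chi-square distribution
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
round(p_value, 3)
```

Because the input is already rounded, the result is approximate. It should agree closely with the score (logrank) test that coxph() reports for the single-predictor model in rx.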

Interpretation: Do NOT reject. Assumption of the null hypothesis has NOT led to an unlikely result (p-value = .7). We have no statistically significant evidence that the survival distributions differ.

e. Assessment of PH Using -log-log Plot

# Transform Kaplan-Meier Estimates to -log(-log) scale
# When PH assumption holds, plot should be parallel lines
# KEY: cex.main=1 (smaller size), font.main=1 (turns OFF bold), adj=0 (left aligns)
kmrx <- survfit(formula = pbc.survival ~ rx, type="kaplan-meier", data=temp)
plot(kmrx, fun = function (s) -log(-log(s)),
     main="-log(-log) Plot Assessment of PH in RX (Look for: Parallel Lines)",
     cex.main=1,font.main=1,adj=0,
     xlab = "Years", ylab = "-log(-log(Survival))", col = c("red","blue"))
legend("topright",title="RX",c("Control","DPCA"),fill=c("red","blue"))
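Why parallel lines? A short derivation may help. It is a sketch that assumes a single 0/1 predictor (rx) with log hazard ratio $\beta$, and it writes $S_0(t)$ and $S_1(t)$ for the survival functions in the control and DPCA groups (notation introduced here, not used elsewhere in the handout).

```latex
\begin{align*}
h_1(t) &= h_0(t)\, e^{\beta}
  && \text{(proportional hazards)} \\
S_1(t) &= \left[ S_0(t) \right]^{e^{\beta}}
  && \text{(because } S(t) = \exp\{-\textstyle\int_0^t h(u)\,du\}\text{)} \\
-\log\!\left(-\log S_1(t)\right) &= -\log\!\left(-\log S_0(t)\right) - \beta
\end{align*}
```

Under PH the two transformed curves therefore differ only by the constant vertical shift $\beta$ at every $t$. Curves that converge, diverge or cross suggest a departure from PH.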

4. Cox PH Model Regression

Recall. The Cox PH model models the hazard of event (in this case death) at time “t” as the product of a baseline hazard times exp(linear model in the predictors X1, X2, …, Xp):

h(t; X1,...,Xp) = h0(t) exp[ β1X1 + ... + βpXp ]

Here, p=3 because we have 3 predictors of interest:

X1 = rx, 0/1 indicator of randomization
X2 = histol, ordinal measure of degree of tissue damage at diagnosis
X3 = bilirubin, continuous (mg/dl)

Note. The predictor histol is an ordinal predictor. So, in R, we will need to declare it as a factor variable prior to modeling.

a. Fit Cox PH Model

library(survival)
# NOTE!!!!!!!!! Do not precede coxph with survival:: if you want to use stargazer later.
library(stargazer)
# Predictor = rx. 0/1
fit_rx <- coxph(pbc.survival~rx, data=temp)
cat(paste(" ", "Fit of Single Predictor - rx"," ",sep="\n"))
##
## Fit of Single Predictor - rx
##

fit_rx
## Call:
## coxph(formula = pbc.survival ~ rx, data = temp)
##
##        coef exp(coef) se(coef)      z     p
## rx -0.05722   0.94438  0.17916 -0.319 0.749
##
## Likelihood ratio test=0.1  on 1 df, p=0.7494
## n= 312, number of events= 125

Interpretation: Relative to control patients, patients treated with DPCA have lower hazard of death (HR = exp[coef] = .94) at all times of follow-up. This very small benefit is not statistically significant (p-value = .75).
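The reason exp[coef] is read as a hazard ratio follows directly from the model h(t; X1,...,Xp) = h0(t) exp[ β1X1 + ... + βpXp ]. As a sketch, compare two subjects who differ only in X1 (here rx = 1 versus rx = 0) and share the same values of the other predictors:

```latex
\begin{align*}
\mathrm{HR}
  &= \frac{h(t;\, X_1 = 1, X_2, \dots, X_p)}{h(t;\, X_1 = 0, X_2, \dots, X_p)} \\
  &= \frac{h_0(t)\exp[\beta_1 \cdot 1 + \beta_2 X_2 + \dots + \beta_p X_p]}
          {h_0(t)\exp[\beta_1 \cdot 0 + \beta_2 X_2 + \dots + \beta_p X_p]} \\
  &= \exp[\beta_1]
\end{align*}
```

The baseline hazard $h_0(t)$ cancels, so the ratio does not depend on $t$. This is the proportional hazards assumption, and it is why a single number summarizes the treatment effect "at all times of follow-up".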

cat(paste(" ", "Full Summary of Fit of Single Predictor - rx"," ",sep="\n"))
##
## Full Summary of Fit of Single Predictor - rx
##

# Use command summary(MODELOBJECTNAME) to obtain more detail of fit
summary(fit_rx)
## Call:
## coxph(formula = pbc.survival ~ rx, data = temp)
##
##   n= 312, number of events= 125
##
##        coef exp(coef) se(coef)      z Pr(>|z|)
## rx -0.05722   0.94438  0.17916 -0.319    0.749
##
##    exp(coef) exp(-coef) lower .95 upper .95
## rx    0.9444      1.059    0.6647     1.342
##
## Concordance= 0.499  (se = 0.025 )
## Likelihood ratio test= 0.1  on 1 df,   p=0.7
## Wald test            = 0.1  on 1 df,   p=0.7
## Score (logrank) test = 0.1  on 1 df,   p=0.7
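The hazard ratio and its 95% confidence limits in the second block of the output can be reproduced from coef and se(coef): the limits are computed on the log scale and then exponentiated. The sketch below uses only the two numbers printed above; the object names are introduced here for illustration.

```r
# Values copied from the summary(fit_rx) output
coef_rx <- -0.05722
se_rx   <-  0.17916

z     <- qnorm(0.975)                  # approx. 1.96 for a 95% interval
hr    <- exp(coef_rx)                  # exp(coef)
lower <- exp(coef_rx - z * se_rx)      # lower .95
upper <- exp(coef_rx + z * se_rx)      # upper .95

round(c(HR = hr, lower = lower, upper = upper), 4)
```

The interval includes 1, which agrees with the non-significant Wald test (p = 0.749).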

# Predictor = histol. Ordinal - must declare as factor
fit_histol <- coxph(pbc.survival~factor(histol), data=temp)
cat(paste(" ", "Fit of Single Predictor - histol"," ",sep="\n"))
##
## Fit of Single Predictor - histol
##

fit_histol
## Call:
## coxph(formula = pbc.survival ~ factor(histol), data = temp)
##
##                   coef exp(coef) se(coef)     z      p
## factor(histol)2  1.607     4.988    1.031 1.559 0.1191
## factor(histol)3  2.150     8.581    1.012 2.124 0.0337
## factor(histol)4  3.063    21.387    1.009 3.036 0.0024
##
## Likelihood ratio test=52.74  on 3 df, p=0.00000000002085
## n= 312, number of events= 125

Interpretation: Recall. Higher scores on histol (valid scores = 1, 2, 3, 4) represent greater levels of liver tissue damage present at diagnosis. This model shows that higher (“worse”) values of histol at diagnosis are associated with poorer prognosis (hazard ratio estimates increase from 1 to 4.99 to 8.58 to 21.4, relative to the referent group histol=1). This is highly statistically significant (likelihood ratio test p-value << .0001).

# Predictor = bilirubin. Continuous
fit_bili <- coxph(pbc.survival~bilirubin, data=temp)
cat(paste(" ", "Fit of Single Predictor - bilirubin"," ",sep="\n"))
##
## Fit of Single Predictor - bilirubin
##

fit_bili
## Call:
## coxph(formula = pbc.survival ~ bilirubin, data = temp)
##
##              coef exp(coef) se(coef)     z                   p
## bilirubin 0.14892   1.16058  0.01301 11.44 <0.0000000000000002
##
## Likelihood ratio test=84.65  on 1 df, p=< 0.00000000000000022
## n= 312, number of events= 125

Interpretation: Each 1 unit (1 mg/dl) increase in bilirubin is associated with an increased hazard of death at all times of follow-up (HR = 1.16). This is highly statistically significant (p-value << .0001).
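For a continuous predictor, the hazard ratio for a k-unit increase is exp(k × coef), not k × HR. A brief sketch using the bilirubin coefficient from the output above follows. The increments of 5 and 10 mg/dl are chosen here purely for illustration and are not part of the handout.

```r
beta_bili <- 0.14892       # coef for bilirubin, copied from the fit_bili output

# Hazard ratio for a k-unit (mg/dl) increase: exp(k * beta)
k    <- c(1, 5, 10)        # illustrative increments (an assumption of this sketch)
hr_k <- exp(k * beta_bili)

data.frame(increase_mg_dl = k, HR = round(hr_k, 2))
```

Equivalently, the k-unit hazard ratio is the 1-unit hazard ratio raised to the power k. This is worth keeping in mind because bilirubin ranges from 0.3 to 28 mg/dl in these data.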

# Display models side-by-side
stargazer::stargazer(fit_rx, fit_histol,fit_bili, type="text",
     title="Single Predictor Cox PH Regression Models",
     dep.var.labels=c("y=status (1=died)"),
     column.labels = c("rx", "histol", "bilirubin"))

##
## Single Predictor Cox PH Regression Models
## ==========================================================================
##                                     Dependent variable:
##                      -----------------------------------------------------
##                                      y=status (1=died)
##                             rx              histol           bilirubin
##                             (1)               (2)               (3)
## --------------------------------------------------------------------------
## rx                        -0.057
##                           (0.179)
##
## factor(histol)2                              1.607
##                                             (1.031)
##
## factor(histol)3                             2.150**
##                                             (1.012)
##
## factor(histol)4                            3.063***
##                                             (1.009)
##
## bilirubin                                                     0.149***
##                                                                (0.013)
##
## --------------------------------------------------------------------------
## Observations                312               312               312
## R2                        0.0003             0.156             0.238
## Max. Possible R2           0.983             0.983             0.983
## Log Likelihood           -639.915          -613.597          -597.641
## Wald Test             0.100 (df = 1)  43.920*** (df = 3) 130.920*** (df = 1)
## LR Test               0.102 (df = 1)  52.738*** (df = 3)  84.651*** (df = 1)
## Score (Logrank) Test  0.102 (df = 1)  53.853*** (df = 3) 190.869*** (df = 1)
## ==========================================================================
## Note:                                          *p<0.1; **p<0.05; ***p<0.01

b. Multivariable Model Development

Focus of the analysis is on randomization/treatment

library(survival)
library(stargazer)
# model 1: rx
m1 <- coxph(pbc.survival~rx,data=temp)
# model 2: rx + histol
m2 <- coxph(pbc.survival ~ rx + factor(histol),data=temp)
# model 3: rx + bilirubin
m3 <- coxph(pbc.survival ~ rx + bilirubin, data=temp)
# model 4: rx + histol + bilirubin
m4 <- coxph(pbc.survival ~ rx + factor(histol) + bilirubin, data=temp)
stargazer::stargazer(m1, m2, m3, m4, type="text", ci=TRUE,
     title="Multivariable Cox PH Regression Models: beta (95% CI)",
     dep.var.labels=c("y=status (1=died)"))

## Multivariable Cox PH Regression Models: beta (95% CI)
## ===============================================================================================
##                                                 Dependent variable:
##                      --------------------------------------------------------------------------
##                                                  y=status (1=died)
##                             (1)                (2)                 (3)                 (4)
## -----------------------------------------------------------------------------------------------
## rx                        -0.057             -0.147              -0.201              -0.158
##                       (-0.408, 0.294)    (-0.500, 0.205)     (-0.561, 0.158)     (-0.514, 0.197)
##
## factor(histol)2                               1.629                                   1.526
##                                          (-0.393, 3.650)                         (-0.497, 3.548)
##
## factor(histol)3                              2.177**                                 1.923*
##                                          (0.192, 4.162)                          (-0.067, 3.912)
##
## factor(histol)4                             3.093***                                2.797***
##                                          (1.114, 5.072)                          (0.816, 4.778)
##
## bilirubin                                                       0.151***            0.148***
##                                                              (0.125, 0.178)      (0.120, 0.175)
##
## -----------------------------------------------------------------------------------------------
## Observations                312                312                 312                 312
## R2                        0.0003              0.157               0.241               0.336
## Max. Possible R2           0.983              0.983               0.983               0.983
## Log Likelihood           -639.915           -613.262            -597.038            -576.150
## Wald Test             0.100 (df = 1)   44.530*** (df = 4)  130.940*** (df = 2)  148.440*** (df = 5)
## LR Test               0.102 (df = 1)   53.408*** (df = 4)   85.858*** (df = 2)  127.632*** (df = 5)
## Score (Logrank) Test  0.102 (df = 1)   54.478*** (df = 4)  191.230*** (df = 2)  217.136*** (df = 5)
## ===============================================================================================
## Note:                                                               *p<0.1; **p<0.05; ***p<0.01

stargazer::stargazer(m1, m2, m3, m4, type="text", report=('vc*p'),
     title="Multivariable Cox PH Regression Models: p-values",
     dep.var.labels=c("y=status (1=died)"))

## Multivariable Cox PH Regression Models: p-values
## ==============================================================================================
##                                                Dependent variable:
##                      -------------------------------------------------------------------------
##                                                 y=status (1=died)
##                             (1)               (2)                 (3)                 (4)
## ----------------------------------------------------------------------------------------------
## rx                        -0.057            -0.147              -0.201              -0.158
##                          p = 0.750         p = 0.414           p = 0.273           p = 0.383
##
## factor(histol)2                              1.629                                   1.526
##                                            p = 0.115                               p = 0.140
##
## factor(histol)3                             2.177**                                 1.923*
##                                            p = 0.032                               p = 0.059
##
## factor(histol)4                            3.093***                                2.797***
##                                            p = 0.003                               p = 0.006
##
## bilirubin                                                      0.151***            0.148***
##                                                                p = 0.000           p = 0.000
##
## ----------------------------------------------------------------------------------------------
## Observations                312               312                 312                 312
## R2                        0.0003             0.157               0.241               0.336
## Max. Possible R2           0.983             0.983               0.983               0.983
## Log Likelihood           -639.915          -613.262            -597.038            -576.150
## Wald Test             0.100 (df = 1)  44.530*** (df = 4)  130.940*** (df = 2)  148.440*** (df = 5)
## LR Test               0.102 (df = 1)  53.408*** (df = 4)   85.858*** (df = 2)  127.632*** (df = 5)
## Score (Logrank) Test  0.102 (df = 1)  54.478*** (df = 4)  191.230*** (df = 2)  217.136*** (df = 5)
## ==============================================================================================
## Note:                                                              *p<0.1; **p<0.05; ***p<0.01

# LR Tests of crude and adjusted significance of rx
library(lmtest)

# rx controlling for: histol
histol <- coxph(pbc.survival ~ factor(histol),data=temp)
histolrx <- coxph(pbc.survival ~ rx + factor(histol),data=temp)
# rx controlling for: bilirubin
bili <- coxph(pbc.survival ~ bilirubin,data=temp)
bilirx <- coxph(pbc.survival ~ rx + bilirubin,data=temp)
# rx controlling for: histol AND bilirubin
histolbili <- coxph(pbc.survival ~ factor(histol) + bilirubin,data=temp)
histolbilirx <- coxph(pbc.survival ~ rx + factor(histol) + bilirubin,data=temp)

cat(paste(" ", "RX controlling for HISTOL"," ",sep="\n"))
##
## RX controlling for HISTOL
##
lmtest::lrtest(histol,histolrx)
## Likelihood ratio test
##
## Model 1: pbc.survival ~ factor(histol)
## Model 2: pbc.survival ~ rx + factor(histol)
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   3 -613.60
## 2   4 -613.26  1 0.6697     0.4132

Interpretation: Do not reject. After adjustment for histol, randomization to DPCA is NOT significantly associated with survival (LR Test p-value = .41)

cat(paste(" ", "RX controlling for BILIRUBIN"," ",sep="\n"))
##
## RX controlling for BILIRUBIN
##
lmtest::lrtest(bili,bilirx)
## Likelihood ratio test
##
## Model 1: pbc.survival ~ bilirubin
## Model 2: pbc.survival ~ rx + bilirubin
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   1 -597.64
## 2   2 -597.04  1 1.2065      0.272

Interpretation: Do not reject. After adjustment for bilirubin, randomization to DPCA is NOT significantly associated with survival (LR Test p-value = .27)

cat(paste(" ", "RX controlling for HISTOL + BILIRUBIN"," ",sep="\n"))
##
## RX controlling for HISTOL + BILIRUBIN
##
lmtest::lrtest(histolbili,histolbilirx)
## Likelihood ratio test
##
## Model 1: pbc.survival ~ factor(histol) + bilirubin
## Model 2: pbc.survival ~ rx + factor(histol) + bilirubin
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   4 -576.53
## 2   5 -576.15  1 0.7634     0.3823

Interpretation: Do not reject. After adjustment for both histol and bilirubin, randomization to DPCA is NOT significantly associated with survival (LR Test p-value = .38)
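Each likelihood ratio statistic above is twice the difference in log likelihoods between the larger and the smaller model, referred to a chi-square distribution with degrees of freedom equal to the number of added parameters. The sketch below redoes the last comparison by hand. It uses the log likelihoods as printed (rounded to 2 decimals), so it only approximately matches the reported 0.7634 and 0.3823; the object names are introduced here for illustration.

```r
# Log likelihoods copied from the last lrtest() output (rounded as printed)
loglik_reduced <- -576.53    # factor(histol) + bilirubin
loglik_full    <- -576.15    # rx + factor(histol) + bilirubin

# LR statistic = 2 * (logLik of full model - logLik of reduced model)
lr_stat <- 2 * (loglik_full - loglik_reduced)

# One parameter (rx) was added, so df = 1
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

round(c(LR = lr_stat, p = p_value), 3)
```

The same arithmetic applies to the other two comparisons, each with the log likelihoods from its own lrtest() output.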

5. Regression Diagnostics for Cox PH Model

a. Assessment of Proportional Hazards

Hypothesis Test of NULL: PH

Goal. For rare events, a hazard ratio may be interpreted as a relative risk. The assumption of proportional hazards says that the hazard ratio, comparing “exposed” versus “not exposed” (or the effect of a unit increase in exposure), is constant over time.

Test. The test of the assumption of proportional hazards has NULL: proportional hazards assumption is true. Thus, we hope to retain the null!

library(survival)
# Test NULL: Proportional Hazards Assumption Holds
cat(paste(" ", "Test of Proportional Hazards Assumption"," ",sep="\n"))
##
## Test of Proportional Hazards Assumption
##

survival::cox.zph(m4,transform="km", global=TRUE)
##                     rho   chisq      p
## rx              -0.0932  1.0539 0.3046
## factor(histol)2  0.0255  0.0813 0.7755
## factor(histol)3  0.0110  0.0150 0.9026
## factor(histol)4 -0.0402  0.2010 0.6539
## bilirubin        0.1091  1.1172 0.2905
## GLOBAL               NA 13.7628 0.0172

Interpretation: The global test is significant (p-value = .0172) … but for each predictor, do not reject the assumption of proportional hazards.

Graphical Assessment of NULL: PH

Graphical Assessments. When the assumption of proportional hazards is true, you should see the following:

Plot of Y=survival function versus X=Time over Groups=Exposure (1=yes, 0=no)
Look for: parallel curves

Plot of Y=ln(-ln(survival function)) versus X=ln(Time) over Groups=Exposure (1=yes, 0=no)
Look for: parallel lines

Plot of Y=scaled Schoenfeld residuals associated with predictor versus X=Time
For each predictor, an enhanced model that allows for the inclusion of time dependency is considered. A violation of the assumption of proportional hazards will be indicated by a NON-zero slope for the added time-dependent predictor. From this model, one obtains the Schoenfeld residuals which are then plotted.
Look for: Even band of scaled residuals centered at 0 on the y-axis
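One common way to write the "enhanced model" described above is to let the coefficient of predictor j drift with a function of time. This is a sketch only: the symbols $\theta_j$ and $g(t)$ are notation assumed here, with $g(t)$ standing for the chosen time transform (for example, the Kaplan-Meier transform requested by transform="km" in cox.zph).

```latex
\begin{align*}
\beta_j(t) &= \beta_j + \theta_j\, g(t) \\
H_0 &: \theta_j = 0 \quad \text{(proportional hazards holds for predictor } j\text{)}
\end{align*}
```

When $\theta_j = 0$ the coefficient, and hence the hazard ratio $\exp[\beta_j]$, is constant over time. In that case the scaled Schoenfeld residuals plotted against $g(t)$ should show no trend, which is the "even band centered at 0" to look for.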

# Plot Scaled Schoenfeld Residuals v Time (All is well if: even band centered at 0)
# [1]=1st predictor (rx), [2]=2nd predictor (histol=2), etc to match order in model m4
# Graphical Assessment is in 2 steps: 1) test of PH assumption; 2) plots

# Step 1 – Test of PH assumption
test.ph <- cox.zph(m4,transform="km",global=TRUE)

# Step 2 – Plot for rx
plot(test.ph[1],main="Randomization (rx)"); abline(h=0)

# Step 2 – Plot for histol=2
plot(test.ph[2],main="Histology = 2"); abline(h=0)

# Step 2 – Plot for histol=3
plot(test.ph[3],main="Histology = 3"); abline(h=0)

# Step 2 – Plot for histol=4
plot(test.ph[4],main="Histology = 4"); abline(h=0)

# Step 2 – Plot for bilirubin
plot(test.ph[5],main="Bilirubin"); abline(h=0)

b. Assessment of GOF: Kaplan-Meier v Cox PH

library(survival)
# OBSERVED = model free Kaplan-Meier estimates of survival, separately by RX
observed <- survfit(pbc.survival~rx, data=temp)
plot(observed, conf.int=FALSE)

# FITTED will be Cox PH model "predicted" estimates of survival using final model (m4)
m4 <- coxph(pbc.survival ~ rx + factor(histol) + bilirubin, data=temp)

# Create prediction data frame (nd) with one row for rx=0 and one row for rx=1
# Hold factor variable histol at level=3 (any choice would be fine)
# Hold bilirubin at its mean
# Note: 2 rows give exactly 2 fitted curves, so lty=c(1,2) below maps to rx=0, rx=1
nd <- data.frame(rx=c(0,1), histol=3, bilirubin=mean(temp$bilirubin))

# Obtain the cox model "predicted" estimates of survival for the rows of nd
fitted <- survfit(m4, newdata = nd)

# Plot COX PH fitted
plot(fitted, lty=c(1,2), xlab="Years on Study", ylab="Percent Surviving (%)", yscale=100, cex.lab=0.75,
     main = "DPCA Study of Primary Biliary Cirrhosis \nComparison Kaplan-Meier (red) v Cox PH (black)",
     cex.main=0.75)

# Using command lines( ) overlay KM
lines(observed, col="red", lty=c(1,2))