Lecture 12: Cox Proportional Hazards Model Introduction.


Cox Proportional Hazards Model

• Names
  – Cox regression
  – Semi-parametric proportional hazards
  – Proportional hazards model
  – Multiplicative hazards model

• When?
  – 1972

• Why?
  – Allows for adjustment of covariates (continuous and categorical) in a survival setting
  – Allows prediction of survival based on a set of covariates

• Analogous to linear and logistic regression in many ways

Cox PHM Notation (Klein & Moeschberger)

• Data on n individuals:
  – Tj: time on study for individual j
  – δj: event indicator for individual j (1 = event observed, 0 = censored)
  – Zj = (Zj1, …, Zjp): vector of covariates for individual j

• More complicated: Zj(t)
  – Covariates are time dependent
  – May change with time/age

Basic Model

$$h(t \mid Z) = h_0(t)\exp\left(\beta^\top Z\right) = h_0(t)\exp\left(\sum_{k=1}^{p} \beta_k Z_k\right)$$
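As a minimal sketch, the model can be evaluated at a single time point once a value of the baseline hazard is supplied. The helper `cox_hazard` and all numbers are hypothetical; the Cox model itself leaves h0(t) unspecified, so this only illustrates the multiplicative structure h0(t) × exp(β'Z).

```python
import math

def cox_hazard(h0_t, z, beta):
    """Evaluate h(t | Z) = h0(t) * exp(beta' Z) at one time point t.

    h0_t : baseline hazard value h0(t) at that time (supplied by assumption;
           the Cox model itself leaves h0 unspecified)
    z    : covariate vector Z for one individual
    beta : coefficient vector, same length as z
    """
    if len(z) != len(beta):
        raise ValueError("z and beta must have the same length")
    linear_predictor = sum(b * x for b, x in zip(beta, z))  # beta' Z
    return h0_t * math.exp(linear_predictor)


# Hypothetical values: h0(t) = 0.1, beta = (log 2, 0.5), Z = (1, 0)
print(cox_hazard(0.1, [1, 0], [math.log(2), 0.5]))  # about 0.2 = 0.1 * 2
```

An individual with Z = 0 has exactly the baseline hazard; every nonzero covariate scales it by a factor that does not involve t.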

Comments on the Basic Model

• h0(t):
  – Arbitrary baseline hazard
  – Notice that it varies by t

• β:
  – Regression coefficient vector
  – Interpretation: each βk is a log hazard ratio (for a one-unit increase in Zk, holding the other covariates fixed)

• Semi-parametric form
  – Non-parametric baseline hazard
  – Parametric form assumed for covariate effects only

Linear Model Formulation

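• Usual formulation:
$$\log h(t \mid Z) = \log h_0(t) + \beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_p Z_p$$

• Coding of covariates is similar to linear and logistic regression (and other generalized linear models)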

Refresher of Coding Covariates

• Should be nothing new

• Two kinds of “independent” variables
  – Quantitative
  – Qualitative

• Quantitative are continuous
  – Need to determine scale
    • Units
    • Transformations?

• Qualitative are generally categorical
  – Ordered
  – Nominal
  – Coding affects interpretation
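A common way to code a qualitative covariate is reference-cell (dummy) coding, which is also R's default treatment contrast for unordered factors in `coxph` (the `hos.cat` rows in the CGD output carry one coefficient per non-reference level). A minimal sketch, assuming a hypothetical three-level covariate; `dummy_code` is an illustrative helper, not a library function.

```python
def dummy_code(values, levels):
    """Reference-cell (dummy) coding for a categorical covariate.

    The first entry of `levels` is the reference category. Each observation
    gets one 0/1 indicator per non-reference level, so each fitted coefficient
    compares that level with the reference.
    """
    non_reference = levels[1:]
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown level: {v!r}")
        rows.append([1 if v == level else 0 for level in non_reference])
    return rows


levels = ["low", "medium", "high"]  # hypothetical categories, "low" is the reference
print(dummy_code(["low", "high", "medium"], levels))
# [[0, 0], [0, 1], [1, 0]]
```

Choosing a different reference level changes what each coefficient for this covariate compares, which is the sense in which coding affects interpretation.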

Why Proportional

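• Hazard ratio for two individuals with covariate vectors Z and Z*:
$$\frac{h(t \mid Z)}{h(t \mid Z^*)} = \frac{h_0(t)\exp(\beta^\top Z)}{h_0(t)\exp(\beta^\top Z^*)} = \exp\left(\beta^\top (Z - Z^*)\right)$$
• Does not depend on t (i.e., it is constant over time)
• But, it is proportional (constant multiplicative factor)
• Also referred to (sometimes) as the relative risk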
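To see why the ratio is free of t, the sketch below evaluates the hazard for two covariate profiles on a grid of times. It assumes a hypothetical Weibull-shaped baseline h0(t) purely so there is something to evaluate; any positive baseline gives the same ratio because h0(t) cancels. Coefficients and profiles are also hypothetical.

```python
import math

def baseline_hazard(t, shape=1.5, scale=0.2):
    # Hypothetical Weibull-type baseline h0(t) = scale * shape * t**(shape - 1);
    # the Cox model never specifies h0, any positive function of t would do here
    return scale * shape * t ** (shape - 1)

def hazard(t, z, beta):
    # h(t | Z) = h0(t) * exp(beta' Z)
    return baseline_hazard(t) * math.exp(sum(b * x for b, x in zip(beta, z)))

def hazard_ratios_over_time(times, z1, z2, beta):
    # Ratio h(t | Z1) / h(t | Z2) evaluated at each time in `times`
    return [hazard(t, z1, beta) / hazard(t, z2, beta) for t in times]


beta = [0.7, -0.3]            # hypothetical coefficients
z1, z2 = [1, 2.0], [0, 1.0]   # two hypothetical covariate profiles
times = [0.5, 1.0, 5.0, 20.0]

print(hazard_ratios_over_time(times, z1, z2, beta))
# h0(t) cancels, so every ratio equals exp(beta' (Z1 - Z2)) = exp(0.4)
print(math.exp(sum(b * (a - c) for b, a, c in zip(beta, z1, z2))))
```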

Simple Example

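• One covariate: a treatment indicator
  – Ztrt = 1 for the new treatment
  – Ztrt = 0 for the standard treatment
$$h(t \mid Z_{trt}) = h_0(t)\exp(\beta_{trt} Z_{trt})$$

• Hazard ratio:
$$\frac{h(t \mid Z_{trt} = 1)}{h(t \mid Z_{trt} = 0)} = \frac{h_0(t)\exp(\beta_{trt})}{h_0(t)} = \exp(\beta_{trt})$$

• Interpretation: exp{βtrt} is the hazard ratio (relative risk) of the event in the new treatment group vs. the standard treatment group

• Interpretation: At any point in time, the risk of the event in the new treatment group is exp{βtrt} times the risk in the standard treatment group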
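For the one-covariate treatment model the hazard ratio is exp(βtrt), and reversing the comparison gives exp(−βtrt) (the `exp(-coef)` column R prints for the CGD example). A minimal sketch with a hypothetical coefficient chosen so the hazard ratio is 0.5; `hazard_ratio_binary` is an illustrative name.

```python
import math

def hazard_ratio_binary(beta_trt, new_vs_standard=True):
    """Hazard ratio for a 0/1 treatment indicator in h(t|Z) = h0(t) exp(beta_trt * Z_trt).

    With Z_trt = 1 for the new treatment and 0 for the standard treatment,
    h(t | new) / h(t | standard) = exp(beta_trt * 1) / exp(beta_trt * 0) = exp(beta_trt).
    Reversing the comparison (standard vs. new) gives exp(-beta_trt).
    """
    return math.exp(beta_trt) if new_vs_standard else math.exp(-beta_trt)


beta_trt = math.log(0.5)  # hypothetical coefficient, chosen so the HR is 0.5
print(hazard_ratio_binary(beta_trt))         # new vs. standard, approximately 0.5
print(hazard_ratio_binary(beta_trt, False))  # standard vs. new, approximately 2.0
```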

[Figure not reproduced: Fig 3, Cantù M G et al. JCO 2002;20:1232-1237]

Hazard Ratios

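• Assumption: “proportional hazards”
• The hazard ratio does not depend on time
• That is, “the relative risk is constant over time”
• But that is still vague…

• Hypothetical example: Assume the hazard ratio is 0.5
  – Patients in the new therapy group are at half the risk of death of those in the standard treatment group, at any given point in time

• Hazard function: the instantaneous rate of death at time t among those who have survived to time t, roughly P(die at time t | survived to time t) per unit time:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$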

Hazard Ratios

• Hazard ratio = (hazard function in the New group) / (hazard function in the Std group):
$$\text{HR} = \frac{h_{New}(t)}{h_{Std}(t)}$$

• Makes the assumption that this ratio is constant over time

Interpretation Again

• For any fixed point in time, individuals in the new treatment group have half the risk of death of those in the standard treatment group.

A Slightly More Complicated Example

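• What if we had 2 binary covariates?
• How is the hazard ratio estimated in this case?
• What about the proportional hazards assumption?

• Consider a model that includes two binary covariates, Z1 and Z2 (each coded 0/1)

• Our model looks like:
$$h(t \mid Z_1, Z_2) = h_0(t)\exp(\beta_1 Z_1 + \beta_2 Z_2)$$

• From this we can estimate 4 possible hazard rates
  – Z1 = 0, Z2 = 0: h0(t)
  – Z1 = 1, Z2 = 0: h0(t) exp(β1)
  – Z1 = 0, Z2 = 1: h0(t) exp(β2)
  – Z1 = 1, Z2 = 1: h0(t) exp(β1 + β2)

• And if we “compare” the different hazards by taking the ratio we get, for example,
$$\frac{h(t \mid 1, 0)}{h(t \mid 0, 0)} = e^{\beta_1}, \qquad \frac{h(t \mid 1, 1)}{h(t \mid 0, 1)} = e^{\beta_1}, \qquad \frac{h(t \mid 1, 1)}{h(t \mid 0, 0)} = e^{\beta_1 + \beta_2}$$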
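The four hazards and their ratios can be tabulated directly; each is expressed relative to the baseline group (Z1 = 0, Z2 = 0), so h0(t) drops out. A sketch with hypothetical coefficients, assuming the main-effects model h0(t) exp(β1 Z1 + β2 Z2) with no interaction term.

```python
import math
from itertools import product

def relative_hazards(beta1, beta2):
    """h(t | Z1, Z2) / h0(t) for the four groups in h(t|Z) = h0(t) exp(beta1 Z1 + beta2 Z2)."""
    return {(z1, z2): math.exp(beta1 * z1 + beta2 * z2)
            for z1, z2 in product((0, 1), repeat=2)}


rel = relative_hazards(beta1=-0.5, beta2=0.8)  # hypothetical coefficients
for group, value in rel.items():
    print(group, round(value, 4))

# Effect of Z1 within each level of Z2: both ratios equal exp(beta1)
print(rel[(1, 0)] / rel[(0, 0)], rel[(1, 1)] / rel[(0, 1)], math.exp(-0.5))
```

Because the model has no interaction term, the Z1 comparison gives the same ratio at both levels of Z2.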

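A Slightly More Complicated Example

• But what does this mean in terms of proportional hazards?
  – Any ratio of two hazards from this model is free of t, so each pair of groups has proportional hazards
  – The model also assumes the effect of Z1 is the same at both levels of Z2 (no interaction), and vice versa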

Hazard ratio is not always valid…

[Figure not reproduced; the example shown reports a hazard ratio of 0.71]

Let’s Think About the Likelihood…

Partial Likelihood

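• The partial likelihood is defined as
$$L(\beta) = \prod_{i=1}^{D} \frac{\exp\left(\sum_{k=1}^{p} \beta_k Z_{(i)k}\right)}{\sum_{j \in R(t_i)} \exp\left(\sum_{k=1}^{p} \beta_k Z_{jk}\right)}$$

• Where
  – j = 1, 2, …, n
  – No ties: t1 < t2 < … < tD are the D distinct event times
  – Z(i)k is the kth covariate associated with the individual whose failure time is ti
  – R(ti) = {j : Tj ≥ ti} is the risk set at time ti (all individuals still under study just prior to ti)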
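The partial likelihood translates almost line for line into code. A minimal sketch under the no-ties assumption; `cox_partial_likelihood` is a hypothetical helper name. A quick sanity check: with β = 0 every individual in a risk set has the same weight, so each factor is 1/|R(ti)|.

```python
import math

def cox_partial_likelihood(times, events, covariates, beta):
    """Cox partial likelihood L(beta), assuming no tied event times.

    times      : observed times T_j
    events     : event indicators (1 = event observed, 0 = censored)
    covariates : covariate vectors Z_j, one list per individual
    beta       : coefficient vector (same length as each Z_j)
    """
    # Linear predictor sum_k beta_k Z_jk for every individual j
    eta = [sum(b * x for b, x in zip(beta, z_j)) for z_j in covariates]
    likelihood = 1.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored individuals only enter through the risk sets
        # Risk set R(t_i): everyone still on study at t_i (T_j >= t_i)
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        likelihood *= math.exp(eta[i]) / denom
    return likelihood


# With beta = 0 each factor is 1 / |R(t_i)|; five subjects, no censoring -> 1/5! = 1/120
print(cox_partial_likelihood([11, 12, 14, 16, 21], [1] * 5,
                             [[0], [1], [0], [1], [1]], [0.0]))
```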

Things to Notice

• Numerator only depends on information from a patient who experiences the event

• The denominator incorporates information across all patients in the risk set

Constructing the Likelihood

• Without censoring…

• Say we have the following data on n = 5 subjects
  – Observed times and event indicators:
    • ti = 11, 12, 14, 16, 21
    • δi = 1, 1, 1, 1, 1
  – And a single binary covariate
    • zi = 0, 1, 0, 1, 1

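Constructing the Likelihood

• First let’s construct our risk set for each unique time
  – R(11) = {1, 2, 3, 4, 5}
  – R(12) = {2, 3, 4, 5}
  – R(14) = {3, 4, 5}
  – R(16) = {4, 5}
  – R(21) = {5}

• Now, we can construct our likelihood…
$$L(\beta) = \frac{1}{2 + 3e^{\beta}} \cdot \frac{e^{\beta}}{1 + 3e^{\beta}} \cdot \frac{1}{1 + 2e^{\beta}} \cdot \frac{e^{\beta}}{2e^{\beta}} \cdot \frac{e^{\beta}}{e^{\beta}} = \frac{e^{\beta}}{2\,(2 + 3e^{\beta})(1 + 3e^{\beta})(1 + 2e^{\beta})}$$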
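The risk sets and likelihood for this uncensored example can be checked numerically. The sketch builds the risk sets from the data above and compares the direct product over event times with the hand-simplified form e^β / [2(2 + 3e^β)(1 + 3e^β)(1 + 2e^β)]; the helper names are illustrative.

```python
import math

# Uncensored example: times, event indicators, binary covariate
times = [11, 12, 14, 16, 21]
events = [1, 1, 1, 1, 1]
z = [0, 1, 0, 1, 1]

def risk_sets(times, events):
    """Map each event time t_i to the subjects (numbered from 1) with T_j >= t_i."""
    return {t_i: [j + 1 for j, t_j in enumerate(times) if t_j >= t_i]
            for t_i, d_i in zip(times, events) if d_i == 1}

def partial_likelihood(beta):
    """Product over event times of exp(beta z_i) / sum over R(t_i) of exp(beta z_j)."""
    result = 1.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            denom = sum(math.exp(beta * z[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            result *= math.exp(beta * z[i]) / denom
    return result

def closed_form(beta):
    """Hand-simplified version: e^b / [2 (2 + 3e^b)(1 + 3e^b)(1 + 2e^b)]."""
    e = math.exp(beta)
    return e / (2 * (2 + 3 * e) * (1 + 3 * e) * (1 + 2 * e))


print(risk_sets(times, events))
# {11: [1, 2, 3, 4, 5], 12: [2, 3, 4, 5], 14: [3, 4, 5], 16: [4, 5], 21: [5]}
for beta in (-1.0, 0.0, 0.5):
    print(beta, partial_likelihood(beta), closed_form(beta))
```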

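• But what if we have censoring?

• Consider the revised data:
  – Observed times and event indicators:
    • ti = 11, 12, 14, 16, 21
    • δi = 1, 1, 0, 1, 0
  – And a single binary covariate
    • zi = 0, 1, 0, 1, 1

• Again let’s construct our risk set for each unique event time
  – R(11) = {1, 2, 3, 4, 5}
  – R(12) = {2, 3, 4, 5}
  – R(16) = {4, 5}

• And again we can construct our likelihood…
$$L(\beta) = \frac{1}{2 + 3e^{\beta}} \cdot \frac{e^{\beta}}{1 + 3e^{\beta}} \cdot \frac{e^{\beta}}{2e^{\beta}} = \frac{e^{\beta}}{2\,(2 + 3e^{\beta})(1 + 3e^{\beta})}$$
  – Censored subjects 3 and 5 contribute no factor of their own, but they still appear in the risk sets (denominators) while on study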
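With censoring, only event times contribute factors, while censored subjects remain in the denominators until they leave the study. A minimal sketch that lists each factor with its risk set and checks the product against the hand-simplified form e^β / [2(2 + 3e^β)(1 + 3e^β)]; helper names and the value β = 0.5 are illustrative.

```python
import math

# Censored example: subjects 3 and 5 are censored
times = [11, 12, 14, 16, 21]
events = [1, 1, 0, 1, 0]
z = [0, 1, 0, 1, 1]

def likelihood_factors(beta):
    """One entry per event time: (t_i, risk set, exp(beta z_i) / sum over R(t_i))."""
    factors = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # a censored subject contributes no factor of its own
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * z[j]) for j in risk)
        factors.append((t_i, [j + 1 for j in risk], math.exp(beta * z[i]) / denom))
    return factors

def closed_form(beta):
    """Hand-simplified version: e^b / [2 (2 + 3e^b)(1 + 3e^b)]."""
    e = math.exp(beta)
    return e / (2 * (2 + 3 * e) * (1 + 3 * e))


beta = 0.5  # arbitrary value for the comparison
for t_i, risk, factor in likelihood_factors(beta):
    print(t_i, risk, round(factor, 4))
print(math.prod(f for _, _, f in likelihood_factors(beta)), closed_form(beta))
```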

Estimation

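• The log-likelihood
$$\ell(\beta) = \log L(\beta) = \sum_{i=1}^{D} \left[\sum_{k=1}^{p} \beta_k Z_{(i)k} - \log \sum_{j \in R(t_i)} \exp\left(\sum_{k=1}^{p} \beta_k Z_{jk}\right)\right]$$

• Maximize the log-likelihood to solve for estimates of β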


• Score equations Uk(β) = ∂ℓ(β)/∂βk = 0 and the information matrix I(β), with entries −∂²ℓ(β)/∂βk∂βh, are found using standard approaches

• Solving for estimates can be done numerically (e.g. Newton-Raphson)
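For a single covariate, the score is a sum over event times of (zi − weighted mean of z over R(ti)), and the information is the sum of the weighted variances. A minimal Newton-Raphson sketch on the censored toy data above; for these data, setting U(β) = 0 can be solved by hand (9e^{2β} = 2), giving β̂ = ½ log(2/9), which the code checks against. Function names are illustrative.

```python
import math

# Censored toy data (single binary covariate)
times = [11, 12, 14, 16, 21]
events = [1, 1, 0, 1, 0]
z = [0, 1, 0, 1, 1]

def score_and_information(beta):
    """Score U(beta) = dl/dbeta and information I(beta) = -d2l/dbeta2 for one covariate, no ties."""
    u, info = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]  # risk set R(t_i)
        w = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * z[j] for wj, j in zip(w, risk))
        s2 = sum(wj * z[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0                 # weighted mean of z over the risk set
        u += z[i] - mean
        info += s2 / s0 - mean ** 2    # weighted variance of z over the risk set
    return u, info

def newton_raphson(beta=0.0, tol=1e-10, max_iter=50):
    # Iterate beta <- beta + U(beta) / I(beta) until the step is negligible
    for _ in range(max_iter):
        u, info = score_and_information(beta)
        step = u / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


beta_hat = newton_raphson()
print(beta_hat)                  # about -0.752
print(0.5 * math.log(2 / 9))     # closed-form root of U(beta) = 0 for these data
```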

Tests of the Model

• Testing H0: βk = 0 for all k = 1, 2, …, p

• Three main tests
  – Chi-square / Wald test
  – Likelihood ratio test
  – Score test

• Under H0, all three are asymptotically chi-square with p degrees of freedom
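With one covariate the three statistics are short formulas: Wald = β̂² I(β̂), likelihood ratio = 2[ℓ(β̂) − ℓ(0)], score = U(0)² / I(0). A minimal sketch on the censored toy data; it assumes the observed information is used for the Wald test, and since the chi-square reference is only asymptotic, the p-values for five subjects are purely illustrative.

```python
import math

times = [11, 12, 14, 16, 21]
events = [1, 1, 0, 1, 0]
z = [0, 1, 0, 1, 1]

def loglik_score_info(beta):
    """Log partial likelihood, score and information for a single covariate (no ties)."""
    ll = u = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * z[j]) for j in risk]
        s0 = sum(w)
        mean = sum(wj * z[j] for wj, j in zip(w, risk)) / s0
        second = sum(wj * z[j] ** 2 for wj, j in zip(w, risk)) / s0
        ll += beta * z[i] - math.log(s0)
        u += z[i] - mean
        info += second - mean ** 2
    return ll, u, info

def chi2_1df_pvalue(x):
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2))

# MLE by Newton-Raphson
beta_hat = 0.0
for _ in range(50):
    _, u, info = loglik_score_info(beta_hat)
    step = u / info
    beta_hat += step
    if abs(step) < 1e-10:
        break

ll0, u0, info0 = loglik_score_info(0.0)
ll1, _, info1 = loglik_score_info(beta_hat)

tests = {
    "Wald": beta_hat ** 2 * info1,         # (beta_hat / se)^2 with se = 1 / sqrt(I(beta_hat))
    "Likelihood ratio": 2 * (ll1 - ll0),
    "Score": u0 ** 2 / info0,               # needs only quantities at beta = 0
}
for name, stat in tests.items():
    print(f"{name}: {stat:.4f} on 1 df, p = {chi2_1df_pvalue(stat):.4f}")
```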

Example: CGD

• Study examining the impact of gamma interferon treatment on infection in people with chronic granulomatous disease (CGD)

• 203 subjects
  – Main variable of interest is treatment
    • Placebo
    • Gamma interferon
  – Other variables
    • Demographics (age, height, weight)
    • Steroid use
    • Pattern of inheritance
    • Treatment center …

• Outcome: Time to first major infection

Cox PHM Approach

> data(cgd)
> st<-Surv(cgd$time, cgd$infect)
> reg1<-coxph(st~cgd$treat)
> reg1
Call:
coxph(formula = st ~ cgd$treat)

                 coef exp(coef) se(coef)     z       p
cgd$treatrIFN-g -1.09     0.337    0.268 -4.06 4.9e-05

Likelihood ratio test=18.9  on 1 df, p=1.36e-05
n= 203, number of events= 76

> attributes(reg1)
$names
 [1] "coefficients"      "var"               "loglik"            "score"             "iter"              "linear.predictors"
 [7] "residuals"         "means"             "concordance"       "method"            "n"                 "nevent"
[13] "terms"             "assign"            "wald.test"         "y"                 "formula"           "xlevels"
[19] "contrasts"         "call"

$class
[1] "coxph"

Results

> summary(reg1)
Call:
coxph(formula = st ~ cgd$treat)

  n= 203, number of events= 76

                   coef exp(coef) se(coef)      z Pr(>|z|)
cgd$treatrIFN-g -1.0864    0.3374   0.2677 -4.059 4.93e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                exp(coef) exp(-coef) lower .95 upper .95
cgd$treatrIFN-g    0.3374      2.964    0.1997    0.5702

Likelihood ratio test= 18.92  on 1 df,   p=1.364e-05
Wald test            = 16.47  on 1 df,   p=4.933e-05
Score (logrank) test = 18.07  on 1 df,   p=2.124e-05
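Several numbers in this summary follow directly from `coef` and `se(coef)`: z = coef / se, the Wald statistic is z², and the 95% interval is exp(coef ± 1.96 × se). A minimal sketch that recomputes them from the rounded values printed above, so the last digit may differ slightly from R's output.

```python
import math
from statistics import NormalDist

# Values copied from the summary(reg1) output (rounded as printed)
coef, se = -1.0864, 0.2677

z = coef / se
wald = z ** 2                                  # Wald chi-square on 1 df
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value for z
z975 = NormalDist().inv_cdf(0.975)             # about 1.96
lower, upper = math.exp(coef - z975 * se), math.exp(coef + z975 * se)

print(f"z = {z:.3f}, Wald = {wald:.2f}, p = {p_value:.2e}")
print(f"HR = {math.exp(coef):.4f}, 1/HR = {math.exp(-coef):.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The hazard ratio of about 0.34 means that, at any time, the estimated hazard of a first major infection in the gamma interferon group is about one third of that in the placebo group.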

Fitting More Covariates in R

> reg2<-coxph(st~treat+steroids+inherit+hos.cat+sex+age+weight, data=cgd)
> reg2
Call:
coxph(formula = st ~ treat + steroids + inherit + hos.cat + sex + age + weight, data = cgd)

                           coef exp(coef) se(coef)      z       p
treatrIFN-g             -1.2025     0.300   0.2828 -4.253 2.1e-05
steroids                 1.7743     5.896   0.5852  3.032 2.4e-03
inheritautosomal         0.6169     1.853   0.2824  2.184 2.9e-02
hos.catUS:other          0.0589     1.061   0.3208  0.184 8.5e-01
hos.catEurope:Amsterdam -0.5687     0.566   0.4432 -1.283 2.0e-01
hos.catEurope:other     -0.6232     0.536   0.4956 -1.257 2.1e-01
sexfemale               -0.6193     0.538   0.3872 -1.600 1.1e-01
age                     -0.0861     0.917   0.0336 -2.566 1.0e-02
weight                   0.0235     1.024   0.0127  1.858 6.3e-02

Likelihood ratio test=41.2  on 9 df, p=4.65e-06
n= 203, number of events= 76
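Each exp(coef) in this output is the hazard ratio for a one-unit increase in that covariate, holding the others fixed; for a change of Δ units the ratio is exp(βk·Δ), not Δ·exp(βk). A minimal sketch using the rounded coefficients printed above, so results can differ from R's exp(coef) column in the last digit; the 10-unit change is an arbitrary illustration and `hazard_ratio_for_change` is a hypothetical helper.

```python
import math

# Per-unit coefficients as printed in the reg2 output (rounded to 4 decimals)
coef = {"age": -0.0861, "weight": 0.0235}

def hazard_ratio_for_change(beta_k, delta):
    """exp(beta_k * delta): multiplicative change in the hazard when Z_k rises by delta, others fixed."""
    return math.exp(beta_k * delta)


for name, beta_k in coef.items():
    print(name,
          round(hazard_ratio_for_change(beta_k, 1), 3),    # one-unit change
          round(hazard_ratio_for_change(beta_k, 10), 3))   # ten-unit change
```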

Next Time

More on constructing our hypothesis tests next time…
