
2/20/2012

Brook I. Martin, PhD, MPH
Dartmouth College

Introduction to Survival Analysis

About me

• Health services PhD from University of Washington.

• Health services faculty at Dartmouth College.

• Affiliated with Dartmouth‐Hitchcock Medical Center Department of Orthopaedics.

• Primary research interest is in the quality of care for musculoskeletal & spinal problems.


Overall goals

Review framework for choosing an analysis

1) Introduction to survival analysis.

2) Descriptive analysis of survival data.

3) Applied introduction to Cox proportional hazard regression models:

   a) Modeling an outcome

   b) Diagnostic tests of PH assumptions


Data types

Type: Example

Continuous: Age; annual salary; WBC

Dichotomous: Infection (yes/no); Death (yes/no)

Count: Number (rates) of surgical procedures

Survival: # days until death or other event

Categorical (ordinal): Health rating: 1 = Excellent; 2 = Very good; 3 = Good; 4 = Fair; 5 = Poor

Categorical (nominal): Insurance: 1 = Medicare/aid; 2 = Private; 4 = HMO; 5 = Other


Analysis framework, by type of dependent variable

Columns: univariable analysis (e.g., descriptive) / bivariable analysis / multivariable analysis

Continuous: Means; t-test / Correlation; t-test / Analysis of covariance

Dichotomous: Proportion / Chi-square / Logistic regression

Count: Incidence & prevalence / Rate difference or ratio / Poisson regression

Survival: Kaplan-Meier survival / Log-rank; Wilcoxon (other) for survival data / Cox proportional hazard regression

Categorical (ordinal): Proportion; Wilcoxon signed rank / Spearman's test; Mann-Whitney test / Ordered logistic regression

Categorical (nominal): Proportion / Chi-square; Mantel-Haenszel test / Multinomial logistic regression


Examples

Martin BI, Mirza SK, Comstock BA, Gray DT, Kreuter W, Deyo RA. Reoperation rates following lumbar spine surgery and the influence of spinal fusion procedures. Spine (Phila Pa 1976). 2007 Feb 1;32(3):382-7.

Bederman SS, Kreder HJ, Weller I, Finkelstein JA, Ford MH, Yee AJ. The who, what and when of surgery for the degenerative lumbar spine: a population-based study of surgeon factors, surgical procedures, recent trends and reoperation rates. Can J Surg. 2009 Aug;52(4):283-90.

Part I: Survival Analysis

1) Introduction to survival analysis

A. What is survival analysis

B. Goals of survival analysis

C. Survival time and censored data

D. Calculating survival 

E. Why use survival analysis


What is survival analysis?

• Statistical procedures used when the outcome of interest is time until an event occurs.

• Example: Time until death; time until repeat surgery.

• Definitions: 

Time = survival time (days/years)

Event = indication of failure

Censoring = when a subject's survival time is not fully known (e.g., the subject is lost to follow-up or the study ends before the event occurs).

Goals of survival analysis

• To estimate and interpret survivor and/or hazard functions.

• To compare survivor and/or hazard functions between groups of interest.

• To assess the relationship of explanatory variables to survival time (Cox proportional hazard regression).

Survival data:

[Figure: follow-up timelines for subjects A to E; x-axis Time (weeks), 0 to 12; the slides add one subject at a time together with its Time, Event (Yes/No), and Censor values]

Survival data:

ID   Time   Event   Censor
A    12     0       1
B    8      1       0
C    8      0       1
D    4      0       1
E    6      1       0

δ = (0, 1), where δ = 1 if failure and 0 if censored (the Event column).

T = random variable for survival time (T ≥ 0); t = a specific value of T.

Survival > 8 weeks? In this notation, the event of interest is T > t with t = 8.

Each subject n is recorded as the pair (tn, δn).

Survivor function:

Survivor function S(t) = P(T > t): the probability that a subject survives longer than a specific time t.

• S(t) never increases as t increases.
• At t = 0, S(t) = 1.
• As t → ∞, S(t) → 0.
• In practice (when estimated from data), S(t) is represented by a step curve.

[Figure: theoretical survivor curve S(t) against t, falling from S(0) = 1 toward S(∞) = 0]


Hazard function:

The limit, as Δt approaches 0, of a probability statement about survival, divided by Δt:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

• h(t) ≥ 0 and has no upper limit.
• View the hazard as an "instantaneous" rate of the event per unit of time, given that the subject has survived to time t.
• S(t) focuses on not failing; h(t) focuses on failing.

[Figure: hazard function h(t) plotted against t]
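The definition of h(t) connects to the survivor function through standard identities (not shown on the original slide), where f(t) denotes the probability density of T. The constant-hazard case is included as a one-line check:

```latex
h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t),
\qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)

\text{Example: } h(t) = \lambda \;\Rightarrow\; S(t) = e^{-\lambda t}
```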


Calculating survival:

Data for 11 subjects (T = time in weeks; δ = 1 if failure; cen = 1 if censored), sorted by T:

ID   T    δ   cen
D    4    0   1
J    4    1   0
K    5    0   1
E    6    1   0
B    8    1   0
C    8    0   1
I    8    1   0
H    9    0   1
F    10   1   0
G    10   0   1
A    12   0   1

Table of ordered failure times:

t(j) (weeks)   # fail   # cen   # at risk
0              0        0       11
4              1        2       11
6              1        0       8
8              2        2       7
10             1        2       3

# cen counts subjects censored at or after t(j) but before the next failure time; # at risk counts subjects still under observation at t(j).
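A minimal Python sketch of how the table of ordered failure times can be built from the raw (T, δ) pairs. `DATA` is a hypothetical re-encoding of the 11 subjects, and `ordered_failure_table` is an illustrative helper, not a Stata command; # cen is counted over the interval from t(j) up to (but not including) the next failure time, which reproduces the counts in the table.

```python
# Hypothetical encoding of the 11 subjects: ID -> (T in weeks, delta),
# where delta = 1 for a failure and 0 for a censored observation.
DATA = {
    "A": (12, 0), "B": (8, 1), "C": (8, 0), "D": (4, 0), "E": (6, 1), "F": (10, 1),
    "G": (10, 0), "H": (9, 0), "I": (8, 1), "J": (4, 1), "K": (5, 0),
}


def ordered_failure_table(data):
    """Rows of (t_j, # fail, # cen, # at risk) for t_0 = 0 and each distinct failure time."""
    obs = list(data.values())
    cuts = [0] + sorted({t for t, d in obs if d == 1})
    rows = []
    for j, tj in enumerate(cuts):
        next_t = cuts[j + 1] if j + 1 < len(cuts) else float("inf")
        n_risk = sum(1 for t, _ in obs if t >= tj)             # still under observation at t_j
        n_fail = sum(1 for t, d in obs if t == tj and d == 1)  # failures exactly at t_j
        # censored at or after t_j but before the next failure time
        n_cen = sum(1 for t, d in obs if tj <= t < next_t and d == 0)
        rows.append((tj, n_fail, n_cen, n_risk))
    return rows


for row in ordered_failure_table(DATA):
    print(row)
```

A subject censored at exactly a failure time (D at week 4) is counted as at risk at that time, which is the usual convention.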

Calculating survival:

Average survival time (ignoring censoring) = ∑ ti / n = 84 / 11 = 7.6 weeks

Average hazard rate = # failures / ∑ ti = 5 / 84 = 0.06 failures per person-week

Because censored times are counted as if they were event times, the average survival time above understates the true average.
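The two averages can be checked with a few lines of Python. `DATA` below is a hypothetical re-encoding of the same 11 (T, δ) pairs; the survival average treats censored times as event times, exactly as the slide's "ignoring censoring" label says.

```python
# Hypothetical re-encoding of the 11 subjects as (T in weeks, delta) pairs, A through K.
DATA = [(12, 0), (8, 1), (8, 0), (4, 0), (6, 1), (10, 1),
        (10, 0), (9, 0), (8, 1), (4, 1), (5, 0)]

total_time = sum(t for t, _ in DATA)     # total follow-up: 84 person-weeks
n_fail = sum(d for _, d in DATA)         # observed failures: 5

avg_survival = total_time / len(DATA)    # censored times treated as event times
avg_hazard = n_fail / total_time         # failures per person-week

print(f"average survival time (ignoring censoring) = {avg_survival:.1f} weeks")
print(f"average hazard rate = {avg_hazard:.2f} failures per person-week")
```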

Calculating survival:

General Kaplan-Meier survival formula:

S(t(j)) = S(t(j-1)) × P(T > t(j) | T ≥ t(j))

where P(T > t(j) | T ≥ t(j)) is estimated by (# at risk − # fail) / # at risk at t(j).

Table of ordered failure times:

t(j) (weeks)   # fail   # cen   # at risk   S(t(j))
0              0        0       11          1
4              1        2       11          0.91 (1.00 × 10/11)
6              1        0       8           0.80 (0.91 × 7/8)
8              2        2       7           0.57 (0.80 × 5/7)
10             1        2       3           0.38 (0.57 × 2/3)
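A minimal sketch of the product-limit (Kaplan-Meier) calculation in Python, using exact fractions so that rounding happens only when printing. `DATA` and `kaplan_meier` are illustrative names, not part of Stata; in Stata the estimates themselves come from `sts list`.

```python
from fractions import Fraction

# Hypothetical re-encoding of the 11 subjects as (T in weeks, delta) pairs, A through K.
DATA = [(12, 0), (8, 1), (8, 0), (4, 0), (6, 1), (10, 1),
        (10, 0), (9, 0), (8, 1), (4, 1), (5, 0)]


def kaplan_meier(data):
    """Product-limit estimate S(t(j)) at t(0) = 0 and at each distinct failure time."""
    surv = Fraction(1)
    curve = [(0, surv)]
    for tj in sorted({t for t, d in data if d == 1}):
        n_j = sum(1 for t, _ in data if t >= tj)             # number at risk at t(j)
        m_j = sum(1 for t, d in data if t == tj and d == 1)  # failures at t(j)
        surv *= Fraction(n_j - m_j, n_j)                     # estimate of P(T > t(j) | T >= t(j))
        curve.append((tj, surv))
    return curve


for tj, s in kaplan_meier(DATA):
    print(tj, round(float(s), 2))
```

The final estimate is exactly 10/11 × 7/8 × 5/7 × 2/3 = 25/66 ≈ 0.379, which rounds to 0.38.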


Part 2: Describing survival data

Examples in Stata using: stset, stdescribe, sts list, sts graph, sts test


Example

Data source: Washington State Comprehensive Hospital Abstract Reporting System (CHARS).

Design: Retrospective analysis of administrative data.

Patients: Patients who underwent surgery for a lumbar herniated disc.

Primary outcome: Time until the first repeat surgery ("reoperation").

Independent variables: Age, sex, comorbidity, previous surgery, insurance.

Citation: Martin BI, 2007. Spine.

Analysis using survival data

First, use stset to tell Stata that you are working with survival data:

stset reoper, failure(resurgmax)

Here reoper is the name of the time variable and resurgmax is the name of the event indicator variable.

Univariate analysis

Describe basic survival data

stdescribe


Analysis using survival data

Use sts list to list survival data:

sts list, by(PAYGRP) compare

Analysis using survival data

Use sts graph to examine the Kaplan-Meier survival curve:

sts graph

Analysis using survival data

sts graph, by(PAYGRP) risktable xlabel(0(2)12) scheme(s2mono)

Analysis using survival data

sts graph, by(PAYGRP) xlabel(0(2)12) scheme(s2mono) failure

The failure option plots the failure function, 1 − S(t), instead of S(t).

Analysis using survival data

sts graph, by(PAYGRP) xlabel(0(2)12) scheme(s2mono) hazard

The hazard option plots a smoothed estimate of the hazard function.

Bivariate testing

Test for a difference in survival between insurance groups using the log-rank test (the default test for sts test):

sts test PAYGRP
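For intuition about what the log-rank test compares, here is a from-scratch Python sketch for two groups: at each failure time it accumulates observed minus expected failures in one group and the hypergeometric variance, then forms a 1-df chi-square. This is an illustrative implementation, not Stata's code; `logrank` and the toy data are hypothetical, and PAYGRP has more than two groups, which this two-group sketch does not cover.

```python
import math


def logrank(times, events, groups):
    """Two-group log-rank test; groups are labeled 0/1. Returns (chi2, p) with 1 df."""
    data = list(zip(times, events, groups))
    fail_times = sorted({t for t, d, _ in data if d == 1})
    o_minus_e = 0.0  # observed minus expected failures in group 1
    var = 0.0        # sum of hypergeometric variances
    for tj in fail_times:
        n = sum(1 for t, _, _ in data if t >= tj)                        # total at risk
        n1 = sum(1 for t, _, g in data if t >= tj and g == 1)            # group 1 at risk
        m = sum(1 for t, d, _ in data if t == tj and d == 1)             # total failures
        m1 = sum(1 for t, d, g in data if t == tj and d == 1 and g == 1)
        o_minus_e += m1 - m * n1 / n
        if n > 1:
            var += m * (n1 / n) * (1 - n1 / n) * (n - m) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical toy data: group 1 fails at weeks 1 to 3, group 0 at weeks 10 to 12.
chi2, p = logrank([1, 2, 3, 10, 11, 12], [1] * 6, [1, 1, 1, 0, 0, 0])
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which avoids needing a statistics library.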


Part 3: Cox Proportional Hazard Regression Analysis

• Multivariable regression for time‐to‐event data.

• Testing PH‐model assumption


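Cox Proportional Hazard Model:

General form

h(t, X) = h0(t) exp[β1X1 + … + βnXn]

where h0(t) is the baseline hazard (the hazard when all X = 0) and β1, …, βn are the regression coefficients.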

Hazard ratio: The measure of effect used in survival analysis to describe the association between an exposure and an outcome.

Calculated as the hazard for one set of characteristics (exposed, X*) compared with an alternative set of characteristics (unexposed, X):

HR = ĥ(t, X*) / ĥ(t, X) = exp[∑ βi(X*i − Xi)]

Typically expressed as the exponential of a regression coefficient:
• HR = 1: no association
• HR > 1: exposure associated with greater risk of the outcome
• HR < 1: exposure associated with lower risk of the outcome
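To see why the baseline hazard drops out of the hazard ratio, the sketch below evaluates the Cox hazard for two covariate patterns at several times and compares the ratio with exp[∑ βi(X*i − Xi)]. The coefficients `BETAS` and the function `baseline` are hypothetical placeholders, not estimates from the CHARS example.

```python
import math


def baseline(t):
    """Hypothetical baseline hazard h0(t); any positive function of t would do."""
    return 0.05 * t


def hazard(t, x, betas):
    """Cox model hazard h(t, X) = h0(t) * exp(sum of beta_i * x_i)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(betas, x)))


def hazard_ratio(x_star, x, betas):
    """HR = exp(sum of beta_i * (x*_i - x_i)); h0(t) cancels out."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(betas, x_star, x)))


BETAS = [0.25, -0.40]              # hypothetical coefficients for two covariates
exposed, unexposed = [1, 0], [0, 0]

for t in (1.0, 5.0, 10.0):
    print(t, hazard(t, exposed, BETAS) / hazard(t, unexposed, BETAS))
print(hazard_ratio(exposed, unexposed, BETAS))   # exp(0.25), the same at every t
```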


Cox Proportional Hazard Regression

Stata command for Cox PH regression:

stcox i.PAYGRP female i.AGE4 positive_quan

PH assumption

The requirement for the HR to be constant over time… or, equivalently, that the hazard for one exposure is proportional to (a constant multiple of, not dependent on time) the hazard for an alternative exposure:

ĥ(t, X*) = constant × ĥ(t, X)

Can be evaluated:
• Graphically
• Through goodness-of-fit (GOF) statistics


[Figure: hazard h(t, X) for X = 1 and X = 0 plotted over 0 to 12 weeks; the two curves cross]

HR at 2 weeks: ĥ(t=2, X=1) / ĥ(t=2, X=0) < 1

HR at 10 weeks: ĥ(t=10, X=1) / ĥ(t=10, X=0) > 1

Thus the HR is not constant over time, the PH assumption is not met, and a Cox PH regression with a single, time-constant HR for X is not appropriate.
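A tiny numeric version of the crossing-hazards picture, assuming two hypothetical hazard functions (a constant hazard for X = 0 and a linearly rising hazard for X = 1). Neither comes from the source data; they only show how the ratio can move from below 1 to above 1.

```python
def h_x0(t):
    """Hypothetical hazard for X = 0: constant at 0.10 per week."""
    return 0.10


def h_x1(t):
    """Hypothetical hazard for X = 1: rises linearly, 0.02 * t per week."""
    return 0.02 * t


for t in (2, 10):
    print(f"HR at {t} weeks = {h_x1(t) / h_x0(t):.2f}")
```

This prints an HR of 0.40 at 2 weeks and 2.00 at 10 weeks, so no single constant describes the effect of X.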

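Graphically examine PH assumption

Log-log survival curves, which plot −ln(−ln S(t)) against ln(t) for each group; roughly parallel curves are consistent with the PH assumption:

stphplot, by(female) adjust(positive_quan PAYGRP AGE4)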
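Why parallel curves indicate PH: if h1(t) = HR × h0(t), then S1(t) = S0(t)^HR, so −ln(−ln S0(t)) − [−ln(−ln S1(t))] = ln(HR) at every t. The sketch checks this numerically, assuming a hypothetical Weibull baseline survivor function and a hypothetical HR of 1.8.

```python
import math

HR = 1.8  # hypothetical constant hazard ratio


def s0(t):
    """Hypothetical baseline survivor function: Weibull with shape 1.5 and scale 10."""
    return math.exp(-((t / 10.0) ** 1.5))


def s1(t):
    """Survivor function implied by proportional hazards: S1(t) = S0(t) ** HR."""
    return s0(t) ** HR


def loglog(s):
    """The y-axis transformation used by log-log plots: -ln(-ln S)."""
    return -math.log(-math.log(s))


for t in (1.0, 4.0, 8.0, 12.0):
    gap = loglog(s0(t)) - loglog(s1(t))
    print(f"ln t = {math.log(t):.2f}, vertical gap = {gap:.4f}")  # constant: ln(HR)
```

If the hazards were not proportional, the gap would change with ln t and the plotted curves would diverge or cross.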

Graphically examine PH assumption

Observed (Kaplan-Meier) versus expected (Cox PH) survival plot; close agreement between the observed and expected curves is consistent with the PH assumption:

stcoxkm, by(PAYGRP)

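PH assumption: analysis of Schoenfeld residuals (fits a smooth function of time to the residuals)

estat phtest

estat phtest, detail

estat phtest, plot(age)

estat phtest reports a global test of the PH assumption; detail adds a test for each covariate; plot() graphs the scaled Schoenfeld residuals for the named covariate against time.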
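`estat phtest` is built on scaled Schoenfeld residuals. The sketch below computes the unscaled residual for a single covariate at a fixed, hypothetical β: at each failure time, the failing subject's covariate value minus the risk-weighted average of the covariate over everyone still at risk. Under PH these residuals show no systematic trend with time. The scaling step and the test itself are omitted, tied failures share one risk set (Breslow-style), and `schoenfeld_residuals` is an illustrative name.

```python
import math


def schoenfeld_residuals(times, events, x, beta):
    """Unscaled Schoenfeld residuals for one covariate at a fixed beta.

    For each failure i: r_i = x_i - sum(w_k * x_k) / sum(w_k) over the risk set
    R(t_i) = {k : t_k >= t_i}, with w_k = exp(beta * x_k).
    """
    residuals = []
    for ti, di, xi in zip(times, events, x):
        if di != 1:
            continue  # censored subjects contribute no residual
        risk = [k for k, tk in enumerate(times) if tk >= ti]
        weights = [math.exp(beta * x[k]) for k in risk]
        x_bar = sum(w * x[k] for w, k in zip(weights, risk)) / sum(weights)
        residuals.append((ti, xi - x_bar))
    return residuals


# Hypothetical toy data: 4 failures, binary covariate, beta = 0.
for t, r in schoenfeld_residuals([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0], 0.0):
    print(t, round(r, 3))
```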


When the PH assumption is not met

• Model the groups of interest separately.
• Use stratified Cox PH regression.
• Include time-varying covariates.

Separate models for each stratum

Report separate survival models for the groups of interest:

stcox female i.AGE4 positive_quan if PAYGRP == 0

stcox female i.AGE4 positive_quan if PAYGRP == 1

stcox female i.AGE4 positive_quan if PAYGRP == 2

Stratified Cox PH model

Separately report survival for groups of interest using stratified Cox PH regression. Each stratum g of the stratification variable (here PAYGRP) has its own baseline hazard, while the coefficients are shared across strata:

For stratum g: hg(t, X) = h0g(t) exp[β1X1 + … + βnXn]

stcox female i.AGE4 positive_quan, strata(PAYGRP)

Coefficients are estimated for the variables that meet the PH assumption, adjusted for the stratification variable; it is not possible to obtain an HR for the effect of PAYGRP.
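The mechanics behind `strata()` can be sketched through the partial likelihood: it is computed separately within each stratum and summed, so risk sets never mix strata (each stratum keeps its own baseline hazard) while β is shared. This is a minimal one-covariate sketch with Breslow handling of ties; the function names and toy data are hypothetical, and no maximization is performed.

```python
import math
from collections import defaultdict


def partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood for one covariate (Breslow handling of ties)."""
    ll = 0.0
    for ti, di, xi in zip(times, events, x):
        if di != 1:
            continue
        denom = sum(math.exp(beta * xk) for tk, xk in zip(times, x) if tk >= ti)
        ll += beta * xi - math.log(denom)
    return ll


def stratified_partial_loglik(times, events, x, strata, beta):
    """Sum of within-stratum partial log-likelihoods with a shared beta."""
    groups = defaultdict(lambda: ([], [], []))
    for t, d, xi, s in zip(times, events, x, strata):
        g = groups[s]
        g[0].append(t)
        g[1].append(d)
        g[2].append(xi)
    return sum(partial_loglik(t, d, xs, beta) for t, d, xs in groups.values())


# Hypothetical toy data: two strata with identical follow-up, covariate all 0, beta = 0.
times, events, x = [1, 2, 1, 2], [1, 1, 1, 1], [0, 0, 0, 0]
print(stratified_partial_loglik(times, events, x, [0, 0, 1, 1], 0.0))  # risk sets within strata
print(partial_loglik(times, events, x, 0.0))                            # risk sets pool everyone
```

Here the stratified value is −2 ln 2 and the unstratified value is −6 ln 2, because stratification shrinks each risk set to the subject's own stratum.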


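Use time-varying coefficients

Original model assumes no interaction with time:
h(t, X) = h0(t) exp[β1X1 + … + βnXn]

Time-varying model allows the effect of Xj to change with time:
h*(t, X) = h0(t) exp[β1X1 + … + βnXn + δjXj(t)]

where Xj(t) is a time-dependent variable, e.g., Xj multiplied by a function of time.

stcox PAYGRP female i.AGE4 positive_quan, tvc(AGE4) texp(year5)

tvc() lists the covariates whose effects may vary with time; texp() gives the function of analysis time that multiplies them (the default is texp(_t)).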
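With a time-varying term, the HR for a one-unit change in Xj is no longer a single number but a function of time, exp[βj + δj g(t)]. This sketch assumes Xj(t) = Xj × g(t) with g(t) = t (mirroring the default texp(_t)) and uses hypothetical values for βj and δj.

```python
import math


def tvc_hazard_ratio(beta_j, delta_j, g, t):
    """HR(t) for a one-unit increase in X_j when the linear predictor contains
    beta_j * X_j + delta_j * X_j * g(t)."""
    return math.exp(beta_j + delta_j * g(t))


def identity(t):
    """g(t) = t, mirroring the default texp(_t)."""
    return t


BETA_J, DELTA_J = 0.5, -0.1       # hypothetical estimates

for t in (0, 2, 5, 10):
    print(t, round(tvc_hazard_ratio(BETA_J, DELTA_J, identity, t), 3))
```

With these values the HR starts above 1, equals 1 at t = 5, and falls below 1 afterwards; with δj = 0 it reduces to the constant exp(βj) of the ordinary Cox model.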


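Split data by time:

stset time, failure(resurgmax) id(id)
stsplit cat, at(4 8)
gen year4 = PAYGRP*(cat==4)
gen year8 = PAYGRP*(cat==8)
stcox PAYGRP female i.AGE4 positive_quan year4 year8

stsplit creates one record per time interval, with cat holding each interval's lower bound (0, 4, or 8); year4 and year8 then allow the PAYGRP effect to differ in the later intervals.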
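Episode splitting turns each subject's single record into one record per time interval, with only the last piece able to carry the failure. A minimal Python sketch for one subject, assuming analysis time starts at 0 and cut points 4 and 8 as in the stsplit command; `split_episode` is a hypothetical helper, not Stata's implementation.

```python
def split_episode(exit_time, failed, cuts):
    """Split one subject's record (entry at 0, exit at exit_time) at the cut points.

    Returns (start, stop, event, cat) rows, where cat is the interval's lower
    bound and only the last row can carry the failure.
    """
    inner = [c for c in sorted(cuts) if 0 < c < exit_time]
    bounds = [0] + inner + [exit_time]
    rows = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        event = 1 if failed and stop == exit_time else 0
        rows.append((start, stop, event, start))
    return rows


# A hypothetical subject who fails at time 10, split at 4 and 8.
for row in split_episode(10, 1, [4, 8]):
    print(row)
```

An interaction such as year4 = PAYGRP*(cat==4) is then nonzero only on the (4, 8] piece, which is what lets the PAYGRP effect change across intervals.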

Special situations

• Known survival distribution: use parametric survival analysis (streg).

• Correlated data: termed "shared frailty" in survival models; use the shared() option.

• Survey data: use svyset, followed by svy: stcox.

• Competing risks (when multiple causes of failure may occur): use stcrreg.


Summary

• Survival analysis is used to examine the risk of an event over time.

• Demonstrated how to calculate the survival function and the hazard function.

• Non-parametric methods for evaluating and describing survival data.

• Performing Cox PH regression requires a detailed evaluation of the PH assumption.

Thanks!

Brook I. Martin, Dartmouth