Introduction to Survival Analysis. Utah State University, January 28, 2008. Bill Welbourn.

Page 1: Introduction to Survival Analysis

Introduction to Survival Analysis

Utah State University

January 28, 2008

Bill Welbourn

Page 2: Introduction to Survival Analysis

Objectives of this Talk

Clarify what survival data is.
Explain what makes survival data special.
Example 1 – Survival estimation for a single population.
Example 2 – Survival comparison for two populations via infant ALL data.
Provide motivation for the need for “special” methods for analyzing survival data.

Page 3: Introduction to Survival Analysis

Objectives of this Talk (cont)

The notion of risk estimation for survival data, the Cox-Proportional Hazards Model.
Example 3 – Infant ALL data revisited, analyzed using Cox-Proportional Hazards Model.

Page 4: Introduction to Survival Analysis

What is Survival Data?

Data that deal with the time until the occurrence of any well-defined event.

Binary response which does not have to be death/survival.

Page 5: Introduction to Survival Analysis

Examples of Events

Death
Response to a treatment
Development of a disease in someone at high risk
Resumption of smoking by someone who had quit
Cancellation of service by a credit card customer
Relapse of a patient in whom disease had been in remission

Page 6: Introduction to Survival Analysis

Complete Data

The value of each sample unit is observed or known.

Ex.) Compute the average test score for a sample of 5 students: 90, 80, 76, 85, 82.

Page 7: Introduction to Survival Analysis

Why is Survival Data Special?

Censored data: The event of interest may not be observed or the exact times-to-event of all the units are not known.

Examples:

The event of interest is death, but at the time of analysis the patient is still alive.

A patient was lost to follow-up without having experienced the event of interest.

Page 8: Introduction to Survival Analysis

Examples (cont)

The event of interest is death caused by cancer. A patient may die of an unrelated cause, such as an automobile accident.

A patient is dropped from the study without having experienced the event of interest because of a major protocol violation.

Page 9: Introduction to Survival Analysis

Types of Censoring

Right censoring: a survival time is not known exactly but is known to be greater than some value.

Page 10: Introduction to Survival Analysis

Types of Censoring (cont)

Left censoring: a failure time is only known to be before a certain time.

Ex.) Event of interest: development of a disease.

At the time of examination, a 50-year-old participant was found to have already developed the disease of interest, but there is no record of the exact time.

Page 11: Introduction to Survival Analysis

Types of Censoring (cont)

Interval censoring: Objects of interest are not constantly monitored. Event of interest is known to have occurred between times a and b.

Ex.) At age of 45, the patient did not have the disease. His age of diagnosis was between age 45 and 50.
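One way to hold the three censoring types in a single data structure is to record each observation as a pair of bounds on the unknown event time. The sketch below is only an illustration: the `Observation` class, its field names, and the 3-year right-censored case are assumptions introduced here, not something from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """Bounds on an event time; lower == upper means the time was observed exactly."""
    lower: float  # latest time at which the event was known not to have occurred
    upper: float  # earliest time by which the event was known to have occurred (inf if never seen)

    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"      # only known to exceed `lower`
        if self.lower == 0:
            return "left-censored"       # only known to be before `upper`
        return "interval-censored"       # known to lie between the two bounds


examples = {
    "disease already present at the age-50 exam": Observation(0, 50),
    "disease-free at 45, diagnosed by 50": Observation(45, 50),
    "still event-free after 3 years (hypothetical)": Observation(3, math.inf),
}
for label, obs in examples.items():
    print(f"{label}: {obs.kind()}")
```

Under this encoding, complete data is simply the special case in which every observation has equal bounds.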

Page 12: Introduction to Survival Analysis

Survival Estimation

Example 1 - A hypothetical clinical trial:

Suppose that 10 patients enroll in a clinical trial at the beginning of 1988. During 1988, 6 patients die. At the beginning of 1989, 20 additional patients enroll in the trial. During 1989, 3 patients who enrolled in the trial at the beginning of 1988 die, and 15 patients who enrolled in the trial at the beginning of 1989 die. We are asked to estimate the one year and two year survival for these patients.

Page 13: Introduction to Survival Analysis

[Figure: Graph of Example 1. Number of participants (0 to 20) plotted against year (1988 to 1990); the legend distinguishes participants who are alive from deaths.]

Page 14: Introduction to Survival Analysis

[Figure: Survival Time from Study Entry (Example 1). Number of participants (0 to 20) plotted against length of follow-up (0 to 2); the legend distinguishes participants who are alive from deaths.]

Page 15: Introduction to Survival Analysis

FOLLOW-UP TIME   PARTICIPANTS TRACKED   DEATHS   CENSORED OBSERVATIONS   ESTIMATED SURVIVAL PROBABILITY
0                30                     21       5                       1.000
1                4                      3        0                       0.300
2                1                      0        1                       0.075
TOTALS:                                 24       6
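The survival column of this table can be reproduced by multiplying, year by year, the fraction of tracked participants who survive that year. The following is a minimal sketch of that product-limit arithmetic; the function name `survival_table` and the row layout are assumptions made for illustration, and the counts are those of Example 1.

```python
def survival_table(n_start, rows):
    """Build life-table rows from (deaths, censored) counts per year of follow-up.

    Each output row is (follow_up_time, participants_tracked, deaths,
    censored, survival_probability_at_start_of_year).
    """
    at_risk = n_start
    survival = 1.0
    table = []
    for year, (deaths, censored) in enumerate(rows):
        table.append((year, at_risk, deaths, censored, survival))
        if at_risk > 0:
            # Conditional probability of surviving this year, given alive at its start.
            survival *= 1 - deaths / at_risk
        # Deaths and censored participants both leave the tracked group.
        at_risk -= deaths + censored
    return table


# Example 1: 30 participants; 21 die in their first year and 5 are censored
# after one year; of the remaining 4, 3 die in their second year.
table = survival_table(30, [(21, 5), (3, 0), (0, 1)])
for year, tracked, deaths, censored, surv in table:
    print(f"{year}  {tracked:2d}  {deaths:2d}  {censored}  {surv:.3f}")
```

The one-year estimate is 9/30 = 0.300, and the two-year estimate is 0.300 × (1 − 3/4) = 0.075: the five participants followed for only one year contribute to the first factor but not to the second.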

Page 16: Introduction to Survival Analysis

Survival Comparison

Example 2 - For acute lymphoblastic leukemia (ALL) in children, a small percentage (approximately 3%) are diagnosed in the first year of life – referred to as infant ALL.

Generally the outcome for infant ALL is much poorer than that for other children, where about 75% go into a quick remission and never have their disease return (i.e., are cured).

Page 17: Introduction to Survival Analysis

Survival Comparison (Ex 2 cont)

For infant ALL probably 65% will die of their disease. While the outcome of ALL in these very small babies is not good, there is nevertheless substantial known heterogeneity in outcome based on patient characteristics – some subgroups doing much better and some much worse than the general outcome in infants.

In this exercise, we will examine if survival among ALL infants differs, depending on time of diagnosis (0-5 mo. vs. 6-11 mo.).

Page 18: Introduction to Survival Analysis

Survival Comparison (Ex 2 cont)

Hypothesis test setup: Null states that survival among ALL infants is the same, irrespective of the age of diagnosis. Alternative states that survival among infants diagnosed with ALL at 0-5 months is a constant scaled power (at any follow-up time) of the survival among infants diagnosed with ALL at 6-11 months.

More precisely, the alternative states that the hazard rates for the two infant ALL groups are proportional through time.
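The two statements of the alternative are equivalent, which a short derivation makes explicit. The symbols below ($S_0$, $S_1$, $h_0$, $h_1$, $\theta$) are notation assumed here for the survival and hazard functions of the two groups; they do not appear in the slides.

```latex
% Survival is determined by the hazard:
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)

% Proportional hazards with a constant \theta > 0:
h_1(t) = \theta\, h_0(t) \quad \text{for all } t

% Substituting into the first identity:
S_1(t) = \exp\!\left(-\theta \int_0^t h_0(u)\,du\right) = \bigl[S_0(t)\bigr]^{\theta}
```

Conversely, taking $-\frac{d}{dt}\log$ of $S_1(t) = [S_0(t)]^{\theta}$ returns $h_1(t) = \theta\, h_0(t)$, so "one survival curve is a constant power of the other" and "the hazards are proportional" say the same thing. The null hypothesis is the case $\theta = 1$.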

Page 19: Introduction to Survival Analysis

Structure of Survival Data

The following SAS output provides an overview of collected survival data.

Page 20: Introduction to Survival Analysis

Survival Comparison (Ex 2 cont)

To test these hypotheses, we use the Log-Rank Test.

Page 21: Introduction to Survival Analysis

Survival Comparison (Ex 2 cont)

The Log-Rank Test from SAS’ Proc Lifetest yields a p-value of 0.0057. There is evidence in this case to reject the null hypothesis. These data indicate that there is a statistically significant difference in survival among children diagnosed with ALL at 0-5 months when compared to children diagnosed with ALL at 6-11 months (p<0.01). The data suggest that survival is better among children diagnosed with ALL later in infancy.
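For readers who want to see what the Log-Rank Test computes, here is a minimal two-group sketch in plain Python. It is not the SAS procedure and does not use the infant ALL data: the function name `logrank_test` and the two small samples are made up purely for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for an observed event, 0 for censored."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk in each group just before time t.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group A under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical follow-up times (months) and event indicators for two groups.
chi2, p = logrank_test([2, 3, 5, 7, 9], [1, 1, 1, 0, 1],
                       [6, 8, 10, 12, 14], [1, 0, 1, 0, 0])
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

Censored participants count toward the number at risk until their censoring time and then drop out, which is how the test uses partial follow-up instead of discarding it.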

Page 22: Introduction to Survival Analysis

Survival Comparison (Ex 2 cont)

Assess Goodness-of-Fit (PH assumption).

Page 23: Introduction to Survival Analysis

Confounding Factors

Recall, a confounding factor for an association of interest – in this case, the age at diagnosis/survival relationship – must itself be associated with the outcome of interest (survival) and with the exposure of interest (age at diagnosis).

Let’s examine if abnormality for CHR11Q23 is a confounding factor for our example.

Page 24: Introduction to Survival Analysis

Confounding Factors (cont.)

To assess whether CHR11Q23 is associated with survival, we use the Log-Rank Test. SAS reports a p-value <0.01. These data indicate that there is a statistically significant difference in survival among children with an abnormality at the CHR11Q23 locus compared to children without the abnormality (p<0.01). The data suggest that survival is better among children without the abnormality.

Page 25: Introduction to Survival Analysis

Confounding Factors (cont.)

To assess whether CHR11Q23 is associated with age at ALL diagnosis, we use Categorical Data Analysis. These data indicate the odds of CHR11Q23 abnormality among children diagnosed with ALL at 0-5 months is 2.78 times those among children diagnosed with ALL at 6-11 months (95% CI for OR, 1.19 – 6.48).
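An odds ratio and its confidence interval of this kind are typically computed from a 2x2 table. The sketch below uses the common log-scale (Woolf) interval; whether SAS used exactly this interval for the slide is an assumption, and the counts are hypothetical because the talk does not report the underlying table.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald interval on the log scale for a 2x2 table.

    a, b: exposed group with / without the factor
    c, d: unexposed group with / without the factor
    z:    standard normal quantile (default gives a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT the infant ALL data:
# rows are diagnosis-age groups, columns are abnormality present / absent.
or_hat, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1, like the one reported on the slide, indicates an association between the factor and the exposure at the 5% level.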

Page 26: Introduction to Survival Analysis

Confounding Factors (cont.)

Thus, the data suggest that CHR11Q23 is associated with both survival (outcome) and age of ALL diagnosis… CHR11Q23 appears to be a confounder, and so we should control for the factor in the analysis.

After controlling for CHR11Q23, these data still suggest that survival is better among infants diagnosed with ALL later in infancy, but the evidence of the association has decreased (p=0.03).

Page 27: Introduction to Survival Analysis

Why the use of “Special” Statistical Methods for Survival Data?

More precisely, since we have a binary response, why not use categorical data analysis methods (e.g., 2xC contingency tables, logistic regression) to analyze survival data?

Page 28: Introduction to Survival Analysis

“Special” Methods (cont)

Log-Rank Test and the Score Test from Logistic Regression are essentially equivalent when all censored observations equal the maximum follow-up time.

Biased results could arise from the use of categorical data analysis methods if censoring occurs uniformly through follow-up time in one group but only at the maximum follow-up time in the second group.

Page 29: Introduction to Survival Analysis

“Special” Methods (cont)

In utilizing categorical data analysis methods, uniform censoring through follow-up time in both groups could lead to bias toward the null hypothesis.

If censoring occurs at the beginning of the follow-up time for each group, utilizing categorical data analysis methods could lead to bias toward the alternative hypothesis.

Page 30: Introduction to Survival Analysis

“Special” Methods (cont)

If censoring does not occur, categorical data analysis methods cannot be applied.

In summary, survival analysis methods exist to handle the censoring of observations.
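Example 1 gives a quick numerical sense of what goes wrong when censoring is ignored. The sketch below compares a naive binary summary, which treats every censored participant as a two-year survivor, with the censoring-aware estimate. The variable names are assumptions, and the "naive" calculation is only one simple way a categorical analysis might be set up.

```python
# Counts from Example 1: 30 participants, 24 deaths, 6 censored.
n_total = 30
n_censored = 6

# Naive binary summary: anyone not observed to die counts as a two-year
# survivor, even though 5 of the 6 were followed for only one year.
naive_two_year_survival = n_censored / n_total

# Censoring-aware estimate: multiply the yearly conditional survival fractions.
first_year = 1 - 21 / 30   # 30 tracked, 21 deaths in the first year of follow-up
second_year = 1 - 3 / 4    # 4 tracked, 3 deaths in the second year of follow-up
aware_two_year_survival = first_year * second_year

print(f"naive: {naive_two_year_survival:.3f}")            # prints naive: 0.200
print(f"censoring-aware: {aware_two_year_survival:.3f}")  # prints censoring-aware: 0.075
```

The naive figure is more than twice the censoring-aware one, because participants with only one year of follow-up are credited with surviving a second year that was never observed.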

Page 31: Introduction to Survival Analysis

Risk Estimation for Survival Data

Log-Rank Test provides a means of testing for an association in survival. Cox-Proportional Hazards (CPH) Model provides a regression extension so that risk estimation in survival can be made.

Risk estimate for CPH Model is the hazard ratio.

Page 32: Introduction to Survival Analysis

GLM versus CPH Model

GLM – Parametric Models:

CPH – Semi-parametric Model:
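For reference, the usual textbook forms of the two model families are sketched below. The notation ($g$, $\beta_j$, $x_j$, $h_0$) is assumed here and may differ from what the slide displayed.

```latex
% Generalized linear model: the response Y follows a fully specified
% distribution, and a link function g relates its mean to the covariates.
g\bigl(\mathrm{E}[Y \mid x]\bigr) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p

% Cox proportional hazards model: the covariate effect is parametric,
% but the baseline hazard h_0(t) is left unspecified.
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \cdots + \beta_p x_p\right)
```

The model is called semi-parametric because only the $\beta_j$ are estimated as parameters, while no distributional form is imposed on $h_0(t)$.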

Page 33: Introduction to Survival Analysis

Survival Comparison (CPH)

Example 3 – Let’s revisit the infant ALL data and analyze using the CPH Model.

Null hypothesis states that the hazard rate among ALL infants is the same, irrespective of the age at diagnosis. Alternative states that the hazard rate (at any follow-up time) among infants diagnosed with ALL at 6-11 months is a constant multiple of the hazard rate among infants diagnosed with ALL at 0-5 months.

Page 34: Introduction to Survival Analysis

Survival Comparison (Ex 3 cont)

To test these hypotheses, we use the Cox-Proportional Hazards Regression Model.

The CPH Model:

Model under the null:

Page 35: Introduction to Survival Analysis

Survival Comparison (Ex 3 cont)

Model under the alternative:

SAS’ Proc Phreg reports the p-value from the Likelihood Ratio Test to be 0.0057. Note that this result is essentially equivalent to the Log-Rank Test. This is expected as the hypotheses are the same for the CPH Test and the Log-Rank Test.
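With a single binary covariate, the null and alternative models for this example can be written as below. This is a sketch in assumed notation: $x$ is an indicator equal to 1 for one diagnosis-age group and 0 for the other, and which group is coded 1 is an assumption, not something the slides state.

```latex
% Model under the null: one hazard function for all infants.
H_0:\quad h(t \mid x) = h_0(t)

% Model under the alternative: hazards differ by a constant factor.
H_A:\quad h(t \mid x) = h_0(t)\, e^{\beta x}, \qquad \beta \neq 0

% Hazard ratio comparing the group coded x = 1 with the group coded x = 0:
\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}
```

The Likelihood Ratio Test compares the fitted partial likelihoods of these two models, so it tests $\beta = 0$, the same hypothesis the Log-Rank Test addresses.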

Page 36: Introduction to Survival Analysis

Survival Comparison (Ex 3 cont)

These data indicate the risk of death among infants diagnosed with ALL at 0-5 months is 2.10 times that of infants diagnosed with ALL at 6-11 months (95% CI for RR, 1.23 – 3.60).
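The reported estimate and interval can be related back to the regression coefficient. The sketch below assumes the interval is a Wald interval computed on the log scale (a common default, but an assumption here) and backs out the log hazard ratio, its standard error, and an approximate Wald p-value from the three reported numbers.

```python
import math

# Values reported on the slide.
hazard_ratio, ci_lower, ci_upper = 2.10, 1.23, 3.60
z_975 = 1.959964  # 97.5th percentile of the standard normal

# On the log scale a Wald interval is beta +/- z * SE, so its width gives SE.
beta = math.log(hazard_ratio)
std_error = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_975)

# Consistency check: the midpoint of the log-scale interval should be near beta.
midpoint_ratio = math.exp((math.log(ci_lower) + math.log(ci_upper)) / 2)

# Wald z statistic and two-sided p-value for H0: beta = 0.
z_stat = beta / std_error
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"beta = {beta:.3f}, SE = {std_error:.3f}")
print(f"midpoint of CI on ratio scale = {midpoint_ratio:.2f}")
print(f"Wald z = {z_stat:.2f}, p = {p_value:.4f}")
```

The Wald p-value obtained this way is of the same order as the Likelihood Ratio and Log-Rank p-values quoted earlier, but it need not match them exactly, since the three tests differ in small samples and the inputs here are rounded.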

Page 37: Introduction to Survival Analysis

Survival Comparison (Ex 3 cont)

Survival curves from SAS’ Proc Phreg:

Page 38: Introduction to Survival Analysis

Survival Comparison (Ex 3 cont)

Assess Goodness-of-Fit (PH assumption).

Page 39: Introduction to Survival Analysis

Confounding Factors Revisited

As with the Log-Rank procedure, we can control for confounding factors in the CPH Model.

The interpretation of the RR mirrors that of other regression techniques.

Page 40: Introduction to Survival Analysis

Adjusted RR Interpretation

After controlling for CHR11Q23 abnormality, these data indicate the risk of death among infants diagnosed with ALL at 0-5 months is 1.82 times that of infants diagnosed with ALL at 6-11 months (95% CI for RR, 1.05 – 3.15).

Page 41: Introduction to Survival Analysis

Questions?