Survival/Failure Analysis (AKA Event History Analysis)
T & F Chapter 11
Data Example 1. A medical doctor wished to compare the efficacy of two drugs for treating a sometimes fatal illness. Two groups of patients with the disease were identified. One group was given Drug A. The other group was given Drug B.
The age of the patient at the time of drug administration was recorded.
The patients were then monitored by a special team of patient observers.
The age of the patient at time of death was recorded and the survival duration from the time the patient began taking the drug computed. Several of the patients lived for many years.
The study was terminated when the last patient in the two groups died – more than 60 years after the beginning of the study. (The original researcher died while waiting for the last patient to die. The original researcher’s grandchildren were available to continue the analyses.)
The grandchildren used the Mann-Whitney U-test to compare survival times between the two groups. (The U-test was used because survival times are notoriously positively skewed.)
This is the appropriate way to compare the efficacy of the two drugs.
Problems:
1) The long time it will take to observe the survival times of all patients.
2) What to do about persons who get “lost” – from whom contact was lost. These patients give incomplete data. Should they be ignored – treated as missing values?
Because of Problem 1 above, we typically do NOT wait until every participant in our research has died before analyzing.
Instead we define a Window of Observation, and observe participants only while that window is open.
Survival Analysis – 1 Printed on 10/26/2016
Window of Observation
The problem is that we don’t have an infinite period of time to wait until everyone quits or dies. So what do we do about the persons who are still alive when the window of observation is closed?
Plus, it may be the case that we lose contact with people so for some people we won’t know how long they survived regardless of the length of the window of observation. All we know about them is that they were alive until a specific time. We don’t know whether they’re still alive or not after that time.
The window of observation is the specific time period in which participant survival is recorded.
At some time, we begin recording whether or not each person is surviving. At some later time, we quit monitoring each patient.
Because the window is of finite duration, this necessarily results in incomplete information on some participants.
Of particular importance is the fact that some will still be alive/working when we quit observing.
This means that we won’t have accurate survival times for some people.
Medical literature
Two treatments for a disease are given. We attempt to record
1) whether or not each patient died – the dichotomous outcome – and
2) how long each patient survived until death – the continuous outcome.
Group A given Drug A.
Group B given Drug B.
Turnover literature
Persons are hired by an organization into two different buildings. We attempt to record
1) whether or not each employee quits or retires and
2) how long each employee is employed before leaving the organization.
Building A: Kill and Debone chickens
Building B: Cook the chicken carcasses
Ideal Cases – each starting time and ending time is known.
Right Censored Cases: Cases whose ending times (time of termination/death) are unknown. These are the most common problem cases.
The above cases are still employed/surviving at the time monitoring ends.
The above case is lost to follow-up (quit answering phone, left state, etc.)
Left Censored Cases: Cases whose starting times are unknown.
We will not include such cases in the analyses conducted here.
Cases whose starting times and ending times are unknown: Fagettaboutit – these are not analyzable.
[Figure: cases drawn along a horizontal Time axis. Left vertical line: monitoring of cases begins, i.e., the window opens. Right vertical line: monitoring of cases ends, i.e., the window closes.]
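The case types above can be turned into data in a simple way: each case gets a duration and a status, where the status says whether the ending was actually observed. The following is a minimal Python sketch, not anything SPSS does internally. The function name, the argument names, and the three example cases are all hypothetical, and left-censored cases are not handled.

```python
def observe(start, event_time, window_close, lost_at=None):
    """Return (duration, status) for one case.

    status 1 = death/termination was observed inside the window.
    status 0 = right censored (still surviving at close, or lost to follow-up).
    event_time is None when the true ending time is never learned.
    """
    # A case is followed until the window closes or contact is lost.
    followed_until = window_close if lost_at is None else min(lost_at, window_close)
    if event_time is not None and event_time <= followed_until:
        return (event_time - start, 1)
    return (followed_until - start, 0)


# Hypothetical cases, with the window closing at time 12.
cases = [
    observe(start=0, event_time=7, window_close=12),                 # ideal case
    observe(start=2, event_time=20, window_close=12),                # still surviving at close
    observe(start=1, event_time=None, window_close=12, lost_at=6),   # lost to follow-up
]
```

The duration/status pair is the same shape of data that the SPSS procedures later in these notes expect: a time variable plus a status variable with one value marking the event.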
Incorrect Analysis 1: Use death/quit rates as a proxy for survival
Assuming that persons with long survival times will be less likely to die within the window of observation, we could use death or quit rates as an indicator of survival time.
We could use logistic regression to assess the relation of death or quit rates to independent variables.
(Use linear regression in a pinch praying that the God of statistics won’t strike you down).
Problem – it’s possible to create situations in which most people would agree that the distributions of survival times are different even though proportions of outcomes are identical.
Consider the following . . . Assume we’re dealing with employment.
In the figures, each arrow represents duration of employment for a person. The horizontal axis is time. The vertical line at the left represents the time at which the window of observation opened. The vertical line at the right represents the time at which the window closed. The -> of the arrow represents death/termination.
Group A – Termination Rate = 100%
Group B – Termination Rate = 100%
Clearly, Group A has longer average employment times, but both have the exact same proportion of turnovers – 100% in this example.
So, comparison of death/quit rates may certainly give us an inaccurate picture of the differences between the groups.
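To see the problem with numbers, here is a small Python sketch with made-up employment durations (the values are hypothetical and are not taken from the figures). Both groups have a 100% termination rate, yet their durations are very different.

```python
from statistics import mean

# (duration, status) pairs; status 1 = terminated within the window.
# All values are invented for illustration.
group_a = [(30, 1), (34, 1), (38, 1), (40, 1)]
group_b = [(5, 1), (8, 1), (10, 1), (12, 1)]


def termination_rate(cases):
    """Proportion of cases whose termination was observed."""
    return sum(status for _, status in cases) / len(cases)


def mean_duration(cases):
    """Average observed duration."""
    return mean(duration for duration, _ in cases)


rate_a, rate_b = termination_rate(group_a), termination_rate(group_b)
mean_a, mean_b = mean_duration(group_a), mean_duration(group_b)
```

A logistic regression on the status variable alone would see two identical groups here, because it never looks at the durations.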
Incorrect Analysis 2 – Analyze only the durations within the window of observation. Ignore the deaths/turnovers.
Group A. Average Survival time =
Group B. Average Survival time =
In the example above, the two groups have equal (ultimate) death rates but different survival times just within the window of observation – In Group A all subjects had “time to die.” In Group B, subjects were still living when the window of observation closed. In this case, analysis of survival times within the finite window of observation will give an incorrect picture of the lack of difference between the groups.
Each type of incomplete analysis ignores the other aspect of the complete dependent variable. We need a method of analysis that takes into account both aspects.
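The second incorrect analysis can be sketched the same way. In this hypothetical Python example everyone starts at time 0, the window closes at time 10, and the "true" survival times are invented. Group B's durations all get cut off at the close of the window, so the within-window average badly understates how long that group really survived.

```python
from statistics import mean

WINDOW_CLOSE = 10  # hypothetical closing time of the window

# Invented true survival times; everyone starts at time 0.
true_a = [4, 6, 8, 9]      # all had "time to die" inside the window
true_b = [15, 20, 25, 30]  # all still living when the window closed


def within_window(true_times, window_close):
    """Durations as they would be recorded inside the window."""
    return [min(t, window_close) for t in true_times]


mean_seen_a = mean(within_window(true_a, WINDOW_CLOSE))
mean_seen_b = mean(within_window(true_b, WINDOW_CLOSE))
true_mean_b = mean(true_b)
deaths_seen_b = sum(t <= WINDOW_CLOSE for t in true_b)
```

The recorded durations for Group B are only meaningful if we also keep track of the fact that no deaths were observed, which is exactly the status information this analysis throws away.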
Survival analysis is an analytic technique that combines both aspects.
Comparisons of different groups include . . .
Comparison of proportion dying / leaving
Comparison of time surviving / staying.
Survival Analysis (also called Event History Analysis)
An analytic technique that models both survival times and proportions of deaths / quits.
3 separate techniques available in SPSS – Life Table, Kaplan-Meier, Cox Regression
Key concept common to all techniques
Survival function – most important one of all of them
A plot of proportion surviving from time 0 up to a given time vs. time
A cumulative plot.
Generally decreasing curve, since proportion surviving can only remain constant or decrease across time.
Separate curves for separate groups
The curve represents both aspects of survival.
1) The height of the curve at a point represents the proportion surviving up to that time.
2) The curve also represents duration of stay/life (how far the curve has progressed to the right from t=0). The distance along the X-axis to the point where the curve reaches a specific survival rate represents the survival time associated with that rate (for example, the median survival time at a 50% survival rate).
So, the survival curve is a two-dimensional representation of the two aspects of survival – survival rates and length of life/employment.
[Figure: survival function. Vertical axis: Proportion Surviving (0% to 100%); horizontal axis: Time (0 to 80). Annotations: at time 30, 63% have survived; at time 30, 54% have survived; at a 60% survival rate, average length of time was 17; at a 60% survival rate, average length of time was 85.]
Comparing groups.
The vertical axis represents proportion of survivals or turnovers.
Within a vertical slice at any point, turnover rates up to a particular time can be compared.
In the following, we see that at time t, Group A had a higher survival rate than Group B.
Comparing Average survival times between two groups.
Within a horizontal slice at any point, average survival times can be compared.
In the following, we see that for a 70% survival rate, average survival time was longer for Group A than it was for Group B.
When comparing groups we will usually compare the whole curve for each group. The group whose curve is generally above the others is the group with best survival.
[Figures: two plots of survival curves for Groups A and B against Time. One marks a vertical slice at time t; the other marks a horizontal slice at 70% survival.]
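The two kinds of slices can be written as two small lookups on a survival curve. This is a Python sketch under the assumption that a curve is stored as a list of (time, proportion surviving) steps in time order; the two curves and the function names are hypothetical.

```python
def survival_at(curve, t):
    """Vertical slice: proportion surviving at time t on a step curve."""
    surv = 1.0  # everyone is surviving before the first step
    for time, proportion in curve:
        if time > t:
            break
        surv = proportion
    return surv


def time_to_reach(curve, proportion):
    """Horizontal slice: first time the curve is at or below `proportion`.

    Returns None if the curve never gets that low.
    """
    for time, surv in curve:
        if surv <= proportion:
            return time
    return None


# Invented step curves for two groups.
curve_a = [(10, 0.9), (30, 0.8), (50, 0.7), (70, 0.6)]
curve_b = [(5, 0.85), (15, 0.7), (25, 0.55), (40, 0.4)]

vertical = (survival_at(curve_a, 30), survival_at(curve_b, 30))
horizontal = (time_to_reach(curve_a, 0.7), time_to_reach(curve_b, 0.7))
```

In this made-up example Group A is better on both slices: a higher proportion surviving at time 30, and a longer time before survival drops to 70%.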
Three general types of Survival Analysis
1. Life Tables analysis.
The window of observation is cut up into n equal-length intervals.
Proportions of persons surviving/dying within each interval are computed.
This is the original method.
Useful for analysis of one group or for comparison of a few groups defined by levels of a single categorical factor.
Can’t incorporate quantitative predictors.
Can’t incorporate more than 2 qualitative predictors in SPSS.
Cannot analyze interactions of 2 or more predictors.
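A life table can be sketched in a few lines of Python. This sketch assumes the usual actuarial convention that cases censored during an interval count as exposed to risk for half of it; the data, the function name, and the interval width are hypothetical, and the sketch is not meant to reproduce SPSS output column for column.

```python
def life_table(cases, width, n_intervals):
    """Actuarial life table from (time, status) pairs; status 1 = event.

    Intervals are [0, width), [width, 2*width), and so on.
    """
    rows = []
    entering = len(cases)
    cum_surv = 1.0
    for i in range(n_intervals):
        lo, hi = i * width, (i + 1) * width
        events = sum(1 for t, s in cases if lo <= t < hi and s == 1)
        withdrawn = sum(1 for t, s in cases if lo <= t < hi and s == 0)
        # Censored cases are treated as at risk for half the interval.
        exposed = entering - withdrawn / 2
        prop_surviving = 1 - events / exposed if exposed > 0 else 1.0
        cum_surv *= prop_surviving
        rows.append({"start": lo, "entering": entering, "events": events,
                     "withdrawn": withdrawn, "cum_surv": cum_surv})
        entering -= events + withdrawn
    return rows


# Invented data: 8 cases, 5 events and 3 censored.
cases = [(1, 1), (2, 1), (2, 0), (3, 1), (5, 1), (5, 0), (6, 1), (7, 0)]
table = life_table(cases, width=2, n_intervals=4)
```

The cum_surv column is the survival function: plotting it against the interval start times gives the kind of curve described earlier.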
2. Kaplan-Meier analysis.
Event-based. Rather than defining intervals based on time, intervals are defined based on occurrence of death/termination. Each death/termination marks the end of one interval and the beginning of a subsequent interval.
Can’t incorporate quantitative predictors.
Can’t incorporate more than 2 qualitative predictors in SPSS.
Cannot analyze interactions of 2 or more predictors.
Survival function graphs printed by SPSS’s K-M procedure show censored cases, a plus.
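The event-based idea is easy to see in code. Below is a minimal Python sketch of the Kaplan-Meier (product-limit) estimate, with invented data and a hypothetical function name: the curve only steps down at times when a death/termination occurs, and censored cases simply leave the risk set.

```python
def kaplan_meier(cases):
    """Product-limit estimate from (time, status) pairs; status 1 = event.

    Returns a list of (event_time, proportion surviving) steps.
    """
    curve = []
    surv = 1.0
    at_risk = len(cases)
    for t in sorted({time for time, _ in cases}):
        deaths = sum(1 for time, s in cases if time == t and s == 1)
        censored = sum(1 for time, s in cases if time == t and s == 0)
        if deaths:
            # Each event time starts a new interval and a new step.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


# Invented data: 6 cases, 2 of them censored.
cases = [(2, 1), (3, 0), (4, 1), (4, 1), (6, 0), (7, 1)]
curve = kaplan_meier(cases)
```

With these data the curve steps down at times 2, 4, and 7. The censored cases at times 3 and 6 do not create steps, but they do reduce the number at risk for the later steps.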
3. Cox Regression analysis.
Based on a specific mathematical model of survival developed by Cox.
Estimates hazard probabilities for whole sample.
Then estimates ratios of hazards to this overall hazard function for groups/persons with different values of IV’s
As implemented in SPSS, output and analyses look a lot like logistic regression.
Can incorporate quantitative predictors.
Can incorporate multiple qualitative and quantitative factors.
Can incorporate interactions.
Based on Tabachnick Table 11.1, p. 515
Analyzed using SPSS Life Tables
Suppose the efficacy of Drug 0 is being compared with that of Drug 1. Each was formulated to prolong life of patients with a usually terminal form of cancer. Seven patients were given Drug 0 and five were given Drug 1. Patients were observed for up to 12 months. After 12 months, the window of observation closed and the results were entered into SPSS. Note that this problem is analogous to a turnover problem in organizational research with two groups of employees treated differently.
The SPSS syntax to invoke the analysis.
SAVE OUTFILE='G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav'
  /COMPRESSED.
SURVIVAL TABLE=months BY drug(0 1)
  /INTERVAL=THRU 12 BY 1
  /STATUS=outcome(1)
  /PRINT=TABLE
  /PLOTS (SURVIVAL)=months BY drug.
The results suggest that survival is significantly longer with Drug 1 – the top (orange) curve.
It’s my assumption that all the times are collected and the median of those times is reported here. It should correspond closely to the intersection of survival functions and a horizontal line at 50% survival.
Tabachnick Table 11.1, p. 511
Analyzed using SPSS Kaplan-Meier
Analyze -> Survival -> Kaplan-Meier . . .
KM months BY drug
  /STATUS=outcome(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
[Define Event] had already been pressed when this screen shot was taken.
Test of equality of survival distributions for the different levels of drug.
As was the case with the analysis using the LIFE TABLES procedure, the results support the conclusion that survival is significantly longer with Drug 1.
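For readers who want to see what a log-rank test does, here is a Python sketch of the standard two-group version. At each event time it compares the observed number of events in one group with the number expected if the two groups had the same survival. The data are invented (they are not the Tabachnick data), the function name is hypothetical, and the p-value line relies on the identity that the upper tail of a chi-square with 1 df equals erfc(sqrt(x/2)).

```python
import math


def logrank(group0, group1):
    """Two-group log-rank test on (time, status) pairs; status 1 = event.

    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    event_times = sorted({t for t, s in group0 + group1 if s == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n0 = sum(1 for u, _ in group0 if u >= t)  # at risk in group 0
        n1 = sum(1 for u, _ in group1 if u >= t)  # at risk in group 1
        d0 = sum(1 for u, s in group0 if u == t and s == 1)
        d1 = sum(1 for u, s in group1 if u == t and s == 1)
        n, d = n0 + n1, d0 + d1
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n0 / n) * (n1 / n) * (n - d) / (n - 1)
    chi_square = (observed1 - expected1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Invented data: group 1 generally survives longer than group 0.
group0 = [(1, 1), (2, 1), (3, 1), (4, 1)]
group1 = [(5, 1), (6, 1), (7, 0), (8, 1)]
chi_square, p_value = logrank(group0, group1)
```

The Breslow and Tarone-Ware tests requested in the syntax above use the same observed-minus-expected differences but weight the event times differently; they are not implemented in this sketch.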
Note that censored cases are denoted with a + on the survival function.
Tabachnick Table 11.1, p. 511 – start here on 11/6/17
Analyzed using SPSS Cox Regression
The program will not produce a survival curve for a group of cases defined by the value of a variable unless that variable is a categorical variable. (Reminds me of the RCMDR Factor issue.)
For that reason, I told the program that drug is a categorical variable so that survival curves for each value of drug could be obtained.
Since drug is a dichotomy, the analysis could be done without labeling it categorical, but in that case the survival curves for each value of drug could not have been generated.
The left panel would yield 1 plot. The right panel yields a plot for each value of drug.
COXREG months
  /STATUS=outcome(1)
  /PATTERN BY drug
  /CONTRAST (drug)=Indicator(1)
  /METHOD=ENTER drug
  /PLOT SURVIVAL
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Case Processing Summary
                                                                         N    Percent
Cases available in analysis   Event(a)                                  11     91.7%
                              Censored                                   1      8.3%
                              Total                                     12    100.0%
Cases dropped                 Cases with missing values                  0       .0%
                              Cases with negative time                   0       .0%
                              Censored cases before the earliest
                              event in a stratum                         0       .0%
                              Total                                      0       .0%
Total                                                                   12    100.0%
a. Dependent Variable: months
Categorical Variable Codings(b)
               Frequency   (1)
drug(a)   0        7        0
          1        5        1
a. Indicator Parameter Coding
b. Category variable: drug
As mentioned above if you want separate predicted survival functions for each value of a categorical variable, put the name of that categorical variable here.
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood
40.740
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                     Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
37.394               3.469       1   .063    3.346       1   .067        3.346       1   .067
a. Beginning Block Number 1. Method = Enter
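The "Change From Previous Block" chi-square can be reproduced by hand from the two -2 Log Likelihood values printed above. The short Python sketch below does the subtraction and then gets the significance level, using the fact that the upper tail of a chi-square with 1 df equals erfc(sqrt(x/2)). The variable names are mine, not SPSS's.

```python
import math

neg2ll_block0 = 40.740  # -2 Log Likelihood with no predictors
neg2ll_block1 = 37.394  # -2 Log Likelihood with drug entered

# Likelihood-ratio chi-square: the drop in -2LL when drug is added.
lr_chi_square = neg2ll_block0 - neg2ll_block1

# Upper-tail probability for a chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(lr_chi_square / 2))
```

This gives a chi-square of 3.346 and a significance level of about .067, matching the Change From Previous Block columns of the SPSS table.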
Variables in the Equation
        B        SE     Wald    df   Sig.   Exp(B)
drug    -1.176   .658   3.192   1    .074   .309
Covariate Means and Pattern Values
                Pattern
        Mean    1       2
drug    .417    .000    1.000
I strongly recommend that you create a plot such as the one immediately above by hand to make sure you understand the Cox Regression results. I do it every time I use this procedure.
Cox regression coefficient signs are relative to death, not survival. So, a positive sign means that larger values of the independent variable have higher death rates. And negative signs mean that larger values of the independent variable have lower death rates.
[Figure: plot relating Drug to Death.]
In Cox Regression, we’re predicting DEATH, not survival.
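The numbers in the Variables in the Equation table for the drug example can be checked with a few lines of Python. Exp(B) is the hazard ratio, the Wald statistic is (B/SE) squared, and its significance again comes from a chi-square with 1 df. The variable names are mine; B and SE are the rounded values from the SPSS table, so the Wald value agrees with the printed 3.192 only to about two decimals.

```python
import math

b = -1.176   # coefficient for drug from the SPSS output
se = 0.658   # its standard error

# Hazard ratio: hazard of death for Drug 1 relative to Drug 0.
hazard_ratio = math.exp(b)

# Wald chi-square and its upper-tail probability (1 df).
wald = (b / se) ** 2
p_value = math.erfc(math.sqrt(wald / 2))
```

A hazard ratio of about .309 says the estimated hazard of death with Drug 1 is roughly 31% of the hazard with Drug 0: the negative sign means a lower death rate for the larger value of the independent variable.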
COXREG plots are plots of predicted survival, not actual survival. In this sense, they’re like the tables and plots of estimated marginal means from GLM. I usually report observed survival functions, using Kaplan-Meier, rather than these predicted survival functions. However, these are certainly useful in situations in which you want to show what survival should be for specific groups at specific times controlling for the other variables in the equation.
Y-hats
The Cox-Regression plots are y-hat plots, not observed survival functions.
They are predicted survival, not actual survival.
Note, however, that they are predicted SURVIVAL curves, not death curves.
From Kaplan-Meier
Real Life Example: Turnover at a local Manufacturing Plant
1. Effect of Friends and/or family at the plant
In this study, turnover at a local manufacturing plant was studied. On the application blank, applicants were asked to indicate whether or not they had friends or family already working at the plant.
Some did not respond to this question. They’re included in the analysis.
A screen shot of the data editor
The variable, wsfr2, represents whether or not the applicant had friends at the company.
wsfr2 = 0.50 means yes.
wsfr2 = -0.50 means no.
wsfr2 = 0.15 means no info.
Wsfr2 was created to deal with missing values in a special way. The fact that the values are fractional has no bearing on the analyses. They could just as well have been 0, 1, 2 or 1, 2, 3.
Having said that, because the LIFE TABLES procedure requires integer values of each factor, I’ll skip it here.
Kaplan-Meier analysis is shown.
Some of SPSS’s procedures are written so that a grouping variable can have any kind of values. K-M is one of them.
K-M allows you to simply specify the name of the factor, and the program figures out how many groups are implied by the values of the factor.
That’s good unless you have a grouping variable with some incidental values representing unique cases or groups of cases – cases you wish to be excluded from the analysis.
KM dos BY wsfr2
  /STATUS=status(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
Kaplan-Meier
[DataSet3] G:\MdbR\1TurnoverArticle\TurnoverArticleDataset061005.sav
Large table was deleted.
Case Processing Summary
wsfr2 Whether F/F at company
a. Estimation is limited to the largest survival time if it is censored.
Note that there is no estimate of median survival for the 0.50 group. I’m not absolutely sure why, but I believe it’s because more than 50% of the persons in that group were still on the job at the end of the observation window. For that reason, a median was not computable.
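That explanation is consistent with how a median is read off a survival curve: it is the first time at which the curve is at or below 50%. If the curve never gets that low inside the window, there is nothing to report. A small Python sketch, with invented curves and a hypothetical function name:

```python
def median_survival(curve):
    """Median from a list of (time, proportion surviving) steps in time order.

    Returns the first time the curve is at or below 0.5, or None if the
    curve stays above 0.5 for the whole window of observation.
    """
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


# Invented curves: one stays above 50%, the other drops below it.
stays_high = [(100, 0.90), (400, 0.70), (900, 0.62)]
drops_low = [(100, 0.80), (300, 0.55), (500, 0.45)]

median_high = median_survival(stays_high)
median_low = median_survival(drops_low)
```

For the first curve the function returns None, which corresponds to the blank median in the SPSS table for a group in which more than half were still on the job when the window closed.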
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 25.344 2 .000
Breslow (Generalized Wilcoxon) 25.325 2 .000
Tarone-Ware 25.004 2 .000
Test of equality of survival distributions for the different levels of wsfr2 Whether F/F at company for whole sample analyses.
Clearly there are significant differences in overall survival between the groups.
-.50 = No friends
.15 = No info
.50 = Had friends
The data strongly suggest that applicants who had friends or family at the company had higher survival rates at all times up to 1100 days (about 3 years).
For example, at the end of 1 year (leftmost arrow in the above figure), the survival rate of those with friends and family was about 70%, while that for those who said they did not have friends or family at the organization was about 60%.
By two years (middle arrow), the rate of retention of those with friends or family was about 68%, while the rate of those without had decreased to 50%.
The fact that the curve for those for whom no information was available was between the other two curves suggests that those employees for whom no information was available were a mixture of some who did have friends and family and those who did not.
[Figure: Kaplan-Meier survival curves for three groups (Had friends or family, No friends or family, Missing response), with arrows marking 1 year, 2 years, and 3 years.]
Note the huge difference in proportion surviving after two years – almost 20% difference between those with friends and those without friends.
Same analysis using SPSS Cox Regression
Analyze -> Survival -> Cox Regression . . .
In my limited experience with group coding variables in survival analysis, I’ve found that Dummy Variable (Indicator in SPSS) coding is the one that is most useful.
COXREG dos
  /STATUS=status(1)
  /PATTERN BY wsfr2
  /CONTRAST (wsfr2)=Indicator
  /METHOD=ENTER wsfr2
  /PLOT SURVIVAL
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Cox Regression
Case Processing Summary
                                                                          N     Percent
Cases available in analysis   Event(a)                                   434     33.4%
                              Censored                                   867     66.6%
                              Total                                     1301    100.0%
Cases dropped                 Cases with missing values                    0      0.0%
                              Cases with negative time                     0      0.0%
                              Censored cases before the earliest
                              event in a stratum                           0      0.0%
                              Total                                        0      0.0%
Total                                                                   1301    100.0%
a. Dependent Variable: dos
About 2/3 of the employees (the 66.6% censored) were still working when the window of observation closed; about 1/3 had terminated.
Remember, the variable representing different groups must have been specified as categorical.
Categorical Variable Codingsa
Frequency (1) (2)
wsfr2b -.50=-.50 423 1 0
.15=Whole sample missing value 100 0 1
.50=.50 778 0 0
a. Category variable: wsfr2 (Whether F/F at company for whole sample analyses)
b. Indicator Parameter Coding
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood
5871.672
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                     Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
Recall that the sign of each coefficient is relative to “Termination”.
WSFR2(1) compares the proportion terminating in the -.50 group to the proportion in the +.50 group.
Since the coefficient is +.489, this says that the -.50 group has larger probability of terminating than the .50 group.
Same for the wsfr2(2) – The no response group has greater probability of terminating than the +.50 group.
Covariate Means and Pattern Values
Mean
Pattern
1 2 3
wsfr2(1) .325 1.000 .000 .000
wsfr2(2) .077 .000 1.000 .000
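One way to read the Covariate Means and Pattern Values table, offered here as my interpretation rather than as SPSS documentation: the Mean column is just the average of each indicator (dummy) variable, and each Pattern column is one combination of indicator values for which a predicted survival curve is drawn. The Python sketch below recomputes the means from the frequencies in the Categorical Variable Codings table; the variable names are mine.

```python
# Frequencies and indicator codes from the Categorical Variable Codings table.
frequencies = {-0.50: 423, 0.15: 100, 0.50: 778}
indicator_codes = {-0.50: (1, 0), 0.15: (0, 1), 0.50: (0, 0)}

n = sum(frequencies.values())

# Mean of each indicator variable = proportion of cases coded 1 on it.
mean_wsfr2_1 = sum(indicator_codes[v][0] * f for v, f in frequencies.items()) / n
mean_wsfr2_2 = sum(indicator_codes[v][1] * f for v, f in frequencies.items()) / n

# Patterns 1, 2, 3 in the table are the three rows of indicator codes.
patterns = [indicator_codes[v] for v in (-0.50, 0.15, 0.50)]
```

The means come out to .325 and .077, matching the table, and the three patterns are (1, 0), (0, 1), and (0, 0), that is, the No Friends, No Info, and Had Friends groups.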
I’m not sure what this table is for.
The reference group is the wsfr2 = +0.50 “Have Friends” group.
Predicted survival for the whole sample.
These are predicted survival curves, which is why they’re so smooth.
We might use these data to read the minds of those who did not respond to the “Do you have friends or family?” question. The similarity of their survival function to the “No Friends” function suggests that most did not have friends or family at the organization.
Using Survival Analysis to score and validate selection test questions.
An I/O consulting firm gave a 30-question pre-employment questionnaire to 1000+ employees of a local company. Each question had from one to five alternatives. The consulting company wanted to identify questions that predicted long tenure with the organization. (They would have preferred to identify questions that predicted high performance, but it was not possible to get good performance data. Don’t get me started on why organizations don’t gather good performance data.)
In order to identify responses associated with long tenure, a survival analysis was conducted for each question. A few of the analyses are presented below.
For each survival function, each curve is the survival function of persons who made a particular response to the item. I picked only those for which the difference in survival curves was significant or approached significance.
Question 1
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 5.382 2 .068
Breslow (Generalized Wilcoxon) 4.307 2 .116
Tarone-Ware 4.756 2 .093
The numbers represent the 3 possible responses to the question, coded as +1, 0, -1.
For this question, I believe we treated +1 as an indicator of long tenure and both 0 and -1 as indicators of short tenure.
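[Figure: Kaplan-Meier survival curves for Question 1, labeled +1, 0, and -1 (the last marked with a question mark).]
Question 2
Overall Comparisons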
Chi-Square df Sig.
Log Rank (Mantel-Cox) 7.647 4 .105
Breslow (Generalized Wilcoxon) 6.950 4 .139
Tarone-Ware 7.298 4 .121
[Figure: Kaplan-Meier survival curves for Question 2, labeled +1, 0, and -1 (the last marked with a question mark).]
As in the case of the question on the previous page, the response coded as +1 was treated as an indicator of long tenure and all other responses were treated as indicators of short tenure.
Question 3
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 5.070 3 .167
Breslow (Generalized Wilcoxon) 5.525 3 .137
Tarone-Ware 5.493 3 .139
Test of equality of survival distributions for the different levels of GenQ4 GenQ4 L: I prefer a job that / S: How often you experience conflict with a co-worker?.
[Figure: Kaplan-Meier survival curves for Question 3, labeled +1 and 0.]
There were very few persons who responded +1 or 0, but those who did were treated as long tenure and those who responded 0 as short tenure.
Question 4
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 7.753 4 .101
Breslow (Generalized Wilcoxon) 6.762 4 .149
Tarone-Ware 7.439 4 .114
Test of equality of survival distributions for the different levels of GenQ3 GenQ3 L: Received safety training? / S: You are asked to do more physically demanding work than you were hired to do because someone out sick, how do you react?.
[Figure: Kaplan-Meier survival curves for Question 4, labeled +1 and 0, -1.]
+1: Long tenure
Else: Short tenure
Question 5
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 10.971 4 .027
Breslow (Generalized Wilcoxon) 9.931 4 .042
Tarone-Ware 10.597 4 .031
Test of equality of survival distributions for the different levels of GenQ2 GenQ2 L: Your team in disagreement over who will clean the floor. What method is fair? / S: Recent supervisor rate dependability?.
[Figure: Kaplan-Meier survival curves for Question 5, with labels +1, 0, Long Tenure, and Short Tenure.]
Question 6
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 8.052 3 .045
Breslow (Generalized Wilcoxon) 12.729 3 .005
Tarone-Ware 10.614 3 .014
Test of equality of survival distributions for the different levels of GenQ1 GenQ1 L: Which strategies inspire a team and help be more effective? / S: Your team in disagreement over who will clean the floor. What method is fair?.
[Figure: Kaplan-Meier survival curves for Question 6, labeled +1 and 0.]
Creation of an overall Tenure Index
Thirty questions were evaluated in the above fashion.
After examining the survival analysis for each of the 30 questions as shown above, those questions for which there were significant differences in survival between responses were identified.
Finally, an overall index was calculated, using syntax like the following . . .
In this particular case, the response associated with long survival added 1 to the index.
The response associated with short survival subtracted 1 from the index.
Tenure Scale Computation
Compute genshort=0.
if ((genq1=3 or genq1=4)) genshort=genshort + 1.
if ((genq1=1 or genq1=2)) genshort=genshort - 1.
if ((genq2=3 or genq2=4)) genshort=genshort + 1.
if ((genq2=1 or genq2=2 or genq2=5)) genshort=genshort - 1.
if ((genq6=3)) genshort=genshort + 1.
if ((genq6=1 or genq6=2)) genshort=genshort - 1.
if ((genq12=1)) genshort=genshort + 1.
if ((genq12=3)) genshort=genshort - 1.
if ((genq13=1)) genshort=genshort + 1.
if ((genq13=2 or genq13=3 or genq13=4)) genshort=genshort - 1.
if ((genq21=1 or genq21=3)) genshort=genshort + 1.
if ((genq21=2)) genshort=genshort - 1.
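The same scoring logic can be written as a lookup table, which makes it easier to check which responses add to and subtract from the index. This Python sketch mirrors the six items in the syntax above; the dictionary layout, the function name, and the example applicants are hypothetical.

```python
# For each item: (responses scored +1, responses scored -1),
# copied from the SPSS syntax above.
RULES = {
    "genq1": ({3, 4}, {1, 2}),
    "genq2": ({3, 4}, {1, 2, 5}),
    "genq6": ({3}, {1, 2}),
    "genq12": ({1}, {3}),
    "genq13": ({1}, {2, 3, 4}),
    "genq21": ({1, 3}, {2}),
}


def tenure_index(responses):
    """Sum +1 for each long-tenure response and -1 for each short-tenure one.

    Responses that are missing or not listed for an item add 0.
    """
    score = 0
    for item, (plus, minus) in RULES.items():
        answer = responses.get(item)
        if answer in plus:
            score += 1
        elif answer in minus:
            score -= 1
    return score


# Invented applicants.
all_long = {"genq1": 3, "genq2": 4, "genq6": 3, "genq12": 1, "genq13": 1, "genq21": 3}
all_short = {"genq1": 1, "genq2": 5, "genq6": 2, "genq12": 3, "genq13": 4, "genq21": 2}
```

With six items the index runs from -6 to +6, and a response that is neither a "long" nor a "short" answer (for example genq12 = 2) leaves the index unchanged.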
Validity of the Tenure Index
The following is not based on the scale above but on a similar scale.
The median score on the scale was determined to be -14.
Group 0 was all employees with an index value less than or equal to -14 – persons who generally responded with the “short tenure” answer.
Group 1 was all employees with an index value greater than -14 – persons who generally responded with the “long tenure” answer.
The graph indicates that those in Group 1, with large values of the index, had a nearly 70% retention rate after 50 months.
Those in Group 0 had a 40% retention rate after the same length of time.
The implication of this analysis would be to recommend to the company to use the scale in hiring of employees, giving preference in hiring to those with higher scores on the scale.
Remember that these responses were obtained at time of application. The effect lasted for 4 years.
Potential problems
The above curve was based on the same sample that was used to select the questions. So clearly there is capitalization on chance. The scale should be tested on a different sample. That is, the results need to be cross-validated.
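One simple way to set up a cross-validation, sketched in Python: randomly split the employees into a derivation sample, used to pick the questions and build the index, and a holdout sample, used only to test the finished index. The function name, the 50/50 split, the seed, and the case IDs are all hypothetical choices, not what was done in the study described here.

```python
import random


def split_sample(case_ids, holdout_fraction=0.5, seed=0):
    """Randomly split case IDs into (derivation, holdout) lists."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


# Invented case IDs for 1000 employees.
derivation, holdout = split_sample(range(1000))
```

The survival analyses for selecting questions would be run on the derivation cases only, and the Group 0 versus Group 1 survival comparison would then be repeated on the holdout cases.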
[Figure: Kaplan-Meier survival curves for Group 0 – low tenure and Group 1 – high tenure, with the time axis marked at 1 yr, 2 yr, 3 yr, and 4 yr.]
Multivariate Analysis using Cox Regression
Turnover as a function of 1) friends at the organization (wsfr2), 2) sex of the employee (nsex), and 3) ethnic group of the employee (neth)
COXREG dos
  /STATUS=status(1)
  /PATTERN BY wsfr2
  /CONTRAST (neth)=Indicator(1)
  /CONTRAST (wsfr2)=Indicator
  /METHOD=ENTER wsfr2 nsex neth
  /PLOT SURVIVAL
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Wsfr2  -0.50  does not have friends at company
        0.15  no info on whether has friends
        0.50  friends at the company
Nsex    1  Female
        2  Male
Neth    1  Employee is White
        2  Employee is Black
        3  Employee is American Indian or Asian or Hispanic
So no significant interaction means that the effect of having friends is the same for Females as it is for Males
To specify that an interaction be tested, click on the 1st variable name, then while holding down the CTRL key or Command on the Mac, click on the 2nd variable name.
Finally, click on the >a*b> button.
2. The interaction of Friends and Neth – assessed in a separate analysis.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
                     Overall (score)         Change From Previous Step   Change From Previous Block
-2 Log Likelihood    Chi-square  df  Sig.    Chi-square  df  Sig.        Chi-square  df  Sig.
5820.584             49.194      9   .000    51.088      9   .000        51.088      9   .000
a. Beginning Block Number 1. Method = Enter
Comparing Turnover in two plants
A company was interested in determining the causes of turnover in two of its plants.
Plant A: One part of the preparation of food for sale to retailers is undertaken.
Plant B: A different part of the preparation of food for sale to retailers is undertaken.
Each plant is managed by a different person.
The overall “survival” of employees in the two plants, reploc=1 and reploc=2, is as follows . . .
filter off.
compute reploc = newloc.
value labels reploc 1 "A" 2 "B".
filter by useme.
KM dayswrkd by reploc
  /STATUS=termed(1)
  /PRINT MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
Kaplan-Meier
[DataSet1] G:\MDBR\???\AllEmployeesNN041025.sav
a. Estimation is limited to the largest survival time if it is censored.
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 13.633 1 .000
Breslow (Generalized Wilcoxon) 10.203 1 .001
Tarone-Ware 11.880 1 .001
Test of equality of survival distributions for the different levels of reploc.
filter off.
Clearly, employee “retention/survival” is best in Plant B – reploc = 2.
The manager of Plant A was pretty defensive.
[Figure: Kaplan-Meier survival curves for Plant B and Plant A.]
Are these differences in survival rates the same for the different ethnic groups employed by the company?
Perhaps the differences between buildings are due to the fact that the different buildings have different proportions of ethnic groups coupled with the fact that the different ethnic groups have different survival rates.
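neweth * reploc Crosstabulation
                                                 reploc
                                            1.00 A    2.00 B    Total
neweth  .00 White or Black  Count              130       219      349
                            % within reploc   41.4%     25.7%    29.9%
        Hispanic            Count              184       634      818
                            % within reploc   58.6%     74.3%    70.1%
Total                       Count              314       853     1167
                            % within reploc  100.0%    100.0%   100.0%
(The White or Black percentages and the Hispanic counts are recomputed from the counts and totals that survive in this table.)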
These differences suggest that the difference in survival between buildings might be a side effect of the difference in the proportion of Hispanics in the two buildings combined with the difference in survival between Hispanics and White/Black employees.
The way to resolve this issue is to perform a multivariate analysis, assessing the Plant effect while controlling for the Ethnic Group effect.
Of the procedures covered here, this can only be done with Cox Regression.
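The proportions behind this concern can be recomputed from the crosstab counts. The sketch below uses only the White or Black counts and the plant totals reported in the crosstabulation. The variable names are mine.

```python
# Counts taken from the neweth * reploc crosstabulation.
white_black = {"A": 130, "B": 219}
plant_total = {"A": 314, "B": 853}

# Everyone not counted as White or Black is in the Hispanic group.
hispanic = {p: plant_total[p] - white_black[p] for p in plant_total}

# Percentage of each plant's employees who are Hispanic.
pct_hispanic = {p: 100 * hispanic[p] / plant_total[p] for p in plant_total}

for plant in ("A", "B"):
    print(plant, hispanic[plant], round(pct_hispanic[plant], 1))
```

Plant B has the larger share of Hispanic employees, so a plant difference and an ethnic group difference are tangled together in the simple Kaplan-Meier comparison.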
[Figure: survival curves for Hispanic and White/Black employees]
Multivariate analysis: joint effect of plant and ethnic group.
So, when controlling for differences in ethnic groups, no significant difference in survival (turnover) between the two buildings was found. The manager of Building A was very happy with this result.
Since both factors – reploc and neweth – are dichotomous, I did not bother to identify them as categorical variables for SPSS. I will not be able to get survival curves for the individual combinations, though, because they’re not identified as categorical.
Survival Analysis of a phenomenon with a positive outcome
PEG vs. PEGJ Example – skipped in 2018
The data for this example compared two methods of feeding trauma patients, one using percutaneous endoscopic gastrojejunostomy (PEGJ) and the other using percutaneous endoscopic gastrostomy (PEG). It was hoped that the data would show that the PEGJ technique would provide continuous uninterrupted nutrition with greater consistency than PEG. Time to reach a nutrition goal was the continuous dependent variable. Patients were observed for 14 days. Whether or not a patient reached the goal was the status. Reaching the goal was the +1 state. A patient who had not reached the goal in 14 days was treated as a censored case. Group=1 is the PEGJ group. Group=2 is the PEG group.
The output of LIFE TABLES
SURVIVAL TABLE=DAYSGOAL BY GROUP(1 2)
  /INTERVAL=THRU 15 BY 1
  /STATUS=GOALIN14(1)
  /PRINT=TABLE
  /PLOTS ( SURVIVAL)=DAYSGOAL BY GROUP .
These data are strange because the “event” is something that is sought after - reaching a feeding goal, rather than something that is to be avoided - death or termination. So for these data, lower "survival" is preferred, since the "event" is not death, but reaching a nutrition goal. The sooner a patient reached the nutrition goal the better. Thus, the investigators hoped that patients in the PEGJ condition would reach those goals faster, leading to lower "survival" curves. In this case, survival should be called "Failure to reach feeding goal."
Since the outcome is a good event, the faster the curve falls to zero, the better.
So the group performing best is the group with the lowest curve.
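Here is a minimal sketch of the Kaplan-Meier (product-limit) estimate when the event is a good one. The data are hypothetical: six patients followed for 14 days, with the two who never reached the goal censored at day 14. One minus the "survival" estimate is the estimated proportion who have reached the goal.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.
    events: 1 = event occurred (goal reached), 0 = censored."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        reached = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - reached / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical days-to-goal data; status 0 means censored at 14 days.
days = [3, 5, 5, 8, 14, 14]
goal_in_14 = [1, 1, 1, 1, 0, 0]

curve = kaplan_meier(days, goal_in_14)
for t, s in curve:
    # s = proportion still short of the goal; 1 - s = proportion at goal
    print(t, round(s, 3), round(1 - s, 3))
```

A curve that drops quickly means patients are reaching the goal quickly, which is why the lower curve is the better one here.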
Analysis of the same data using Kaplan-Meier
KM DAYSGOAL BY GROUP
  /STATUS=GOALIN14(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL HAZARD
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED .
a. Estimation is limited to the largest survival time if it is censored.
Overall Comparisons
                                 Chi-Square   df   Sig.
Log Rank (Mantel-Cox)               8.479      1   .004
Breslow (Generalized Wilcoxon)      9.588      1   .002
Tarone-Ware                         9.306      1   .002
Test of equality of survival distributions for the different levels of GROUP.
The same analysis using Cox Regression
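One requirement of the Cox Regression analysis is that the hazard functions be proportional. That means that for any two values of a covariate, the ratio of the hazards for those two values is constant across time.
This rules out hazard functions that cross. It also rules out hazard functions that differ by a constant amount rather than a constant ratio, that is, curves that rise or fall while staying parallel on the raw scale.
Roughly speaking the hazard function should look like the following . . .
That is, the curves diverge over time while their ratio stays constant. (This always holds for cumulative hazards, and for the hazards themselves whenever the hazard is rising.)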
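A small numerical sketch of proportional hazards. The baseline hazard (rising linearly with time) and the hazard ratio of 2 are assumptions made up for the illustration; nothing here comes from the PEG/PEGJ data.

```python
def baseline_hazard(t):
    """Hypothetical baseline hazard that rises linearly with time."""
    return 0.02 * t

hazard_ratio = 2.0          # hypothetical constant ratio between two groups
days = [1, 5, 10, 14]

h0 = [baseline_hazard(t) for t in days]     # reference group
h1 = [hazard_ratio * h for h in h0]         # comparison group

ratios = [b / a for a, b in zip(h0, h1)]    # constant under proportionality
gaps = [b - a for a, b in zip(h0, h1)]      # widens as the hazard rises

for t, r, g in zip(days, ratios, gaps):
    print(t, r, round(g, 3))
```

The ratio column never changes, while the gap column grows. That is the pattern of curves that diverge without ever crossing.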
COXREG DAYSGOAL
  /STATUS=GOALIN14(1)
  /PATTERN BY GROUP
  /CONTRAST (GROUP)=Indicator(1)
  /METHOD=ENTER GROUP
  /PLOT SURVIVAL HAZARD
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
           B       SE     Wald    df   Sig.   Exp(B)
GROUP    -.542    .235    5.332    1   .021    .582
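The entries in the GROUP row are tied together by simple arithmetic, sketched below using the reported B and SE. The 95% confidence interval is my addition (it is not in the output shown), using the usual normal approximation. My reading, which is an assumption, is that Indicator(1) makes GROUP=1 (PEGJ) the reference category, so Exp(B) compares PEG to PEGJ.

```python
import math

# Values reported in the GROUP row of the Cox Regression output.
b = -0.542
se = 0.235

exp_b = math.exp(b)          # hazard ratio, reported as Exp(B)
wald = (b / se) ** 2         # Wald chi-square with 1 df

# Approximate 95% confidence interval for the hazard ratio
# (not part of the output shown; normal approximation).
ci_low = math.exp(b - 1.96 * se)
ci_high = math.exp(b + 1.96 * se)

print(round(exp_b, 3), round(wald, 3))
print(round(ci_low, 3), round(ci_high, 3))
```

The Wald value computed from the rounded B and SE differs slightly from the 5.332 SPSS prints, because SPSS works with unrounded values. Under the assumed coding, a hazard ratio below 1 says the PEG group reaches the nutrition goal at a lower rate than the PEGJ group.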
Covariate Means and Pattern Values
                   Pattern
          Mean     1        2
GROUP     .483     .000     1.000
The above graph presents predicted proportions. They are analogous to plots of y-hats vs. predictors in a regression analysis.
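A sketch of how predicted survival curves follow from a Cox coefficient, using the relation S(t | x) = S0(t) ** exp(B * x), where S0 is the curve for the pattern coded 0. The B value is the one reported for GROUP; the baseline survival values are hypothetical, made up only to show the computation.

```python
import math

b_group = -0.542   # reported coefficient for GROUP

# Hypothetical baseline "survival" (proportion not yet at goal) by day
# for the pattern with GROUP coded 0.
baseline = {2: 0.90, 5: 0.60, 8: 0.35, 11: 0.20, 14: 0.10}

def predicted_survival(s0, x, b=b_group):
    """Predicted survival for covariate value x under proportional hazards."""
    return s0 ** math.exp(b * x)

pattern_0 = {t: predicted_survival(s0, 0) for t, s0 in baseline.items()}
pattern_1 = {t: predicted_survival(s0, 1) for t, s0 in baseline.items()}

for t in baseline:
    print(t, round(pattern_0[t], 3), round(pattern_1[t], 3))
```

Because exp(B) is below 1, the curve for the pattern coded 1 sits above the curve for the pattern coded 0 at every time point. Both curves are forced by the model to have the same shape, which is why they are predictions rather than observed curves.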
When you perform a Cox-regression analysis, you may also have to run a Kaplan-Meier analysis just for the observed survival curves the K-M procedure produces.