Chapter 37
The LIFETEST Procedure

Chapter Table of Contents

OVERVIEW
GETTING STARTED
SYNTAX
   PROC LIFETEST Statement
   BY Statement
   FREQ Statement
   ID Statement
   STRATA Statement
   TEST Statement
   TIME Statement
DETAILS
   Missing Values
   Computational Formulas
   Output Data Sets
   Computer Resources
   Displayed Output
   ODS Table Names
EXAMPLES
   Example 37.1 Product-Limit Estimates and Tests of Association for the VA Lung Cancer Data
   Example 37.2 Life Table Estimates for Males with Angina Pectoris
REFERENCES
Overview

A common feature of lifetime or survival data is the presence of right-censored observations due either to withdrawal of experimental units or to termination of the experiment. For such observations, you know only that the lifetime exceeded a given value; the exact lifetime remains unknown. Such data cannot be analyzed by ignoring the censored observations because, among other considerations, the longer-lived units are generally more likely to be censored. The analysis methodology must correctly use the censored observations as well as the noncensored observations. Several texts that discuss the survival analysis methodology are Collett (1994), Cox and Oakes (1984), Kalbfleisch and Prentice (1980), Lawless (1982), and Lee (1992).
Usually, a first step in the analysis of survival data is the estimation of the distribution of the survival times. Survival times are often called failure times, and event times are uncensored survival times. The survival distribution function (SDF), also known as the survivor function, is used to describe the lifetimes of the population of interest. The SDF evaluated at t is the probability that an experimental unit from the population will have a lifetime exceeding t, that is
S(t) = Pr(T > t)
where S(t) denotes the survivor function and T is the lifetime of a randomly selected experimental unit. The LIFETEST procedure can be used to compute nonparametric estimates of the survivor function either by the product-limit method (also called the Kaplan-Meier method) or by the life table method.
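For orientation, the product-limit estimator is usually written in the following standard form. The symbols t_i (distinct event times), d_i (number of events at t_i), and n_i (number of units at risk just before t_i) are notation introduced here for illustration; the "Computational Formulas" section gives the definitions the procedure actually uses.

```latex
\hat{S}(t) \;=\; \prod_{i \,:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored observations contribute no factor of their own; they only reduce the risk set n_i at later event times.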
Some functions closely related to the SDF are the cumulative distribution function (CDF), the probability density function (PDF), and the hazard function. The CDF, denoted F(t), is defined as 1 - S(t) and is the probability that a lifetime does not exceed t. The PDF, denoted f(t), is defined as the derivative of F(t), and the hazard function, denoted h(t), is defined as f(t)/S(t). If the life table method is chosen, the estimates of the probability density function and the hazard function can also be computed. Plots of these estimates can be produced by a graphical or line printer device.
An important task in the analysis of survival data is the comparison of survival curves. It is of interest to determine whether two or more samples have arisen from identical survivor functions. PROC LIFETEST provides two rank tests and a likelihood ratio test for testing the homogeneity of survival functions across strata. The rank tests are censored-data generalizations of the Savage (exponential scores) test and the Wilcoxon test. The generalized Savage test is also known as the log-rank test, while the generalized Wilcoxon test is simply referred to as the Wilcoxon test. The likelihood ratio test is based on an underlying exponential model, whereas the rank tests are not.
Often there are prognostic variables called covariates that are thought to be related to the failure time. These variables can be used to define strata, and the resulting SDF estimates can be compared visually or by using the tests of homogeneity of strata. The variables can also be used to construct statistics to test for association between the covariates and the lifetime variable. PROC LIFETEST can compute two such test statistics: censored data linear rank statistics based on the exponential scores and the Wilcoxon scores. The corresponding tests are known as the log-rank test and the Wilcoxon test, respectively. These tests are computed by pooling over any defined strata, thus adjusting for the stratum variables. Except for a difference in the treatment of ties, these two rank tests are the same as those used to test for homogeneity over strata.
Getting Started
You can use the LIFETEST procedure to compute nonparametric estimates of the survivor function and to compute rank tests for association of the response variable with other variables.
For simple analyses, only the PROC LIFETEST and TIME statements are required. Consider a sample of survival data. Suppose that the time variable is t and the censoring variable is c with value 1 indicating censored observations. The following statements compute the product-limit estimate for the sample:
   proc lifetest;
      time t*c(1);
   run;
You can use the STRATA statement to divide the data into various strata. A separate survivor function is then estimated for each stratum, and tests of the homogeneity of strata are performed. You can specify covariates in the TEST statement. PROC LIFETEST computes linear rank statistics to test the effects of these covariates on survival.
For example, consider the results of a small randomized trial on rats. Suppose you assign forty rats exposed to a carcinogen into two treatment groups. The event of interest is death from cancer induced by the carcinogen. The response is the time from randomization to death. Four rats died of other causes; their survival times are regarded as censored observations. Interest lies in whether the survival distributions differ between the two treatments.
The data set Exposed contains four variables: Days (survival time in days from treatment to death), Status (censoring indicator variable: 0 if censored and 1 if not censored), Treatment (treatment indicator), and Sex (gender: F if female and M if male).
SAS OnlineDoc: Version 8
   data Exposed;
      input Days Status Treatment Sex $ @@;
      datalines;
   179 1 1 F   378 0 1 M
   256 1 1 F   355 1 1 M
   262 1 1 M   319 1 1 M
   256 1 1 F   256 1 1 M
   255 1 1 M   171 1 1 F
   224 0 1 F   325 1 1 M
   225 1 1 F   325 1 1 M
   287 1 1 M   217 1 1 F
   319 1 1 M   255 1 1 F
   264 1 1 M   256 1 1 F
   237 0 2 F   291 1 2 M
   156 1 2 F   323 1 2 M
   270 1 2 M   253 1 2 M
   257 1 2 M   206 1 2 F
   242 1 2 M   206 1 2 F
   157 1 2 F   237 1 2 M
   249 1 2 M   211 1 2 F
   180 1 2 F   229 1 2 F
   226 1 2 F   234 1 2 F
   268 0 2 M   209 1 2 F
   ;
PROC LIFETEST is invoked to compute the product-limit estimate of the survivor function for each treatment and to compare the survivor functions between the two treatments. In the TIME statement, the survival time variable, Days, is crossed with the censoring variable, Status, with the value 0 indicating censoring. That is, the values of Days are considered censored if the corresponding values of Status are 0; otherwise, they are considered as event times. In the STRATA statement, the variable Treatment is specified, which indicates that the data are to be divided into strata based on the values of Treatment. PROC LIFETEST computes the product-limit estimate for each stratum and tests whether the survivor functions are identical across strata.
The PLOTS= option in the PROC LIFETEST statement is used to request a plot of the estimated survivor function against time (by specifying S), a plot of the negative log of the estimated survivor function against time (by specifying LS), and a plot of the log of the negative log of the estimated survivor function against log time (by specifying LLS). The LS and LLS plots provide an empirical check of the appropriateness of the exponential model and the Weibull model, respectively, for the survival data (Kalbfleisch and Prentice 1980, Chapter 2).
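The invocation described in the two preceding paragraphs can be sketched as follows. This is a minimal sketch assembled from the statements and options named in the text (TIME, STRATA, and PLOTS= with S, LS, and LLS); any graphics settings such as SYMBOL statements are left out.

```sas
/* Product-limit estimates by treatment, with S, LS, and LLS plots */
proc lifetest data=Exposed plots=(s,ls,lls);
   time Days*Status(0);   /* Status=0 marks a censored time */
   strata Treatment;      /* one survivor function per treatment */
run;
```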
If the exponential model is appropriate, the LS curve should be approximately linear through the origin. If the Weibull model is appropriate, the LLS curve should be approximately linear. Since there is more than one stratum, the LLS plot may also be used to check the proportional hazards model assumption. Under this assumption, the LLS curves should be approximately parallel across strata.
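The reasoning behind these three graphical checks can be written out briefly. The parameterizations below (rate λ, Weibull shape γ, hazard ratio c between strata 1 and 2) are assumptions chosen for illustration and are not taken from this chapter.

```latex
\begin{align*}
\text{Exponential:}\quad & S(t) = e^{-\lambda t}
   &&\Longrightarrow\quad -\log S(t) = \lambda t \\
\text{Weibull:}\quad & S(t) = e^{-(\lambda t)^{\gamma}}
   &&\Longrightarrow\quad \log\bigl(-\log S(t)\bigr) = \gamma \log \lambda + \gamma \log t \\
\text{Proportional hazards:}\quad & h_2(t) = c\,h_1(t)
   &&\Longrightarrow\quad \log\bigl(-\log S_2(t)\bigr) = \log c + \log\bigl(-\log S_1(t)\bigr)
\end{align*}
```

The first line is a straight line through the origin in t, the second is a straight line in log t with slope γ, and the third shows that the LLS curves of two strata differ by the constant log c.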
The results of the analysis are displayed in the following figures.
Figure 37.1 displays the product-limit survival estimate for the first stratum (Treatment=1). The figure lists, for each observed time, the survival estimate, failure rate, standard error of the estimate, number of failures, and number of subjects remaining in the study.
NOTE: The marked survival times are censored observations.
Figure 37.1. Product-Limit Survivor Function Estimate for Treatment=1
Figure 37.2 displays summary statistics of survival times for Treatment=1. It contains estimates of the 25th, 50th, and 75th percentiles and the corresponding 95% confidence limits.
The median survival time for rats in this treatment is 256 days. The mean and standard error are also displayed; however, it is noted that these values are underestimated because the largest observed time is censored and the estimation is restricted to the largest event time.
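As a hand check on the reported median, the product-limit estimate for Treatment=1 can be worked from the Exposed data. The sketch below uses the usual product-limit recursion, in which the estimate is multiplied by (number at risk minus number of deaths) / (number at risk) at each event time; the 20 rats in this stratum have ordered times 171, 179, 217, 224 (censored), 225, 255, 255, 256, 256, 256, 256, and so on.

```latex
\begin{align*}
\hat S(171) &= \tfrac{19}{20} = 0.95 \\
\hat S(179) &= 0.95 \cdot \tfrac{18}{19} = 0.90 \\
\hat S(217) &= 0.90 \cdot \tfrac{17}{18} = 0.85 \\
\hat S(225) &= 0.85 \cdot \tfrac{15}{16} \approx 0.7969
   && \text{(the censored time 224 leaves 16 at risk)} \\
\hat S(255) &= 0.7969 \cdot \tfrac{13}{15} \approx 0.6906
   && \text{(two deaths among 15 at risk)} \\
\hat S(256) &= 0.6906 \cdot \tfrac{9}{13} \approx 0.4781
   && \text{(four deaths among 13 at risk)}
\end{align*}
```

The estimate first drops below 0.5 at 256 days, which agrees with the median quoted above.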
The LIFETEST Procedure

Quartile Estimates

            Point       95% Confidence Interval
Percent    Estimate       [Lower       Upper)

NOTE: The mean survival time and its standard error were underestimated because the largest observation was censored and the estimation was restricted to the largest event time.
Figure 37.2. Summary Statistics of Survival Times for Treatment=1
NOTE: The marked survival times are censored observations.
Figure 37.3. Product-Limit Survivor Function Estimate for Treatment=2
Figure 37.3 and Figure 37.4 display the survival estimates and the summary statistics of the survival times for Treatment=2. The median survival time for rats in this treatment is 235 days.
The LIFETEST Procedure

Quartile Estimates

            Point       95% Confidence Interval
Percent    Estimate       [Lower       Upper)
Figure 37.4. Survival Times Summary for Treatment=2
A summary of the number of censored and event observations is shown in Figure 37.5. The figure lists, for each stratum, the number of event and censored observations, and the percentage of censored observations.
The LIFETEST Procedure

Summary of the Number of Censored and Uncensored Values

                                                     Percent
Stratum    Treatment    Total    Failed    Censored  Censored
Figure 37.5. Summary of Censored and Uncensored Values
Figure 37.6 displays the graph of the product-limit survivor function estimates versus survival time. The two treatments differ primarily at larger survival times.
Figure 37.6. Product-Limit Survivor Functions
Figure 37.7 displays the graph of the negative log survivor function estimates versus survival time for the two treatments. Neither curve approximates a straight line through the origin; therefore, the exponential model is not appropriate for the survival data.
Figure 37.7. Log Survivor Function Estimates
Figure 37.8 displays the graph of the log of the negative log survivor function estimates versus log time for the two treatments.
Figure 37.8. Log of Negative Log Survivor Function Estimates
Results of the comparison of survival curves between the two treatments are shown in Figure 37.9. The rank tests for homogeneity indicate a significant difference between the treatments (p=0.0175 for the log-rank test and p=0.0249 for the Wilcoxon test). Rats in Treatment=1 live significantly longer than those in Treatment=2. The log-rank test, which places more weight on larger survival times, is more significant than the Wilcoxon test, which places more weight on early survival times. As noted earlier, the exponential model is not appropriate for the given survival data; consequently, the result of the likelihood ratio test should be ignored.
Next, suppose that gender is thought to be related to survival time, and you want to study the treatment effect while adjusting for the gender of the rats. By specifying the variable Sex in the STRATA statement and by specifying the variable Treatment in the TEST statement, you can test the effect of Treatment while adjusting for the effect of Sex. The log-rank and Wilcoxon linear rank statistics are computed by pooling over the strata defined by the values of Sex, thus adjusting for the effect of Sex.
The NOTABLE option is added to the PROC LIFETEST statement to avoid estimating a survival curve for each gender.
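The stratified analysis just described can be sketched with the following statements. This is a minimal sketch built only from the statements and the option named in the two preceding paragraphs.

```sas
/* Test the Treatment effect while adjusting for Sex */
proc lifetest data=Exposed notable;
   time Days*Status(0);
   strata Sex;        /* rank statistics are pooled over these strata */
   test Treatment;    /* covariate tested for association with survival */
run;
```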
Results of the linear rank tests are shown in Figure 37.10. The treatment effect is statistically significant for both the Wilcoxon test (p=0.0147) and the log-rank test (p=0.0075). As compared to the results of the homogeneity test in Figure 37.9, the significance of the treatment effect has been sharpened by controlling for the effect of the gender of the subjects.
The LIFETEST Procedure

Univariate Chi-Squares for the Wilcoxon Test

               Test        Standard                     Pr >
Variable     Statistic     Deviation    Chi-Square    Chi-Square

Treatment     -4.2372       1.7371        5.9503        0.0147

Univariate Chi-Squares for the Log-Rank Test

               Test        Standard                     Pr >
Variable     Statistic     Deviation    Chi-Square    Chi-Square

Treatment     -6.8021       2.5419        7.1609        0.0075
Figure 37.10. Tests for Association of Time with Covariates
Syntax
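The following statements are available in PROC LIFETEST:

   PROC LIFETEST < options > ;
   TIME variable < *censor(list) > ;
   BY variables ;
   FREQ variable ;
   ID variables ;
   STRATA variable < (list) > < ... variable < (list) > > ;
   TEST variables ;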
The simplest use of PROC LIFETEST is to request the nonparametric estimates of the survivor function for a sample of survival times. In such a case, only the PROC LIFETEST statement and the TIME statement are required. You can use the STRATA statement to divide the data into various strata. A separate survivor function is then estimated for each stratum, and tests of the homogeneity of strata are performed. You can specify covariates in the TEST statement. PROC LIFETEST computes linear rank statistics to test the effects of these covariates on survival.
The PROC LIFETEST statement invokes the procedure. All statements except the TIME statement are optional, and there is no required order for the statements following the PROC LIFETEST statement. The TIME statement is used to specify the variables that define the survival time and censoring indicator. The STRATA statement specifies a variable or set of variables defining the strata for the analysis. The TEST statement specifies a list of numeric covariates to be tested for their association with the response survival time. Each variable is tested individually, and a joint test statistic is also computed. The ID statement provides a list of variables whose values are used to identify observations in the product-limit estimates of the survival function. When only the TIME statement appears, no strata are defined and no tests of homogeneity are performed.
PROC LIFETEST Statement
PROC LIFETEST < options > ;
The PROC LIFETEST statement invokes the procedure. The following options can appear in the PROC LIFETEST statement and are described in alphabetic order. If no options are requested, PROC LIFETEST computes and displays product-limit estimates of the survival distribution within each stratum and tests the equality of the survival functions across strata.
Task / Option            Description

Specify Data Set
   DATA=                 specifies the input SAS data set
   OUTSURV=              names an output data set to contain survival estimates and confidence limits
   OUTTEST=              names an output data set to contain rank test statistics for association of survival time with covariates

Specify Model
   ALPHA=                sets confidence level for survival estimates
   ALPHAQT=              sets confidence level for survival time quartiles
   INTERVALS=            specifies interval endpoints for life table calculations
   MAXTIME=              sets maximum value of time variable for plot
   METHOD=               specifies method to compute survivor function
   MISSING               allows missing values to be a stratum level
   NINTERVAL=            specifies number of intervals for life table estimates
   SINGULAR=             sets tolerance for testing singularity of covariance matrix of rank statistics
   TIMELIM=              specifies the time limit used to estimate the mean survival time and its standard error
   WIDTH=                specifies width of intervals for life table estimates

Control Output
   CENSOREDSYMBOL=       defines symbol used for censored observations in plots
   EVENTSYMBOL=          specifies symbol used for event observations in plots
   FORMCHAR(1,2,7,9)=    defines characters used for line printer plot axes
   LINEPRINTER           specifies that plots are produced by line printer
   NOCENSPLOT            suppresses the plot of censored observations
   NOPRINT               suppresses display of output
   NOTABLE               suppresses display of survival function estimates
   PLOTS=                plots survival estimates
   REDUCEOUT             specifies that only INTERVALS= or TIMELIST= observations are listed in the OUTSURV= data set
   TIMELIST=             specifies a list of time points at which the Kaplan-Meier estimates are displayed

Enhance Graphical Output
   ANNOTATE=             specifies an annotate data set that adds features to plots
   DESCRIPTION=          specifies string that appears in the description field of the PROC GREPLAY master menu for the plots
   GOUT=                 specifies graphics catalog name for saving graphics output
   LANNOTATE=            specifies an input data set that contains variables for local annotation
ALPHA=value
   specifies a number between 0.0001 and 0.9999 that sets the confidence level for the confidence intervals for the survivor function. The confidence level for the interval is 1 - ALPHA. For example, the option ALPHA=0.05 requests a 95% confidence interval for the SDF at each time point. The default value is 0.05.

ALPHAQT=value
   specifies a number between 0.0001 and 0.9999 that sets the level for the confidence intervals for the quartiles of the survival time. The confidence level for the interval is 1 - ALPHAQT. For example, the option ALPHAQT=0.05 requests a 95% confidence interval for the quartiles of the survival time. The default value is 0.05.
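As a brief illustration of the two options, the following sketch requests 90% limits for both the survivor function and the quartiles. It assumes the Exposed data set from the "Getting Started" section; the output data set name SurvCL is a hypothetical name chosen for this example.

```sas
/* 90% confidence limits: 1 - ALPHA = 0.90 and 1 - ALPHAQT = 0.90 */
proc lifetest data=Exposed alpha=0.10 alphaqt=0.10 outsurv=SurvCL;
   time Days*Status(0);
   strata Treatment;
run;

/* SurvCL (hypothetical name) holds the estimates and their limits */
proc print data=SurvCL;
run;
```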
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
   specifies an input data set that contains appropriate variables for annotation. The ANNOTATE= option enables you to add features (for example, labels explaining extreme observations) to plots produced on graphics devices. The ANNOTATE= option cannot be used if the LINEPRINTER option is specified. The data set specified must be an ANNOTATE= type data set, as described in SAS/GRAPH Software: Reference.

   The data set specified with the ANNOTATE= option in the PROC LIFETEST statement is “global” in the sense that the information in this data set is displayed on every plot produced by a single invocation of PROC LIFETEST.

CENSOREDSYMBOL=name | 'string'
CS=name | 'string'
   specifies the symbol value for the censored observations. The value, name or 'string', is the symbol value specification allowed in SAS/GRAPH software. The default is CS=CIRCLE. If you want to omit plotting the censored observations, specify CS=NONE. The CENSOREDSYMBOL= option cannot be used if the LINEPRINTER option is specified.
DATA=SAS-data-set
   names the SAS data set used by PROC LIFETEST. By default, the most recently created SAS data set is used.

DESCRIPTION='string'
DES='string'
   specifies a descriptive string of up to 40 characters that appears in the “Description” field of the graphics catalog. The description does not appear on the plots. By default, PROC LIFETEST assigns a description of the form PLOT OF vname vs hname, where vname and hname are the names of the y variable and the x variable, respectively. The DESCRIPTION= option cannot be used if the LINEPRINTER option is specified.

EVENTSYMBOL=name | 'string'
ES=name | 'string'
   specifies the symbol value for the event observations. The value, name or 'string', is the symbol value specification allowed in SAS/GRAPH software. The default is ES=NONE. The EVENTSYMBOL= option cannot be used if the LINEPRINTER option is specified.
FORMCHAR(1,2,7,9)='string'
   defines the characters used for constructing the vertical and horizontal axes of the line printer plots. The string should be four characters. The first and second characters define the vertical and horizontal bars, respectively, which are also used in drawing the steps of the product-limit survival function. The third character defines the tick mark for the axes, and the fourth character defines the lower left corner of the plot. If the FORMCHAR option in PROC LIFETEST is not specified, the value supplied, if any, with the system option FORMCHAR= is used. The default is FORMCHAR(1,2,7,9)='|-+-'. Any character or hexadecimal string can be used to customize the plot appearance. To send the plot output to a printer with the IBM graphics character set (1 or 2) or display it directly on your PC screen, you can use the following hexadecimal representation

      formchar(1,2,7,9)='B3C4C5C0'x

   or system option

      formchar='B3C4DAC2BFC3C5B4C0C1D9'x

   Refer to the chapter titled “The PLOT Procedure” in the SAS Procedures Guide or the section “System Options” in SAS Language Reference: Dictionary for further information.

GOUT=graphics-catalog
   specifies the graphics catalog for saving graphics output from PROC LIFETEST. The default is WORK.GSEG. The GOUT= option cannot be used if the LINEPRINTER option is specified. For more information, refer to the chapter titled “The GREPLAY Procedure” in SAS/GRAPH Software: Reference.
INTERVALS=values
   specifies a list of interval endpoints for the life table calculations. These endpoints must all be nonnegative numbers. The initial interval is assumed to start at zero whether or not zero is specified in the list. Each interval contains its lower endpoint but does not contain its upper endpoint. When this option is used with the product-limit method, it reduces the number of survival estimates displayed by displaying only the estimates for the smallest time within each specified interval. The INTERVALS= option can be specified in any of the following ways:

      list separated by blanks      intervals=1 3 5 7
      list separated by commas      intervals=1,3,5,7
      x to y                        intervals=1 to 7
      x to y by z                   intervals=1 to 7 by 1
      combination of the above      intervals=1,3 to 5,7

   For example, the specification

      intervals=5,10 to 30 by 10

   produces the set of intervals

      {[0, 5), [5, 10), [10, 20), [20, 30), [30, ∞)}
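As an illustration, the following sketch combines INTERVALS= with the life table method. It assumes the Exposed data set from the "Getting Started" section; the endpoints are arbitrary values picked to span the observed survival times.

```sas
/* Life table estimates on 50-day intervals.                      */
/* By the rule above, the intervals are [0,150), [150,200), ...,  */
/* [350,400), and [400, infinity).                                */
proc lifetest data=Exposed method=life intervals=150 to 400 by 50;
   time Days*Status(0);
run;
```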
LANNOTATE=SAS-data-set
LANN=SAS-data-set
   specifies an input data set that contains variables for local annotation. You can use the LANNOTATE= option to specify a different annotation for each BY group, in which case the BY variables must be included in the LANNOTATE= data set. The LANNOTATE= option cannot be used if the LINEPRINTER option is specified. The data set specified must be an ANNOTATE= type data set, as described in SAS/GRAPH Software: Reference.

   If there is no BY-group processing, the ANNOTATE= and LANNOTATE= options have the same effects.

LINEPRINTER
LS
   specifies that plots are produced by a line printer instead of by a graphical device.

MAXTIME=value
   specifies the maximum value of the time variable allowed on the plots so that outlying points do not determine the scale of the time axis of the plots. This parameter only affects the displayed plots and has no effect on the calculations.
METHOD=type
   specifies the method used to compute the survival function estimates. Valid values for type are as follows.

      PL | KM            specifies that product-limit (PL) or Kaplan-Meier (KM) estimates are computed.
      ACT | LIFE | LT    specifies that life table (or actuarial) estimates are computed.

   By default, METHOD=PL.
MISSING
   allows missing values for numeric variables and blank values for character variables as valid stratum levels. See the section “Missing Values” for details.

   By default, PROC LIFETEST does not use observations with missing values for any stratum variables.

NINTERVAL=value
   specifies the number of intervals used to compute the life table estimates of the survivor function. This parameter is overridden by the WIDTH= option or the INTERVALS= option. When you specify the NINTERVAL= option, PROC LIFETEST tries to find an interval that results in round numbers for the endpoints. Consequently, the number of intervals may be different from the number requested. Use the INTERVALS= option to control the interval endpoints. The default is NINTERVAL=10.

NOCENSPLOT
NOCENS
   requests that the plot of censored observations be suppressed when the PLOTS= option is specified. This option is not needed when the life table method is used to compute the survival estimates, since the plot of censored observations is not produced.

NOPRINT
   suppresses the display of output. This option is useful when only an output data set is needed. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 15, “Using the Output Delivery System.”

NOTABLE
   suppresses the display of survival function estimates. Only the number of censored and event times, plots, and test results are displayed.

OUTSURV=SAS-data-set
OUTS=SAS-data-set
   creates an output SAS data set to contain the estimates of the survival function and corresponding confidence limits for all strata. See the section “Output Data Sets” for more information on the contents of the OUTSURV= SAS data set.

OUTTEST=SAS-data-set
OUTT=SAS-data-set
   creates an output SAS data set to contain the overall chi-square test statistic for association with failure time for the variables in the TEST statement, the values of the univariate rank test statistics for each variable in the TEST statement, and the estimated covariance matrix of the univariate rank test statistics. See the section “Output Data Sets” for more information on the contents of the OUTTEST= SAS data set.
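The two output data set options can be combined in one step, as in the sketch below. It assumes the Exposed data set from the "Getting Started" section; SurvEst and RankStat are hypothetical data set names chosen for this example.

```sas
/* Save survival estimates and rank test statistics to data sets */
proc lifetest data=Exposed notable outsurv=SurvEst outtest=RankStat;
   time Days*Status(0);
   strata Sex;
   test Treatment;   /* OUTTEST= needs covariates from a TEST statement */
run;

proc print data=SurvEst;    /* survival estimates and confidence limits */
run;

proc print data=RankStat;   /* rank statistics and their covariance matrix */
run;
```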
PLOTS=( type <(NAME=name)> <, ..., type <(NAME=name)> > )
   creates plots of survival estimates or censored observations, where type is the type of plot and name is a catalog entry name of up to eight characters. Valid values of type are as follows:

      CENSORED | C     specifies a plot of censored observations by strata.
      SURVIVAL | S     specifies a plot of the estimated SDF versus time.
      LOGSURV | LS     specifies a plot of the -log(estimated SDF) versus time.
      LOGLOGS | LLS    specifies a plot of the log(-log(estimated SDF)) versus log(time).
      HAZARD | H       specifies a plot of the estimated hazard function versus time.
      PDF | P          specifies a plot of the estimated probability density function versus time.
Parentheses are required in specifying the plots. For example,
plots = (s)
requests a plot of the estimated survivor function versus time, and
plots = (s(name=Surv2), h(name=Haz2))
requests a plot of the estimated survivor function versus time and a plot of the estimated hazard function versus time, with Surv2 and Haz2 as their catalog names, respectively.

REDUCEOUT
   specifies that the OUTSURV= data set contains only those observations that are included in the INTERVALS= or TIMELIST= option. This option has no effect if the OUTSURV= option is not specified. It also has no effect if neither the INTERVALS= option nor the TIMELIST= option is specified.

SINGULAR=value
   specifies the tolerance for testing singularity of the covariance matrix for the rank test statistics. The test requires that a pivot for sweeping a covariance matrix be at least this number times a norm of the matrix. The default value is 1E-12.
TIMELIM=time-limit
   specifies the time limit used in the estimation of the mean survival time and its standard error. The mean survival time can be shown to be the area under the Kaplan-Meier survival curve. However, if the largest observed time in the data is censored, the area under the survival curve is not a closed area. In such a situation, you can choose a time limit L and estimate the mean survival time limited to a time L (Lee 1992, pp. 72-76). This option is ignored if the largest observed time is an event time. Valid time-limit values are as follows.

      EVENT | LET       specifies that the time limit L is the largest event time in the data. TIMELIM=EVENT is the default.
      OBSERVED | LOT    specifies that the time limit L is the largest observed time in the data.
      number            specifies that the time limit L is the given number. The number must be positive and at least as large as the largest event time in the data.
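A short sketch of the option, assuming the Exposed data set from the "Getting Started" section. In that data the largest time for Treatment=1 (378 days) is censored, so the limit matters there; for Treatment=2 the largest time (323 days) is an event, so by the rule above the option is ignored for that stratum.

```sas
/* Extend the mean survival time estimate to the largest observed time */
proc lifetest data=Exposed timelim=observed;
   time Days*Status(0);
   strata Treatment;
run;
```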
TIMELIST=number-list
   specifies a list of time points at which the Kaplan-Meier estimates are displayed. The time points are listed in the column labeled as –TIME–. Since the Kaplan-Meier survival curve is a decreasing step function, each given time point falls in an interval that has a constant survival estimate. The event time that corresponds to the beginning of the time interval is displayed along with its survival estimate.
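The following sketch shows TIMELIST= used together with OUTSURV= and REDUCEOUT so that only the listed time points are written to the output data set. It assumes the Exposed data set from the "Getting Started" section; the time points and the data set name SurvAt are hypothetical choices for this example.

```sas
/* Display and save the estimates at 200, 250, and 300 days only */
proc lifetest data=Exposed timelist=200 250 300
              outsurv=SurvAt reduceout;
   time Days*Status(0);
   strata Treatment;
run;
```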
WIDTH=value
   sets the width of the intervals used in the life table calculation of the survival function. This parameter is overridden by the INTERVALS= option.
BY Statement
BY variables ;
You can specify a BY statement with PROC LIFETEST to obtain separate analyses on observations in groups defined by the BY variables.
The BY statement is more efficient than the STRATA statement for defining strata in large data sets. However, if you use the BY statement to define strata, PROC LIFETEST does not pool over strata for testing the association of survival time with covariates nor does it test for homogeneity across the BY groups.
Interval size is computed separately for each BY group. When intervals are determined by default, they may be different for each BY group. To make intervals the same for each BY group, use the INTERVALS= option in the PROC LIFETEST statement.
When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use one of the following alternatives:
• Sort the data using the SORT procedure with a similar BY statement.
• Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the LIFETEST procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
• Create an index on the BY variables using the DATASETS procedure.
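The first alternative can be sketched as follows, assuming the Exposed data set from the "Getting Started" section; ExposedBySex is a hypothetical name for the sorted copy.

```sas
/* Sort by the BY variable, then run a separate analysis per gender */
proc sort data=Exposed out=ExposedBySex;
   by Sex;
run;

proc lifetest data=ExposedBySex notable;
   by Sex;              /* separate analyses; no pooling over Sex */
   time Days*Status(0);
   strata Treatment;    /* homogeneity tests within each BY group */
run;
```

Unlike the STRATA Sex / TEST Treatment analysis, this run produces one treatment comparison per gender rather than a single test adjusted for gender.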
For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
FREQ Statement
FREQ variable ;
The variable in the FREQ statement identifies a variable containing the frequency of occurrence of each observation. PROC LIFETEST treats each observation as if it appeared n times, where n is the value of the FREQ variable for the observation. The FREQ statement is useful for producing life tables when the data are already in the form of a summary data set. If not an integer, the frequency value is truncated to an integer. If the frequency value is less than one, the observation is not used.
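A minimal sketch of a life table built from summary data follows. The data set Summary, its variables Years, Censored, and Count, and all of its values are hypothetical and serve only to show the statement layout.

```sas
/* Hypothetical summary data: one row per time/censoring combination */
data Summary;
   input Years Censored Count;
   datalines;
0.5 0 12
0.5 1  3
1.5 0  8
1.5 1  5
2.5 0  4
2.5 1  9
;

proc lifetest data=Summary method=life intervals=0 to 3 by 1;
   time Years*Censored(1);   /* Censored=1 marks a censored time */
   freq Count;               /* each row counts as Count observations */
run;
```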
ID Statement
ID variables ;
The ID variable values are used to label the observations of the product-limit survival function estimates. SAS format statements can be used to format the values of the ID variables.
STRATA Statement
STRATA variable < (list) > < ... variable < (list) > > ;
The STRATA statement indicates which variables determine strata levels for the computations. The strata are formed according to the nonmissing values of the designated strata variables. The MISSING option can be used to allow missing values as a valid stratum level.
In the preceding syntax, variable is a variable whose values determine the stratum levels and list is a list of endpoints for a numeric variable. The values for variable can be formatted or unformatted. If the variable is a character variable, or if the variable is numeric and no list appears, then the strata are defined by the unique values of the strata variable. More than one variable can be specified in the STRATA statement, and each numeric variable can be followed by a list. Each interval contains its lower endpoint but does not contain its upper endpoint. The corresponding strata are formed by the combination of levels. If a variable is numeric and is followed by a list, then the levels for that variable correspond to the intervals defined by the list. The initial interval is assumed to start at $-\infty$ and the final interval is assumed to end at $\infty$.
The STRATA statement can have any of the following forms:
   list separated by blanks      strata age(5 10 20 30)
   list separated by commas      strata age(5,10,20,30)
   x to y                        strata age(5 to 10)
   x to y by z                   strata age(5 to 30 by 10)
   combination of the above      strata age(5,10 to 50 by 10)
For example, the specification
strata age(5,20 to 50 by 10) sex;
indicates the following levels for the Age variable:
\[ \{ (-\infty, 5),\ [5, 20),\ [20, 30),\ [30, 40),\ [40, 50),\ [50, \infty) \} \]
This statement also specifies that each age stratum is further subdivided by values of the variable Sex. In this example, there are 6 age groups by 2 sex groups, forming a total of 12 strata.
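The way a list of endpoints partitions a numeric variable can be sketched in a few lines of Python. This is an illustration of the interval rule described above (lower endpoint included, upper endpoint excluded), not the procedure's own code; the function name stratum_label and the sample ages are hypothetical.

```python
from bisect import bisect_right

def stratum_label(value, endpoints):
    """Label the interval that contains value. Each interval includes its
    lower endpoint and excludes its upper endpoint."""
    cuts = sorted(endpoints)
    k = bisect_right(cuts, value)          # number of endpoints <= value
    lower = "(-inf" if k == 0 else f"[{cuts[k - 1]}"
    upper = "inf" if k == len(cuts) else str(cuts[k])
    return f"{lower}, {upper})"

# Endpoints obtained by expanding the list in age(5,20 to 50 by 10)
age_cuts = [5, 20, 30, 40, 50]
for age in (4, 5, 19.9, 20, 50, 73):
    print(age, stratum_label(age, age_cuts))
```

Five endpoints yield six intervals, which matches the six age groups counted in the example.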
The specification of several variables (for example, A B C) is equivalent to the A*B*C... syntax of the TABLES statement in the FREQ procedure. The number of strata levels usually grows very rapidly with the number of strata variables, so you must be cautious when specifying the STRATA list.
TEST Statement
TEST variables ;
The TEST statement specifies a list of numeric (continuous) covariates that you want tested for association with the failure time.
Two sets of rank statistics are computed. These rank statistics and their variances are pooled over all strata. Univariate (marginal) test statistics are displayed for each of the covariates.
Additionally, a sequence of test statistics for joint effects of covariates is displayed. The first element of the sequence is the largest univariate test statistic. Other variables are then added on the basis of the largest increase in the joint test statistic. The process continues until all the variables have been added or until the remaining variables are linearly dependent on the previously added variables. See the section “Computational Formulas” on page 1818 for more information.
TIME Statement
TIME variable < *censor(list) > ;
The TIME statement is required. It is used to indicate the failure time variable, where variable is the name of the failure time variable that can be optionally followed by an asterisk, the name of the censoring variable, and a parenthetical list of values that correspond to right censoring. The censoring values should be numeric, nonmissing values. For example, the statement
time t*flag(1,2);
identifies the variable T as containing the values of the event or censored time. If the variable Flag has value 1 or 2, the corresponding value of T is a right-censored value and not an observed failure time.
Details
Missing Values
Observations with a missing value for either the failure time or the censoring variable are not used in the analysis. If a stratum variable value is missing, survival function estimates are computed for the strata labeled by the missing value, but these data are not used in any rank tests. However, the MISSING option can be used to request that missing values be treated as valid stratum values. If any variable specified in the TEST statement has a missing value, that observation is not used in the calculation of the rank statistics.
Computational Formulas
Product-Limit Method
Let $t_1 < t_2 < \cdots < t_k$ represent the distinct event times. For each $i = 1, \ldots, k$, let $n_i$ be the number of surviving units, the size of the risk set, just prior to $t_i$. Let $d_i$ be the number of units that fail at $t_i$, and let $s_i = n_i - d_i$.
The product-limit estimate of the SDF at $t_i$ is the cumulative product
\[ \hat{S}(t_i) = \prod_{j=1}^{i} \left( 1 - \frac{d_j}{n_j} \right) \]
Notice that the estimator is defined to be right continuous; that is, the events at $t_i$ are included in the estimate of $S(t_i)$. The corresponding estimate of the standard error is computed using Greenwood's formula (Kalbfleisch and Prentice 1980) as
\[ \hat{\sigma}\left( \hat{S}(t_i) \right) = \hat{S}(t_i) \sqrt{ \sum_{j=1}^{i} \frac{d_j}{n_j s_j} } \]
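As a cross-check of the product-limit and Greenwood formulas, the following Python sketch computes both for a small made-up sample. It is a minimal illustration under stated assumptions, not PROC LIFETEST's implementation: the function name product_limit and the toy data are hypothetical, a censored time tied with an event time is counted in the risk set at that time, and the standard error is returned as NaN when $s_i = 0$ because the Greenwood term is then undefined.

```python
from math import sqrt

def product_limit(times, events):
    """Return (t_i, S(t_i), standard error) for each distinct event time.
    events[k] is 1 for an observed failure and 0 for a right-censored time."""
    rows = []
    surv, greenwood_sum = 1.0, 0.0
    event_times = sorted({u for u, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for u in times if u >= t)                        # n_i: risk set just prior to t_i
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # d_i: failures at t_i
        s = n - d                                                  # s_i = n_i - d_i
        surv *= 1.0 - d / n
        if s > 0:
            greenwood_sum += d / (n * s)
            se = surv * sqrt(greenwood_sum)
        else:
            se = float("nan")   # assumption of this sketch: undefined when s_i = 0
        rows.append((t, surv, se))
    return rows

# Toy data: five units, two of them right-censored (event indicator 0)
rows = product_limit([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s, se in rows:
    print(f"t={t}  S={s:.4f}  se={se:.4f}")
```

With no censoring before the first event time, the first standard error agrees with the binomial value $\sqrt{0.8 \times 0.2 / 5}$, which is a quick sanity check on the formula.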
The first sample quartile of the survival time distribution is given by
\[ q_{0.25} = \frac{1}{2} \left( \inf\left\{ t : 1 - \hat{S}(t) \ge 0.25 \right\} + \sup\left\{ t : 1 - \hat{S}(t) \le 0.25 \right\} \right) \]
Confidence intervals for the quartiles are based on the sign test (Brookmeyer and Crowley 1982). The $100(1-\alpha)\%$ confidence interval for the first quartile is given by
\[ I_{0.25} = \left\{ t : \left( 1 - \hat{S}(t) - 0.25 \right)^2 \le c_\alpha \hat{\sigma}^2\left( \hat{S}(t) \right) \right\} \]
where $c_\alpha$ is the upper $\alpha$ percentile of a central chi-squared distribution with 1 degree of freedom. The second and third sample quartiles and the corresponding confidence intervals are calculated by replacing the 0.25 in the last two equations by 0.50 and 0.75, respectively.
The estimated mean survival time is
\[ \hat{\mu} = \sum_{i=1}^{k} \hat{S}(t_{i-1}) (t_i - t_{i-1}) \]
where $t_0$ is defined to be zero. If the last observation is censored, this sum underestimates the mean. The standard error of $\hat{\mu}$ is estimated as
\[ \hat{\sigma}(\hat{\mu}) = \sqrt{ \frac{m}{m-1} \sum_{i=1}^{k-1} \frac{A_i^2}{n_i s_i} } \]
where
\[ A_i = \sum_{j=i}^{k-1} \hat{S}(t_j) (t_{j+1} - t_j) \]
\[ m = \sum_{j=1}^{k} d_j \]
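The mean survival time and its standard error can be traced through by hand with a short Python sketch. It assumes the product-limit quantities are already available as lists; the function name mean_and_se and the numbers are hypothetical, and the sketch requires more than one event in total so that $m - 1$ is positive.

```python
from math import sqrt

def mean_and_se(t, surv, n, d):
    """t: distinct event times t_1 < ... < t_k; surv[i-1]: estimate at t_i;
    n[i-1], d[i-1]: risk set size and number of failures at t_i."""
    k = len(t)
    grid = [0.0] + list(t)             # t_0 is defined to be zero
    s_prev = [1.0] + list(surv)        # survival estimate at t_0 is 1
    mu = sum(s_prev[i] * (grid[i + 1] - grid[i]) for i in range(k))
    m = sum(d)                         # total number of events
    total = 0.0
    for i in range(k - 1):             # corresponds to i = 1, ..., k-1 in the text
        a_i = sum(surv[j] * (t[j + 1] - t[j]) for j in range(i, k - 1))
        total += a_i ** 2 / (n[i] * (n[i] - d[i]))   # n_i * s_i in the denominator
    return mu, sqrt(m / (m - 1) * total)

# Toy inputs: three event times with their estimates, risk sets, and failures
mu, se = mean_and_se([1, 2, 3], [0.8, 0.6, 0.3], [5, 4, 2], [1, 1, 1])
print(f"mean={mu:.4f}  se={se:.4f}")
```

In this toy case the largest time is censored, so the computed mean is an underestimate, as the text notes.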
Life Table Method
The life table estimates are computed by counting the numbers of censored and uncensored observations that fall into each of the time intervals $[t_{i-1}, t_i)$, $i = 1, 2, \ldots, k+1$, where $t_0 = 0$ and $t_{k+1} = \infty$. Let $n_i$ be the number of units entering the interval $[t_{i-1}, t_i)$, and let $d_i$ be the number of events occurring in the interval. Let $b_i = t_i - t_{i-1}$, and let $n_i' = n_i - w_i/2$, where $w_i$ is the number of units censored in the interval. The effective sample size of the interval $[t_{i-1}, t_i)$ is denoted by $n_i'$. Let $t_{mi}$ denote the midpoint of $[t_{i-1}, t_i)$.
The conditional probability of an event in $[t_{i-1}, t_i)$ is estimated by
\[ \hat{q}_i = \frac{d_i}{n_i'} \]
and its estimated standard error is
\[ \hat{\sigma}(\hat{q}_i) = \sqrt{ \frac{\hat{q}_i \hat{p}_i}{n_i'} } \]
where $\hat{p}_i = 1 - \hat{q}_i$.
The estimate of the survival function at $t_i$ is
\[ \hat{S}(t_i) = \begin{cases} 1 & i = 0 \\ \hat{S}(t_{i-1}) \hat{p}_{i-1} & i > 0 \end{cases} \]
and its estimated standard error is
\[ \hat{\sigma}\left( \hat{S}(t_i) \right) = \hat{S}(t_i) \sqrt{ \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j' \hat{p}_j} } \]
The density function at $t_{mi}$ is estimated by
\[ \hat{f}(t_{mi}) = \frac{\hat{S}(t_i) \hat{q}_i}{b_i} \]
and its estimated standard error is
\[ \hat{\sigma}\left( \hat{f}(t_{mi}) \right) = \hat{f}(t_{mi}) \sqrt{ \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j' \hat{p}_j} + \frac{\hat{p}_i}{n_i' \hat{q}_i} } \]
The estimated hazard function at $t_{mi}$ is
\[ \hat{h}(t_{mi}) = \frac{2 \hat{q}_i}{b_i (1 + \hat{p}_i)} \]
and its estimated standard error is
\[ \hat{\sigma}\left( \hat{h}(t_{mi}) \right) = \hat{h}(t_{mi}) \sqrt{ \frac{1 - \left( b_i \hat{h}(t_{mi}) / 2 \right)^2}{n_i' \hat{q}_i} } \]
Let $[t_{j-1}, t_j)$ be the interval in which $\hat{S}(t_{j-1}) \ge \hat{S}(t_i)/2 > \hat{S}(t_j)$. The median residual lifetime at $t_i$ is estimated by
\[ \hat{M}_i = t_{j-1} - t_i + b_j \frac{\hat{S}(t_{j-1}) - \hat{S}(t_i)/2}{\hat{S}(t_{j-1}) - \hat{S}(t_j)} \]
and the corresponding standard error is estimated by
\[ \hat{\sigma}(\hat{M}_i) = \frac{\hat{S}(t_i)}{2 \hat{f}(t_{mj}) \sqrt{n_i'}} \]
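The core life table quantities can be illustrated with a short Python sketch. It is a simplified illustration rather than the procedure's code, and it makes several assumptions that should be read carefully: all intervals share one width, the survival estimate reported for an interval is the one at the beginning of that interval (the convention used in the displayed output), and the function name life_table and the counts are hypothetical.

```python
def life_table(width, entering, events, censored):
    """events[i] and censored[i] are the counts in the i-th interval;
    entering is the number of units that enter the first interval."""
    rows, surv, n = [], 1.0, entering
    for d, w in zip(events, censored):
        n_eff = n - w / 2                  # effective sample size n' = n - w/2
        q = d / n_eff                      # conditional probability of an event
        p = 1 - q
        rows.append({
            "n_eff": n_eff,
            "q": q,
            "survival": surv,                       # at the beginning of the interval
            "density": surv * q / width,            # estimated at the midpoint
            "hazard": 2 * q / (width * (1 + p)),    # estimated at the midpoint
        })
        surv *= p                          # carried to the start of the next interval
        n -= d + w                         # units entering the next interval
    return rows

# Toy counts: 100 units enter; two one-year intervals
table = life_table(width=1, entering=100, events=[10, 20], censored=[0, 10])
for row in table:
    print({key: round(val, 4) for key, val in row.items()})
```

Note that the hazard estimate equals $2 d_i / (b_i (2 n_i' - d_i))$, which is sometimes easier to verify by hand.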
Interval Determination
If you want to determine the intervals exactly, use the INTERVALS= option in the PROC LIFETEST statement to specify the interval endpoints. Use the WIDTH= option to specify the width of the intervals, thus indirectly determining the number of intervals. If neither the INTERVALS= option nor the WIDTH= option is specified in the life table estimation, the number of intervals is determined by the NINTERVAL= option. The width of the time intervals is 2, 5, or 10 times an integer (possibly a negative integer) power of 10. Let $c = \log_{10}(\text{maximum event or censored time} / \text{number of intervals})$, and let $b$ be the largest integer not exceeding $c$. Let $d = 10^{c-b}$ and let
\[ a = 2 \times I(d \le 2) + 5 \times I(2 < d \le 5) + 10 \times I(d > 5) \]
with $I$ being the indicator function. The width is then given by
\[ \text{width} = a \times 10^{b} \]
By default, NINTERVAL=10.
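The default width rule is easy to trace with a few lines of Python. This sketch simply follows the definitions of $c$, $b$, $d$, and $a$ given above; the function name default_width and the sample maximum times are hypothetical, and values of $d$ that fall exactly on a boundary (2 or 5) may be affected by floating-point rounding in this illustration.

```python
from math import floor, log10

def default_width(max_time, nintervals=10):
    """Interval width: 2, 5, or 10 times an integer power of 10."""
    c = log10(max_time / nintervals)
    b = floor(c)                  # largest integer not exceeding c
    d = 10 ** (c - b)             # lies in [1, 10)
    a = 2 if d <= 2 else 5 if d <= 5 else 10
    return a * 10 ** b

# Sample maximum event or censored times, with the default of 10 intervals
for max_time in (999, 15, 0.3):
    print(max_time, default_width(max_time))
```

For a maximum time of 999 with 10 intervals, $c \approx 1.9996$, so $b = 1$, $d \approx 9.99$, $a = 10$, and the width is 100.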
Confidence Limits Added to the Output Data Set
The upper confidence limits (UCL) and the lower confidence limits (LCL) for the distribution estimates for both the product-limit and life table methods are computed as
\[ \mathrm{UCL} = \hat{\varepsilon} + z_{\alpha/2} \hat{\sigma} \]
\[ \mathrm{LCL} = \hat{\varepsilon} - z_{\alpha/2} \hat{\sigma} \]
where $\hat{\varepsilon}$ is the estimate (either the survival function, the density, or the hazard function), $\hat{\sigma}$ is the corresponding estimate of the standard error, and $z_{\alpha/2}$ is the critical value for the normal distribution. That is, $\Phi(-z_{\alpha/2}) = \alpha/2$, where $\Phi$ is the cumulative distribution function for the standard normal distribution.
The value of $\alpha$ can be specified with the ALPHA= option.
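A small Python sketch shows how the limits follow from an estimate and its standard error. It uses the standard library's normal distribution; the function name confidence_limits is hypothetical, and the default of 0.05 for alpha is an assumption of this sketch rather than something stated in this section.

```python
from statistics import NormalDist

def confidence_limits(estimate, std_err, alpha=0.05):
    """Return (LCL, UCL) = estimate -/+ z_{alpha/2} * std_err."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value: Phi(-z) = alpha/2
    return estimate - z * std_err, estimate + z * std_err

# Hypothetical survival estimate 0.6 with standard error 0.1
lcl, ucl = confidence_limits(0.6, 0.1)
print(f"LCL={lcl:.4f}  UCL={ucl:.4f}")
```

The sketch applies the formulas exactly as written, so the limits are not truncated to the range of the estimated quantity.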
Tests for Equality of Survival Curves across Strata
Log-Rank Test and Wilcoxon Test
The rank statistics used to test homogeneity between the strata (Kalbfleisch and Prentice 1980) have the form of a $c \times 1$ vector $v = (v_1, v_2, \ldots, v_c)'$ with
\[ v_j = \sum_{i=1}^{k} w_i \left( d_{ij} - \frac{n_{ij} d_i}{n_i} \right) \]
where $c$ is the number of strata, and the estimated covariance matrix, $V = (V_{jl})$, is given by
\[ V_{jl} = \sum_{i=1}^{k} \frac{ w_i^2 d_i s_i \left( n_i n_{il} \delta_{jl} - n_{ij} n_{il} \right) }{ n_i^2 (n_i - 1) } \]
where $i$ labels the distinct event times, $\delta_{jl}$ is 1 if $j = l$ and 0 otherwise, $n_{ij}$ is the size of the risk set in the $j$th stratum at the $i$th event time, $d_{ij}$ is the number of events in the $j$th stratum at the $i$th time, and
\[ n_i = \sum_{j=1}^{c} n_{ij} \]
\[ d_i = \sum_{j=1}^{c} d_{ij} \]
\[ s_i = n_i - d_i \]
The term $v_j$ can be interpreted as a weighted sum of observed minus expected numbers of failure under the null hypothesis of identical survival curves. The weight $w_i$ is 1 for the log-rank test and $n_i$ for the Wilcoxon test. The overall test statistic for homogeneity is $v' V^{-} v$, where $V^{-}$ denotes a generalized inverse of $V$. This statistic is treated as having a chi-square distribution with degrees of freedom equal to the rank of $V$ for the purposes of computing an approximate probability level.
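For two strata, the quadratic form $v' V^{-} v$ reduces to $v_1^2 / V_{11}$, because $v_2 = -v_1$ and every entry of $V$ equals $\pm V_{11}$. The following Python sketch uses that reduction to compute the log-rank or Wilcoxon chi-square statistic for a small made-up sample. It is an illustration only: the function name two_sample_rank_test and the data are hypothetical, and the general case with more strata would need a generalized inverse, which this sketch does not implement.

```python
def two_sample_rank_test(times, events, groups, wilcoxon=False):
    """Chi-square statistic (1 degree of freedom) for strata labeled 0 and 1."""
    v1 = var11 = 0.0
    event_times = sorted({u for u, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        n, n1 = len(at_risk), at_risk.count(1)       # n_i and n_i1
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == 1)
        w = n if wilcoxon else 1                     # weight w_i
        v1 += w * (d1 - n1 * d / n)                  # observed minus expected
        if n > 1:                                    # variance term needs n_i - 1 > 0
            var11 += w * w * d * (n - d) * (n * n1 - n1 * n1) / (n * n * (n - 1))
    return v1 * v1 / var11

# Toy data: three uncensored units in each of two strata
times = [1, 3, 5, 2, 4, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 1, 1, 1]
logrank = two_sample_rank_test(times, events, groups)
print(f"log-rank chi-square = {logrank:.4f}")
```

Swapping the two stratum labels leaves the statistic unchanged, which is a convenient check that the reduction to $v_1^2 / V_{11}$ has been coded symmetrically.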
Likelihood Ratio Test
The likelihood ratio test statistic (Lawless 1982) for homogeneity assumes that the data in the various strata are exponentially distributed and tests that the scale parameters are equal. The test statistic is computed as
\[ Z = 2N \log\left( \frac{T}{N} \right) - 2 \sum_{j=1}^{c} N_j \log\left( \frac{T_j}{N_j} \right) \]
where $N_j$ is the total number of events in the $j$th stratum, $N = \sum_{j=1}^{c} N_j$, $T_j$ is the total time on test in the $j$th stratum, and $T = \sum_{j=1}^{c} T_j$. The approximate probability value is computed by treating $Z$ as having a chi-square distribution with $c - 1$ degrees of freedom.
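The likelihood ratio statistic depends only on the event counts and total times on test per stratum, so it can be checked with a few lines of Python. The function name exponential_lr_statistic and the counts are hypothetical; the sketch assumes every stratum has at least one event.

```python
from math import log

def exponential_lr_statistic(events, time_on_test):
    """events[j] is N_j and time_on_test[j] is T_j for stratum j.
    Returns the statistic Z and its degrees of freedom, c - 1."""
    n_total, t_total = sum(events), sum(time_on_test)
    z = 2 * n_total * log(t_total / n_total)
    z -= 2 * sum(nj * log(tj / nj) for nj, tj in zip(events, time_on_test))
    return z, len(events) - 1

# Two strata with identical event rates, then two with different rates
z_equal, df = exponential_lr_statistic([10, 20], [100, 200])
z_diff, _ = exponential_lr_statistic([10, 10], [100, 400])
print(f"equal rates: Z={z_equal:.4f}, df={df}")
print(f"different rates: Z={z_diff:.4f}")
```

When every stratum has the same ratio $T_j / N_j$, the statistic is zero, as expected under exact homogeneity of the estimated scale parameters.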
Rank Tests for the Association of Survival Time with Covariates
The rank tests for the association of covariates are more general cases of the rank tests for homogeneity. A good discussion of these tests can be found in Kalbfleisch and Prentice (1980). In this section, the index $\alpha$ is used to label all observations, $\alpha = 1, 2, \ldots, n$, and the indices $i, j$ range only over the observations that correspond to events, $i, j = 1, 2, \ldots, k$. The ordered event times are denoted as $t_{(i)}$, the corresponding vectors of covariates are denoted as $z_{(i)}$, and the ordered times, both censored and event times, are denoted as $t_\alpha$.
The rank test statistics have the form
\[ v = \sum_{\alpha=1}^{n} c_{\alpha,\delta_\alpha} z_\alpha \]
where $n$ is the total number of observations, $c_{\alpha,\delta_\alpha}$ are rank scores, which can be either log-rank or Wilcoxon rank scores, $\delta_\alpha$ is 1 if the observation is an event and 0 if the observation is censored, and $z_\alpha$ is the vector of covariates in the TEST statement for the $\alpha$th observation. Notice that the scores, $c_{\alpha,\delta_\alpha}$, depend on the censoring pattern and that the summation is over all observations.
The log-rank scores are
\[ c_{\alpha,\delta_\alpha} = \sum_{(j : t_{(j)} \le t_\alpha)} \left( \frac{1}{n_j} \right) - \delta_\alpha \]
and the Wilcoxon scores are
\[ c_{\alpha,\delta_\alpha} = 1 - (1 + \delta_\alpha) \prod_{(j : t_{(j)} \le t_\alpha)} \frac{n_j}{n_j + 1} \]
where $n_j$ is the number at risk just prior to $t_{(j)}$.
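The estimates used for the covariance matrix of the log-rank statistics are
\[ V = \sum_{i=1}^{k} \frac{V_i}{n_i} \]
where $V_i$ is the corrected sum of squares and crossproducts matrix for the risk set at time $t_{(i)}$; that is,
\[ V_i = \sum_{(\alpha : t_\alpha \ge t_{(i)})} (z_\alpha - \bar{z}_i)' (z_\alpha - \bar{z}_i) \]
where
\[ \bar{z}_i = \sum_{(\alpha : t_\alpha \ge t_{(i)})} \frac{z_\alpha}{n_i} \]
The estimate used for the covariance matrix of the Wilcoxon statistics is
\[ V = \sum_{i=1}^{k} \left[ a_i (1 - a_i^*) \left( 2 z_{(i)} z_{(i)}' + S_i \right) - (a_i^* - a_i) \left( a_i x_i x_i' + \sum_{j=i+1}^{k} a_j \left( x_i x_j' + x_j x_i' \right) \right) \right] \]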
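where
\[ a_i = \prod_{j=1}^{i} \frac{n_j}{n_j + 1} \]
\[ a_i^* = \prod_{j=1}^{i} \frac{n_j + 1}{n_j + 2} \]
\[ S_i = \sum_{(\alpha : t_{(i+1)} > t_\alpha > t_{(i)})} z_\alpha z_\alpha' \]
\[ x_i = 2 z_{(i)} + \sum_{(\alpha : t_{(i+1)} > t_\alpha > t_{(i)})} z_\alpha \]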
In the case of tied failure times, the statistics $v$ are averaged over the possible orderings of the tied failure times. The covariance matrices are also averaged over the tied failure times. Averaging the covariance matrices over the tied orderings produces functions with appropriate symmetries for the tied observations; however, the actual variances of the $v$ statistics would be smaller than the preceding estimates. Unless the proportion of ties is large, it is unlikely that this will be a problem.
The univariate tests for each covariate are formed from each component of $v$ and the corresponding diagonal element of $V$ as $v_i^2 / V_{ii}$. These statistics are treated as coming from a chi-square distribution for calculation of probability values.
The statistic $v' V^{-} v$ is computed by sweeping each pivot of the $V$ matrix in the order of greatest increase to the statistic. The corresponding sequence of partial statistics is tabulated. Sequential increments for including a given covariate and the corresponding probabilities are also included in the same table. These probabilities are calculated as the tail probabilities of a chi-square distribution with one degree of freedom. Because of the selection process, these probabilities should not be interpreted as p-values.
If desired for data screening purposes, the output data set requested by the OUTTEST= option can be treated as a sum of squares and crossproducts matrix and processed by the REG procedure using the option METHOD=RSQUARE. Then the sets of variables of a given size can be found that give the largest test statistics. Example 37.1 illustrates this process.
Output Data Sets
OUTSURV= Data Set
The OUTSURV= option in the LIFETEST statement creates an output data set containing survival estimates. It contains
• any specified BY variables
• any specified STRATA variables, their values coming from either their original values or the midpoints of the stratum intervals if endpoints are used to define strata (semi-infinite intervals are labeled by their finite endpoint)
• _STRTUM_, a numeric variable that numbers the strata
• the time variable as given in the TIME statement. In the case of the product-limit estimates, it contains the observed failure or censored times. For the life table estimates, it contains the lower endpoints of the time intervals.
• SURVIVAL, a variable containing the survival function estimates
• SDF_LCL, a variable containing the lower endpoint of the survival confidence interval
• SDF_UCL, a variable containing the upper endpoint of the survival confidence interval
If the estimation uses the product-limit method, then the data set also contains
• _CENSOR_, an indicator variable that has a value 1 for a censored observation and a value 0 for an event observation
If the estimation uses the life table method, then the data set also contains
• MIDPOINT, a variable containing the value of the midpoint of the time interval
• PDF, a variable containing the density function estimates
• PDF_LCL, a variable containing the lower endpoint of the PDF confidence interval
• PDF_UCL, a variable containing the upper endpoint of the PDF confidence interval
• HAZARD, a variable containing the hazard estimates
• HAZ_LCL, a variable containing the lower endpoint of the hazard confidence interval
• HAZ_UCL, a variable containing the upper endpoint of the hazard confidence interval
Each survival function contains an initial observation with the value 1 for the SDF and the value 0 for the time. The output data set contains an observation for each distinct failure time if the product-limit method is used or an observation for each time interval if the life table method is used. The product-limit survival estimates are defined to be right continuous; that is, the estimates at a given time include the factor for the failure events that occur at that time.
Labels are assigned to all the variables in the output data set except the BY variable and the STRATA variable.
OUTTEST= Data Set
The OUTTEST= option in the LIFETEST statement creates an output data set containing the rank statistics for testing the association of failure time with covariates. It contains
• any specified BY variables
• _TYPE_, a character variable of length 8 that labels the type of rank test, either “LOG-RANK” or “WILCOXON”
• _NAME_, a character variable of length 8 that labels the rows of the covariance matrix and the test statistics
• the TIME variable, containing the overall test statistic in the observation that has _NAME_ equal to the name of the time variable and the univariate test statistics under their respective covariates
• all variables listed in the TEST statement
The output is in the form of a symmetric matrix formed by the covariance matrix of the rank statistics bordered by the rank statistics and the overall chi-square statistic. If the value of _NAME_ is the name of a variable in the TEST statement, the observation contains a row of the covariance matrix and the value of the rank statistic in the time variable. If the value of _NAME_ is the name of the TIME variable, the observation contains the values of the rank statistics in the variables from the TEST list and the value of the overall chi-square test statistic in the TIME variable.
Two complete sets of statistics labeled by the _TYPE_ variable are produced, one for the log-rank test and one for the Wilcoxon test.
Computer Resources
The data are first read and sorted into strata. If the data are originally sorted by failure time and censoring state, with smaller failure times coming first and event values preceding censored values in cases of ties, the data can be processed by strata without additional sorting. Otherwise, the data are read into memory by strata and sorted.
Memory Requirements
For a given BY group, define

   $N$      the total number of observations
   $V$      the number of STRATA variables
   $C$      the number of covariates listed on the TEST statement
   $L$      total length of the ID variables in bytes
   $S$      number of strata
   $n$      maximum number of observations within strata
   $b$      $12 + 8C + L$
   $m_1$    $(112 + 16V) \times S$
   $m_2$    $50 \times b \times S$
   $m_3$    $(50 + n) \times (b + 4)$
   $m_4$    $8(C + 4)^2$
   $m_5$    $20N + 8S \times (S + 4)$

The memory, in bytes, required to process the BY-group is at least
\[ m_1 + \max(m_2, m_3) + m_4 \]
The test of equality of survival functions across strata requires additional memory ($m_5$ bytes). However, if this additional memory is not available, PROC LIFETEST skips the test for equality of survival functions and finishes the other computations. Additional memory is required for the PLOTS= option. Temporary storage of $16n$ bytes is required to store the product-limit estimates for plotting.
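To get a feel for the sizes involved, the memory formulas can be evaluated with a short Python sketch. The function name lifetest_memory_bytes and the input values are hypothetical; the sketch only evaluates the expressions given above and does not include the additional storage for the PLOTS= option.

```python
def lifetest_memory_bytes(N, V, C, L, S, n, with_strata_test=True):
    """Lower bound on memory (bytes) for one BY group, from the formulas above."""
    b = 12 + 8 * C + L
    m1 = (112 + 16 * V) * S
    m2 = 50 * b * S
    m3 = (50 + n) * (b + 4)
    m4 = 8 * (C + 4) ** 2
    m5 = 20 * N + 8 * S * (S + 4)      # only needed for the test across strata
    total = m1 + max(m2, m3) + m4
    return total + m5 if with_strata_test else total

# Hypothetical BY group: 1000 observations, 1 STRATA variable, 5 covariates,
# 8 bytes of ID variables, 4 strata, at most 400 observations per stratum
print(lifetest_memory_bytes(N=1000, V=1, C=5, L=8, S=4, n=400))
print(lifetest_memory_bytes(N=1000, V=1, C=5, L=8, S=4, n=400,
                            with_strata_test=False))
```

In this hypothetical case the term $m_3$ dominates $m_2$, so the largest stratum, not the number of strata, drives the requirement.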
Displayed Output
For each stratum, the LIFETEST procedure displays
• the values of the stratum variables, if you specify the STRATA statement.
The following items are displayed when you request product-limit estimates:
• the observed event or censored times
• the estimate of the survival function
• the estimate of the cumulative distribution function of the failure time
• the standard error estimate of the estimated survival function
• the number of event times that have been observed
• the number of event or censored times which remain to be observed
• the frequency of the observed event or censored times if you specify the FREQ statement
• the values of the ID variables if you specify the ID statement
• the sample quartiles of the survival times
• the estimated mean survival time
• the estimated standard error of the estimated mean
The following items are displayed when you request life table estimates:
• time intervals into which the failure and censored times are distributed; each interval is from the lower limit, up to but not including the upper limit. If the upper limit is infinity, the missing value is printed.
• the number of events that occur in the interval
• the number of censored observations that fall into the interval
• the effective sample size for the interval
• the estimate of conditional probability of events (failures) in the interval
• the standard error of the estimated conditional probability of events
• the estimate of the survival function at the beginning of the interval
• the estimate of the cumulative distribution function of the failure time at the beginning of the interval
• the standard error estimate of the estimated survival function
• the estimate of the median residual lifetime, which is the amount of time elapsed before reducing the number of at-risk units to one-half. This is also known as the median future lifetime in Johnson and Johnson (1980).
• the estimated standard error of the estimated median residual lifetime
• the density function estimated at the midpoint of the interval
• the standard error estimate of the estimated density
• the hazard rate estimated at the midpoint of the interval
• the standard error estimate of the estimated hazard
The following results, summarized over all strata, are displayed:
• a summary of the number of censored and event times
• a table of rank statistics for testing homogeneity over strata. For each stratum, the log rank statistic can be interpreted as the difference between the observed number of failures and the expected number of failures under the null hypothesis of identical survival functions.
• the covariance matrix for the log rank statistics for testing homogeneity over strata
• the covariance matrix for the Wilcoxon statistics for testing homogeneity over strata
• the approximate chi-square statistic for the log rank test, computed as a quadratic form of the log rank statistics (see Computational Formulas)
• the approximate chi-square statistic for the Wilcoxon test
• the likelihood ratio test for homogeneity over strata based on the exponential distribution
You can generate plots for
• the estimated SURVIVAL FUNCTION against FAILURE TIME
• the −log(estimated SURVIVAL FUNCTION) against FAILURE TIME
• the log(−log(estimated SURVIVAL FUNCTION)) against log(FAILURE TIME)
• censored observations for each stratum if the product-limit estimation method was used.
If you request the life table estimation method, you can also generate plots for the estimated HAZARD against FAILURE TIME and the estimated DENSITY against FAILURE TIME.
If you specify the TEST statement, the following statistics are printed:
• the univariate Wilcoxon statistics
• the standard deviations of the Wilcoxon statistics
• the corresponding approximate chi-square statistics
• the approximate probability values of the univariate chi-square statistics
• the covariance matrix for the Wilcoxon statistics
• the sequence of partial chi-square statistics for the Wilcoxon test in the order of the greatest increase to the overall test statistic
• the approximate probability values of the partial chi-square statistics
• the chi-square increments for including the given covariate
• the probability values of the chi-square increments. See Computational Formulas earlier in this chapter for a warning concerning these probabilities.
• the univariate log rank statistics
• the standard deviations of the log rank statistics
• the corresponding approximate chi-square statistics
• the approximate probability values of the univariate chi-square statistics
• the covariance matrix for the log rank statistics
• the sequence of partial chi-square statistics for the log rank test in the order of the greatest increase to the overall test statistic
• the approximate probability values of the partial chi-square statistics
• the chi-square increments for including the given covariate
• the probability values of the chi-square increments. See Computational Formulas earlier in this chapter for a warning concerning these probabilities.
ODS Table Names
PROC LIFETEST assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 15, “Using the Output Delivery System.”
Table 37.1. ODS Tables Produced in PROC LIFETEST

   ODS Table Name    Description                                      Statement   Option
   CensorPlot        Line-printer plot of censored observations       PROC        PLOT=(C) and METHOD=PL and LINEPRINTER
   CensoredSummary   Number of event and censored observations        PROC        METHOD=PL (default)
   DensityPlot       Line-printer plot of the density                 PROC        PLOT=(D) and METHOD=LT and LINEPRINTER
   HazardPlot        Line-printer plot of the hazards function        PROC        PLOT=(H) and METHOD=LT and LINEPRINTER
   HomStats          Rank statistics for testing strata homogeneity
Example 37.1. Product-Limit Estimates and Tests of Association for the VA Lung Cancer Data
This example uses the data presented in Appendix I of Kalbfleisch and Prentice (1980). The response variable, SurvTime, is the survival time in days of a lung cancer patient. Negative values of SurvTime are censored values. The covariates are Cell (type of cancer cell), Therapy (type of therapy: standard or test), Prior (prior therapy: 0=no, 10=yes), Age (age in years), DiagTime (time in months from diagnosis to entry into the trial), and Kps (performance status). A censoring indicator variable Censor is created from the data, with value 1 indicating a censored time and value 0 an event time. Since there are only two types of therapy, an indicator variable, Treatment, is constructed for therapy type, with value 0 for standard therapy and value 1 for test therapy.
options ls=120;
data VALung;
   drop check m;
   retain Therapy Cell;
   infile cards column=column;
   length Check $ 1;
   label SurvTime='failure or censoring time'
         Kps='karnofsky index'
         DiagTime='months till randomization'
         Age='age in years'
         Prior='prior treatment?'
         Cell='cell type'
         Therapy='type of treatment'
         Treatment='treatment indicator';
   M=Column;
   input Check $ @@;
   if M>Column then M=1;
   if Check='s'|Check='t' then input @M Therapy $ Cell $ ;
   else input @M SurvTime Kps DiagTime Age Prior @@;
   if SurvTime > .;
   censor=(SurvTime<0);
   SurvTime=abs(SurvTime);
   Treatment=(Therapy='test');
   datalines;
PROC LIFETEST is invoked to compute the product-limit estimate of the survivor function for each type of cancer cell and to analyze the effects of the variables Age, Prior, DiagTime, Kps, and Treatment on the survival of the patients. These prognostic factors are specified in the TEST statement, and the variable Cell is specified in the STRATA statement. Graphs of the product-limit estimates, the log estimates, and the negative log-log estimates are requested through the PLOTS= option in the PROC LIFETEST statement. Because of a few large survival times, a MAXTIME of 600 is used to set the scale of the time axis; that is, the time scale extends from 0 to a maximum of 600 days in the plots. The variable Therapy is specified in the ID statement to identify the type of therapy for each observation in the product-limit estimates. The OUTTEST option specifies the creation of an output data set named Test to contain the rank test matrices for the covariates.
title 'VA Lung Cancer Data';
symbol1 c=blue ; symbol2 c=orange; symbol3 c=green;
symbol4 c=red; symbol5 c=cyan; symbol6 c=black;
proc lifetest plots=(s,ls,lls) outtest=Test maxtime=600;
   time SurvTime*Censor(1);
   id Therapy;
   strata Cell;
   test Age Prior DiagTime Kps Treatment;
run;
Output 37.1.1 through Output 37.1.5 display the product-limit estimates of the survivor functions for the four cell types. Summary statistics of the survival times are also shown. The median survival times are 51 days, 156 days, 51 days, and 118 days for patients with adeno cells, large cells, small cells, and squamous cells, respectively.
Output 37.1.1. Product-Limit Survival Estimate for Cell=adeno
[Output listing: VA Lung Cancer Data, The LIFETEST Procedure, Stratum 1: Cell = adeno, Product-Limit Survival Estimates. Columns: SurvTime, Survival, Failure, Survival Standard Error, Number Failed, Number Left, Therapy.]
Output 37.1.5 displays a summary of the number of censored and event observations by cell type.
The graph of the estimated survivor functions is shown in Output 37.1.6. The adeno cell curve and the small cell curve are much closer to each other than to the large cell curve or the squamous cell curve. The survival rates of the adeno cell patients and the small cell patients decrease rapidly to approximately 29% in 90 days. Shapes of the large cell curve and the squamous cell curve are quite different, although both decrease less rapidly than those of the adeno and small cells. The squamous cell curve decreases more rapidly initially than the large cell curve, but the role is reversed in the later period.
Output 37.1.6. Graph of the Estimated Survivor Functions
Output 37.1.7 displays the graph of the log of the estimated survivor functions and Output 37.1.8 displays the log of the negative log of the estimated survivor functions.
Output 37.1.7. Graph of the Log of the Estimated Survivor Functions
Output 37.1.8. Graph of the Negative Log-Log of the Estimated Survivor Functions
Results of the homogeneity tests across cell types are given in Output 37.1.9. The log-rank and Wilcoxon statistics and their corresponding covariance matrices are displayed. Also given is a table that consists of the approximate chi-square statistics, degrees of freedom, and p-values for the log-rank, Wilcoxon, and likelihood ratio tests. All three tests indicate strong evidence of a significant difference among the survival curves for the four types of cancer cells (p < 0.001).
Output 37.1.10. Log-Rank Rank Test of the Prognostic Factors

   VA Lung Cancer Data
   The LIFETEST Procedure
   Univariate Chi-Squares for the Log-Rank Test

                   Test      Standard                    Pr >
   Variable     Statistic   Deviation   Chi-Square   Chi-Square   Label
   Age           -40.7383     105.7        0.1485       0.7000    age in years
   Prior         -19.9435      46.9836     0.1802       0.6712    prior treatment?
   DiagTime     -115.9         97.8708     1.4013       0.2365    months till randomization
   Kps          1123.1        170.3       43.4747       <.0001    karnofsky index
   Treatment      -4.2076       5.0407     0.6967       0.4039    treatment indicator
Results of the log-rank test of the prognostic variables are shown in Output 37.1.10.The univariate test results correspond to testing each prognostic factor marginally.The joint covariance matrix of these univariate test statistics is also displayed. Incomputing the overall chi-square statistic, the partial chi-square statistics following aforward stepwise entry approach are tabulated.
Consider the log-rank test in Output 37.1.10. Since the univariate test forKps hasthe largest chi-square (43.4747) among all the covariates,Kps is entered first. At thisstage, the partial chi-square and the chi-square increment forKps are the same asthe univariate chi-square. Among all the covariates not in the model (Age, Prior, Di-agTime, Treatment), Treatment has the largest approximate chi-square increment(1.7261) and is entered next. The approximate chi-square for the model containingKps andTreatment is 43.4747+1.7261=45.2008 with 2 degrees of freedom. Thethird covariate entered isAge. The fourth isPrior, and the fifth isDiagTime . Theoverall chi-square statistic on the last line of output is the partial chi-square for in-cluding all the covariates. It has a value of 46.4200 with 5 degrees of freedom, whichis highly significant (p < 0.0001).
You can establish this forward stepwise entry of prognostic factors by passing the matrix corresponding to the log-rank test to the RSQUARE method in the REG procedure. PROC REG finds the sets of variables that yield the largest chi-square statistics.
         / selection=rsquare;
   title 'All Possible Subsets of Covariables for the log-rank Test';
run;
Output 37.1.11 displays the univariate statistics and their covariance matrix. Results of the best subset regression are shown in Output 37.1.12. The variable Kps generates the largest univariate test statistic among all the covariates, the pair Kps and Treatment generates the largest test statistic among all pairs of covariates, and so on. The entry order of covariates is identical to that of PROC LIFETEST.
Output 37.1.11. Log-Rank Statistics and Covariance Matrix
Obs _TYPE_ _NAME_ SurvTime Age Prior DiagTime Kps Treatment
Number in
  Model     R-Square    Variables in Model
----------------------------------------------------------
    2        0.9737     Kps Treatment
    2        0.9472     Age Kps
    2        0.9417     Prior Kps
    2        0.9382     DiagTime Kps
    2        0.0434     DiagTime Treatment
    2        0.0353     Age DiagTime
    2        0.0304     Prior DiagTime
    2        0.0181     Prior Treatment
    2        0.0159     Age Treatment
    2        0.0075     Age Prior
----------------------------------------------------------
    3        0.9974     Age Kps Treatment
    3        0.9774     Prior Kps Treatment
    3        0.9747     DiagTime Kps Treatment
    3        0.9515     Age Prior Kps
    3        0.9481     Age DiagTime Kps
    3        0.9418     Prior DiagTime Kps
    3        0.0456     Age DiagTime Treatment
    3        0.0438     Prior DiagTime Treatment
    3        0.0355     Age Prior DiagTime
    3        0.0192     Age Prior Treatment
----------------------------------------------------------
    4        0.9999     Age Prior Kps Treatment
    4        0.9976     Age DiagTime Kps Treatment
    4        0.9774     Prior DiagTime Kps Treatment
    4        0.9515     Age Prior DiagTime Kps
    4        0.0459     Age Prior DiagTime Treatment
----------------------------------------------------------
    5        1.0000     Age Prior DiagTime Kps Treatment
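One way to read the best-subset listing against the PROC LIFETEST results is to treat each value in the second column as the fraction of the overall chi-square (46.4200) attained by that subset. This interpretation is an assumption inferred from the displayed numbers, not a statement taken from the output. Under it, the leading two-covariate subset (Kps, Treatment) gives:

```latex
\begin{align*}
\chi^2_{\{\mathrm{Kps},\,\mathrm{Treatment}\}}
   &\approx R^2_{\{\mathrm{Kps},\,\mathrm{Treatment}\}} \times \chi^2_{\mathrm{overall}} \\
   &= 0.9737 \times 46.4200 \approx 45.20 ,
\end{align*}
```

which agrees, up to rounding of the four-digit fraction, with the value 45.2008 obtained from the forward stepwise sequence. The full five-covariate subset has a fraction of 1.0000, which under the same reading corresponds to the overall chi-square itself.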
Example 37.2. Life Table Estimates for Males with Angina Pectoris
The data in this example come from Lee (1992, p. 91) and represent the survival rate of males with angina pectoris. Survival time is measured as years from the time of diagnosis. The data are read as number of events and number of withdrawals in each one-year time interval for 16 intervals. Three variables are constructed from the data: Years (an artificial time variable with values that are the midpoints of the time intervals), Censored (a censoring indicator variable with value 1 indicating censored observations and value 0 indicating event observations), and Freq (the frequency variable). Two observations are created for each interval, one representing the event observations and the other representing the censored observations.
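title 'Survival of Males with Angina Pectoris';
data males;
   keep Freq Years Censored;
   retain Years -.5;            /* first increment yields the midpoint 0.5 */
   input fail withdraw @@;      /* events and withdrawals in one interval  */
   Years + 1;                   /* move to the next interval midpoint      */
   Censored=0;                  /* record for the event observations       */
   Freq=fail;
   output;
   Censored=1;                  /* record for the censored observations    */
   Freq=withdraw;
   output;
   datalines;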
PROC LIFETEST is invoked to compute the various life table survival estimates, the median residual time, and their standard errors. The life table method of computing estimates is requested by specifying METHOD=LT. The intervals are specified by the INTERVALS= option. Graphs of the life table estimate, log of the estimate, negative log-log of the estimate, estimated density function, and estimated hazard function are requested by the PLOTS= option. No tests for homogeneity are carried out because the data are not stratified.
symbol1 c=blue;
proc lifetest data=males method=lt intervals=(0 to 15 by 1)
Output 37.2.1. Life Table Survival Estimates
Survival of Males with Angina Pectoris
The LIFETEST Procedure
Life Table Survival Estimates
                                                             Conditional
                                   Effective   Conditional   Probability                        Survival    Median
    Interval     Number   Number    Sample     Probability    Standard                          Standard   Residual
[Lower, Upper)   Failed  Censored    Size      of Failure      Error      Survival   Failure     Error     Lifetime
Results of the life table estimation are shown in Output 37.2.1. The five-year survival rate is 0.5193 with a standard error of 0.0103. The estimated median residual lifetime, which is 5.33 years initially, reaches a maximum of 6.34 years at the beginning of the second year and then decreases gradually, falling below the initial 5.33 years at the beginning of the seventh year.
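The arithmetic behind the main life table columns can be sketched with a short DATA step. It assumes the usual actuarial convention: the effective sample size is the number entering the interval minus half the number withdrawn, and the survival estimate at the start of an interval is the running product of one minus the earlier conditional probabilities of failure. The data set name LTSketch, the variable names, the starting cohort of 100, and the three pairs of counts are all hypothetical; they are not the angina pectoris data.

```sas
/* Sketch of the life table arithmetic, one record per interval.        */
/* LTSketch, the variable names, the cohort size of 100, and the counts */
/* in the DATALINES block are hypothetical illustration values.         */
data LTSketch;
   retain AtRisk 100 Survival 1;
   input Failed Withdrawn @@;
   EffSize  = AtRisk - Withdrawn / 2;      /* effective sample size              */
   CondProb = Failed / EffSize;            /* conditional probability of failure */
   output;                                 /* Survival is the value at the start */
                                           /* of the current interval            */
   Survival = Survival * (1 - CondProb);   /* carry forward to the next interval */
   AtRisk   = AtRisk - Failed - Withdrawn; /* number entering the next interval  */
   datalines;
20 4  15 6  10 5
;

proc print data=LTSketch noobs;
run;
```

This sketch covers only the point estimates. It does not attempt the standard errors or the median residual lifetimes that PROC LIFETEST reports.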
Output 37.2.2. Summary of Censored and Event Observations
The LIFETEST Procedure
Summary of the Number of Censored and Uncensored Values
                                     Percent
   Total     Failed    Censored     Censored

    2418       1625         793        32.80
NOTE: There were 2 observations with missing values, negative time values or frequency values less than 1.
Output 37.2.2 shows the number of event and censored observations. The percentage of the patients that have withdrawn from the study is 32.8%.
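The summary counts can be verified directly. The DATA step writes two records for each of the 16 intervals, 32 in all, and the NOTE reports that 2 of them were not used. Since the Years values are neither missing nor negative, these are presumably records whose Freq value is 0. The symbols below are introduced only for this check.

```latex
\begin{align*}
n_{\mathrm{total}} &= n_{\mathrm{failed}} + n_{\mathrm{censored}} = 1625 + 793 = 2418, \\
\text{percent censored} &= 100 \times \frac{793}{2418} \approx 32.80 .
\end{align*}
```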
Output 37.2.3. Life Table Survivor Function Estimate
Output 37.2.4. Log of Survivor Function Estimate
Output 37.2.5. Log of Negative Log of Survivor Function Estimate
Output 37.2.6. Hazard Function Estimate
Output 37.2.7. Density Function Estimate
Output 37.2.3 displays the graph of the life table survivor function estimates versus years after diagnosis. The median survival time, read from the survivor function curve, is 5.33 years, and the 25th and 75th percentiles are 1.04 and 11.13 years, respectively.
As discussed in Lee (1992), the graph of the estimated hazard function (Output 37.2.6) shows that the death rate is highest in the first year after diagnosis. From the end of the first year to the end of the tenth year, the death rate remains relatively constant, fluctuating between 0.09 and 0.12. The death rate is generally higher after the tenth year. This could indicate that a patient who has survived the first year has a better chance than a patient who has just been diagnosed. The profile of the median residual lifetimes also supports this interpretation.
An exponential model may be appropriate for the survival of these male patients with angina pectoris since the curve of the log of the survivor function estimate versus years after diagnosis (Output 37.2.4) approximates a straight line through the origin. Visually, the density estimate (Output 37.2.7) resembles that of an exponential distribution.
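The reasoning behind these graphical checks follows from the form of the exponential model. With a constant rate parameter, written here as lambda (a symbol introduced for this illustration), the survivor, hazard, and density functions are:

```latex
\begin{align*}
S(t) &= e^{-\lambda t}, &
\log S(t) &= -\lambda t, \\
h(t) &= \lambda, &
f(t) &= \lambda e^{-\lambda t}, \\
\log\bigl(-\log S(t)\bigr) &= \log \lambda + \log t .
\end{align*}
```

Under this model the log survivor function is a straight line through the origin with slope equal to minus lambda, and the hazard is flat. As a rough, informal calibration only, setting S(t) = 0.5 at the estimated median of 5.33 years gives

```latex
\[
\hat{\lambda} \approx \frac{\log 2}{5.33} \approx 0.13 ,
\]
```

which is of the same order as the hazard estimates of 0.09 to 0.12 reported for the first through tenth years.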
References
Brookmeyer, R. and Crowley, J. (1982), "A Confidence Interval for the Median Survival Time," Biometrics, 38, 29–41.
Collett, D. (1994), Modelling Survival Data in Medical Research, London: Chapman and Hall.
Cox, D.R. and Oakes, D. (1984), Analysis of Survival Data, London: Chapman and Hall.
Elandt-Johnson, R.C. and Johnson, N.L. (1980), Survival Models and Data Analysis, New York: John Wiley & Sons.
Kalbfleisch, J.D. and Prentice, R.L. (1980), The Statistical Analysis of Failure Time Data, New York: John Wiley & Sons.
Lawless, J.F. (1982), Statistical Models and Methods for Lifetime Data, New York: John Wiley & Sons.
Lee, E.T. (1992), Statistical Methods for Survival Data Analysis, Second Edition, New York: John Wiley & Sons.
The correct bibliographic citation for this manual is as follows: SAS Institute Inc., SAS/STAT® User's Guide, Version 8, Cary, NC: SAS Institute Inc., 1999.