Oct 20, 2015
The Stata Journal
Editor: H. Joseph Newton, Department of Statistics, Texas A&M University, College Station, Texas 77843; 979-845-3142; 979-845-3144
Executive Editor: Nicholas J. Cox, Department of Geography, University of Durham, South Road, Durham City DH1 3LE, United Kingdom
Christopher Baum, Boston College
Rino Bellocco, Karolinska Institutet
David Clayton, Cambridge Inst. for Medical Research
Charles Franklin, University of Wisconsin, Madison
Joanne M. Garrett, University of North Carolina
Allan Gregory, Queen's University
James Hardin, Texas A&M University
Stephen Jenkins, University of Essex
Jens Lauritsen, Odense University Hospital
Stanley Lemeshow, Ohio State University
J. Scott Long, Indiana University
Thomas Lumley, University of Washington, Seattle
Marcello Pagano, Harvard School of Public Health
Sophia Rabe-Hesketh, Inst. of Psychiatry, King's College London
J. Patrick Royston, MRC Clinical Trials Unit, London
Philip Ryan, University of Adelaide
Jeroen Weesie, Utrecht University
Jeffrey Wooldridge, Michigan State University
Copyright Statement: The Stata Journal and the contents of the supporting files (programs, datasets, and
help files) are copyright © by Stata Corporation. The contents of the supporting files (programs, datasets, and help files) may be copied or reproduced by any means whatsoever, in whole or in part, as long as any
copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.
The articles appearing in the Stata Journal may be copied or reproduced as printed copies, in whole or in part,
as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.
Written permission must be obtained from Stata Corporation if you wish to make electronic copies of the
insertions. This precludes placing electronic copies of the Stata Journal, in whole or in part, on publicly
accessible web sites, fileservers, or other locations where the copy may be accessed by anyone other than the
subscriber.
Users of any of the software, ideas, data, or other materials published in the Stata Journal or the supporting
files understand that such use is made without warranty of any kind, by either the Stata Journal, the author,
or Stata Corporation. In particular, there is no warranty of fitness of purpose or merchantability, nor for
special, incidental, or consequential damages such as loss of profits. The purpose of the Stata Journal is to
promote free communication among Stata users.
The Stata Technical Journal, electronic version (ISSN 1536-8734) is a publication of Stata Press, and Stata is
a registered trademark of Stata Corporation.
The Stata Journal (2002) 2, Number 4, pp. 331–350
Using Aalen's linear hazards model to
investigate time-varying effects in the
proportional hazards regression model
David W. Hosmer, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, 715 North Pleasant Street, Amherst, MA 01003-9304, USA; 413-545-4532
Patrick Royston, Cancer Division, MRC Clinical Trials Unit, 222 Euston Road, London NW1 2DA
Abstract. In this paper, we describe a new Stata command, stlh, which estimates and tests for the significance of the time-varying regression coefficients in Aalen's linear hazards model; see Aalen (1989). We see two potential uses for this command. One may use it as an alternative to a proportional hazards or other nonlinear hazards regression model analysis to describe the effects of covariates on survival time. A second application is to use the command to supplement a proportional hazards regression model analysis to assist in detecting and then describing the nature of time-varying effects of covariates through plots of the estimated cumulative regression coefficients, with confidence bands, from Aalen's model. We illustrate the use of the command to perform this supplementary analysis with data from a study of residential treatment programs of different durations that are designed to prevent return to drug use.
Keywords: st0024, survival analysis, survival-time regression models, time-to-event analysis
1 Introduction
The Cox proportional hazards model is the most frequently used regression model for the analysis of censored survival-time data, particularly within health sciences disciplines. Stata, in its suite of st survival-time programs, has excellent capabilities for fitting the model, as well as options to obtain diagnostic statistics to assess model fit and assumptions. In particular, the vital proportional hazards assumption can be tested using stphtest and can be examined graphically using its covariate-specific plot option. The problem with the plot is that it is based on the scaled Schoenfeld residuals, which are time-point specific and are themselves quite noisy. Even with smoothing, departures from proportionality may be quite hard to detect. It is also not clear how powerful the statistical test in stphtest is for detecting departures from proportional hazards that are modest but, from a subject-matter point of view, important. In many applied settings, it may be reasonable to suspect that some covariates have effects on the hazard function that are relatively constant initially and then fade or end. The converse is also a distinct possibility. The standard procedures and tests have a difficult time diagnosing these situations. We have found plots of the estimated cumulative regression coefficients from a fit of the Aalen linear survival-time model to be a useful adjunct to standard proportional hazards model analyses. The purpose of this paper is to make available a Stata st-class command called stlh and to illustrate its use.
2 The Aalen linear hazards model
Aalen (1980) proposed a general linear survival-time model, an important feature of which is that its regression coefficients are allowed to vary over time. He discusses issues of estimation, testing, and assessment of model fit in Aalen (1989, 1993).
2.1 The model
The hazard function at time $t$ for a model containing $p+1$ covariates, denoted in vector form, $\mathbf{x} = (1, x_1, x_2, \ldots, x_p)$, is

$$h(t, \mathbf{x}, \boldsymbol{\beta}(t)) = \beta_0(t) + \beta_1(t)x_1 + \beta_2(t)x_2 + \cdots + \beta_p(t)x_p \qquad (1)$$
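The coefficients in this model provide the change in hazard at time $t$, from the baseline hazard function, $\beta_0(t)$, for a one-unit change in the respective covariate, holding all other covariates constant. Note that the model allows the effect of the covariate to change continuously over time. The cumulative hazard function obtained by integrating the hazard function in (1) is

$$H(t, \mathbf{x}, \mathbf{B}(t)) = \int_0^t h(u, \mathbf{x}, \boldsymbol{\beta}(u))\,du = \sum_{k=0}^{p} x_k \int_0^t \beta_k(u)\,du = \sum_{k=0}^{p} x_k B_k(t) \qquad (2)$$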
where $x_0 = 1$ and $B_k(t)$ is called the cumulative regression coefficient for the $k$th covariate. It follows from (2) that the baseline cumulative hazard function is $B_0(t)$. The model is discussed in some detail in Hosmer and Lemeshow (1999), and the text also includes a review of additional relevant literature. In this paper, the emphasis is placed on using plots of the estimated cumulative regression coefficients to check for possible time-varying covariate effects.
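Because the cumulative regression coefficient is the integral of the time-varying coefficient, the slope of its plot at time t indicates the size of the covariate's effect at that time: a straight line corresponds to a constant effect, and a flat stretch to no effect. The Python sketch below illustrates this for an effect that is constant early and then ends. It is illustrative only: the coefficient function `beta1`, its values (0.2 until t = 5, then 0), and the helper `cumulative_coefficient` are hypothetical and are not part of stlh.

```python
import numpy as np

def beta1(t):
    """Hypothetical coefficient: constant effect of 0.2 until t = 5, none afterwards."""
    return np.where(t < 5.0, 0.2, 0.0)

def cumulative_coefficient(beta, t_grid):
    """Approximate B(t), the integral of beta(u) from 0 to t, by the trapezoidal rule."""
    values = beta(t_grid)
    steps = np.diff(t_grid)
    increments = 0.5 * (values[1:] + values[:-1]) * steps
    return np.concatenate([[0.0], np.cumsum(increments)])

# Evaluate B_1(t) on a fine grid over [0, 10].
t_grid = np.linspace(0.0, 10.0, 1001)
B1 = cumulative_coefficient(beta1, t_grid)

# B1 rises linearly with slope 0.2 while the effect is present,
# then stays level (at about 1.0) once the effect has ended.
```

A plot of `B1` against `t_grid` would show a line of slope 0.2 up to t = 5 followed by a plateau, which is the kind of pattern one would look for in the estimated cumulative coefficients.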
Assume that we have $n$ independent observations of time, a right-censoring indicator variable, assumed to be independent of time conditional on the covariates, and $p$ fixed covariates, all denoted by the usual triplet for the $i$th subject as $(t_i, c_i, \mathbf{x}_i)$, with $c_i = 0$ for a censored observation and 1 for an event. Aalen's (1989) estimator of the cumulative regression coefficients is a least-squares-like estimator. Denote the data matrix for the subjects at risk at time $t_j$ by an $n$ by $p+1$ matrix, $\mathbf{X}_j$, where the $i$th row contains the data for the $i$th subject, $\mathbf{x}_i$, if the $i$th subject is in the risk set at time $t_j$; otherwise, the $i$th row is all 0s. Denote by $\mathbf{y}_j$ an $n$ by 1 vector, where the $j$th element is 1 if the $j$th subject's observed time, $t_j$, is a survival time (i.e., $c_j = 1$); otherwise, all the values in the vector are 0. If we consider, in an informal way, the following as an estimator of the vector of regression coefficients at time $t_j$,

$$\mathbf{b}(t_j) = (\mathbf{X}_j'\mathbf{X}_j)^{-1}\mathbf{X}_j'\mathbf{y}_j$$

then Aalen's (1989) estimator of the vector of cumulative regression coefficients is

$$\hat{\mathbf{B}}(t) = \sum_{t_j \le t} \mathbf{b}(t_j) \qquad (3)$$
Note that the value of the estimator changes only at observed survival times and is constant between observed survival times. Huffer and McKeague (1991) discuss weighted versions of the estimator in (3). The weighted estimator is much more complicated to implement, and it is not clear if it provides better diagnostic power to detect time-varying covariate effects. Thus, it is not used in stlh. Also note that the increment in the estimator is computed only when the matrix $(\mathbf{X}_j'\mathbf{X}_j)$ can be inverted; i.e., it is nonsingular. In particular, when there are fewer than $p+1$ subjects in the risk set, the matrix is singular. Other data configurations can also yield a singular matrix. For example, if the model contains a single dichotomous covariate and all subjects who remain at risk have the same value for the covariate, the matrix will be singular. The stlh program checks for this, and estimation stops when $(\mathbf{X}_j'\mathbf{X}_j)$ turns singular.
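The unweighted estimator and its stopping rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the code used by stlh: the function name `aalen_cumulative_coefficients` is hypothetical, event times are assumed to be untied, a subject is taken to be at risk at time t_j if its observed time is at least t_j, and singularity is detected with a rank check, which is only one of several possible ways to test for it.

```python
import numpy as np

def aalen_cumulative_coefficients(time, event, covariates):
    """Unweighted least-squares-like estimate of cumulative regression coefficients.

    time       : (n,) observed times (assumed untied at events)
    event      : (n,) 1 for an event, 0 for a censored observation
    covariates : (n,) or (n, p) fixed covariates; a constant column is added here
    Returns (event_times, B), where row m of B is the estimate at event_times[m].
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = time.shape[0]
    X = np.column_stack([np.ones(n),
                         np.asarray(covariates, dtype=float).reshape(n, -1)])
    q = X.shape[1]  # p + 1 columns, including the constant

    B = np.zeros(q)
    event_times, path = [], []
    for j in np.argsort(time, kind="stable"):
        if event[j] == 0:
            continue  # censored: y_j is all zeros, so the increment is zero
        at_risk = time >= time[j]
        Xj = X * at_risk[:, None]       # rows of subjects not at risk are set to 0
        XtX = Xj.T @ Xj
        if np.linalg.matrix_rank(XtX) < q:
            break                       # X_j'X_j is singular: estimation stops
        yj = np.zeros(n)
        yj[j] = 1.0
        B = B + np.linalg.solve(XtX, Xj.T @ yj)   # add the increment b(t_j)
        event_times.append(time[j])
        path.append(B.copy())
    return np.array(event_times), np.array(path).reshape(-1, q)

# Small artificial example with one dichotomous covariate.
t = [1, 2, 3, 4, 5, 6]
c = [1, 1, 0, 1, 1, 1]
x = [0, 1, 0, 1, 0, 1]
times, B_hat = aalen_cumulative_coefficients(t, c, x)
```

In this artificial example, the last observed time (t = 6) contributes nothing: only one subject remains at risk, so all remaining subjects share the same covariate value, the matrix is singular, and the loop stops, as described above for the dichotomous-covariate case.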
If we use as an estimator of the variance of b(ti), the expression
then Aalen's (1989) estimator of the covariance matrix of the estimated cumulative regression coefficients at time t m