Hosmer on Survival With Stata

Oct 20, 2015



survival analysis

  • The Stata Journal

    Editor: H. Joseph Newton, Department of Statistics, Texas A & M University, College Station, Texas 77843; 979-845-3142; 979-845-3144 [email protected]

    Executive Editor: Nicholas J. Cox, Department of Geography, University of Durham, South Road, Durham City DH1 3LE, United [email protected]

    Associate Editors

    Christopher Baum, Boston College

    Rino Bellocco, Karolinska Institutet

    David Clayton, Cambridge Inst. for Medical Research

    Charles Franklin, University of Wisconsin, Madison

    Joanne M. Garrett, University of North Carolina

    Allan Gregory, Queen's University

    James Hardin, Texas A&M University

    Stephen Jenkins, University of Essex

    Jens Lauritsen, Odense University Hospital

    Stanley Lemeshow, Ohio State University

    J. Scott Long, Indiana University

    Thomas Lumley, University of Washington, Seattle

    Marcello Pagano, Harvard School of Public Health

    Sophia Rabe-Hesketh, Inst. of Psychiatry, King's College London

    J. Patrick Royston, MRC Clinical Trials Unit, London

    Philip Ryan, University of Adelaide

    Jeroen Weesie, Utrecht University

    Jeffrey Wooldridge, Michigan State University

    Copyright Statement: The Stata Journal and the contents of the supporting files (programs, datasets, and help files) are copyright © by Stata Corporation. The contents of the supporting files (programs, datasets, and help files) may be copied or reproduced by any means whatsoever, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

    The articles appearing in the Stata Journal may be copied or reproduced as printed copies, in whole or in part, as long as any copy or reproduction includes attribution to both (1) the author and (2) the Stata Journal.

    Written permission must be obtained from Stata Corporation if you wish to make electronic copies of the insertions. This precludes placing electronic copies of the Stata Journal, in whole or in part, on publicly accessible web sites, fileservers, or other locations where the copy may be accessed by anyone other than the

    Users of any of the software, ideas, data, or other materials published in the Stata Journal or the supporting files understand that such use is made without warranty of any kind, by either the Stata Journal, the author, or Stata Corporation. In particular, there is no warranty of fitness of purpose or merchantability, nor for special, incidental, or consequential damages such as loss of profits. The purpose of the Stata Journal is to promote free communication among Stata users.

    The Stata Technical Journal, electronic version (ISSN 1536-8734) is a publication of Stata Press, and Stata is a registered trademark of Stata Corporation.

  • The Stata Journal (2002) 2, Number 4, pp. 331–350

    Using Aalen's linear hazards model to investigate time-varying effects in the proportional hazards regression model

    David W. Hosmer
    Department of Biostatistics and Epidemiology
    School of Public Health and Health Sciences
    University of Massachusetts
    715 North Pleasant Street
    Amherst, MA 01003-9304 USA
    413-545-4532
    [email protected]

    Patrick Royston
    Cancer Division
    MRC Clinical Trials Unit
    222 Euston Road
    London NW1 2DA
    [email protected]

    Abstract. In this paper, we describe a new Stata command, stlh, which estimates and tests for the significance of the time-varying regression coefficients in Aalen's linear hazards model; see Aalen (1989). We see two potential uses for this command. One may use it as an alternative to a proportional hazards or other nonlinear hazards regression model analysis to describe the effects of covariates on survival time. A second application is to use the command to supplement a proportional hazards regression model analysis to assist in detecting and then describing the nature of time-varying effects of covariates through plots of the estimated cumulative regression coefficients, with confidence bands, from Aalen's model. We illustrate the use of the command to perform this supplementary analysis with data from a study of residential treatment programs of different durations that are designed to prevent return to drug use.

    Keywords: st0024, survival analysis, survival-time regression models, time-to-event analysis

    1 Introduction

    The Cox proportional hazards model is the most frequently used regression model for the analysis of censored survival-time data, particularly within health sciences disciplines. Stata, in its suite of st survival-time programs, has excellent capabilities for fitting the model, as well as options to obtain diagnostic statistics to assess model fit and assumptions. In particular, the vital proportional hazards assumption can be tested using stphtest and can be examined graphically using its covariate-specific plot option. The problem with the plot is that it is based on the scaled Schoenfeld residuals, which are time-point specific and are themselves quite noisy. Even with smoothing, departures from proportionality may be quite hard to determine. It is also not clear how powerful the statistical test in stphtest is to detect modest, but, from a subject matter point of view, important departures from proportional hazards. In many applied settings, it may be reasonable to suspect that some covariates have effects on the hazard function that are relatively constant initially and then fade or end. The converse is also a distinct possibility. The standard procedures and tests have a difficult time diagnosing these situations. We have found plots of the estimated cumulative regression coefficients from a fit of the Aalen linear survival-time model to be a useful adjunct to standard proportional hazards model analyses. The purpose of this paper is to make available a Stata st-class command called stlh and to illustrate its use.

    © 2002 Stata Corporation st0024

    2 The Aalen linear hazards model

    Aalen (1980) proposed a general linear survival-time model, an important feature of which is that its regression coefficients are allowed to vary over time. He discusses issues of estimation, testing, and assessment of model fit in Aalen (1989, 1993).

    2.1 The model

    The hazard function at time $t$ for a model containing $p+1$ covariates, denoted in vector form, $\mathbf{x} = (1, x_1, x_2, \ldots, x_p)$, is

    $$h(t, \mathbf{x}, \boldsymbol{\beta}(t)) = \beta_0(t) + \beta_1(t)x_1 + \beta_2(t)x_2 + \cdots + \beta_p(t)x_p \tag{1}$$

    The coefficients in this model provide the change in hazard at time $t$, from the baseline hazard function, $\beta_0(t)$, for a one-unit change in the respective covariate, holding all other covariates constant. Note that the model allows the effect of the covariate to change continuously over time. The cumulative hazard function obtained by integrating the hazard function in (1) is

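    $$H(t, \mathbf{x}, \mathbf{B}(t)) = \int_0^t h(u, \mathbf{x}, \boldsymbol{\beta}(u))\,du = \sum_{k=0}^{p} x_k \int_0^t \beta_k(u)\,du = \sum_{k=0}^{p} x_k B_k(t) \tag{2}$$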

    where $x_0 = 1$ and $B_k(t)$ is called the cumulative regression coefficient for the $k$th covariate. It follows from (2) that the baseline cumulative hazard function is $B_0(t)$. The model is discussed in some detail in Hosmer and Lemeshow (1999), and the text also includes a review of additional relevant literature. In this paper, the emphasis is placed on using plots of the estimated cumulative regression coefficients to check for possible time-varying covariate effects.
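    To see why such plots can be informative, consider two special cases, worked here as an illustration rather than taken from the paper. They use only the definition of the cumulative regression coefficient as the integral of the time-varying coefficient; the change point $\tau$ is a hypothetical symbol introduced for this example.

```latex
% Case 1: the kth coefficient is constant over time, \beta_k(u) = \beta_k.
B_k(t) = \int_0^t \beta_k \, du = \beta_k t
% The plot of B_k(t) against t is a straight line through the origin with slope \beta_k.

% Case 2: the effect is constant up to a (hypothetical) time \tau and absent afterwards,
% \beta_k(u) = \beta_k for u \le \tau and \beta_k(u) = 0 for u > \tau.
B_k(t) = \int_0^{\min(t,\tau)} \beta_k \, du = \beta_k \min(t, \tau)
% The plot rises linearly until \tau and is flat thereafter.
```

    Under these assumptions, a change in the slope of a cumulative coefficient plot points to a change in the covariate's effect on the hazard at that time.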


    2.2 Estimation

    Assume that we have $n$ independent observations of time, a right-censoring indicator variable, assumed to be independent of time conditional on the covariates, and $p$ fixed covariates, all denoted by the usual triplet for the $i$th subject as $(t_i, c_i, \mathbf{x}_i)$, with $c_i = 0$ for a censored observation and 1 for an event. Aalen's (1989) estimator of the cumulative regression coefficients is a least-squares-like estimator. Denote the data matrix for the subjects at risk at time $t_j$ by an $n$ by $p+1$ matrix, $\mathbf{X}_j$, where the $i$th row contains the data for the $i$th subject, $\mathbf{x}_i$, if the $i$th subject is in the risk set at time $t_j$; otherwise, the $i$th row is all 0s. Denote by $\mathbf{y}_j$ an $n$ by 1 vector, where the $j$th element is 1 if the $j$th subject's observed time, $t_j$, is a survival time (i.e., $c_j = 1$); otherwise, all the values in the vector are 0. If we consider, in an informal way, the following as an estimator of the vector of regression coefficients at time $t_j$,

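    $$\mathbf{b}(t_j) = (\mathbf{X}_j'\mathbf{X}_j)^{-1}\mathbf{X}_j'\mathbf{y}_j \tag{3}$$

    then Aalen's (1989) estimator of the vector of cumulative regression coefficients is

    $$\widehat{\mathbf{B}}(t) = \sum_{t_j \le t} \mathbf{b}(t_j) \tag{4}$$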

    Note that the value of the estimator changes only at observed survival times and is constant between observed survival times. Huffer and McKeague (1991) discuss weighted versions of the estimator in (3). The weighted estimator is much more complicated to implement, and it is not clear whether it provides better diagnostic power to detect time-varying covariate effects. Thus, it is not used in stlh. Also note that the increment in the estimator is computed only when the matrix $(\mathbf{X}_j'\mathbf{X}_j)$ can be inverted; i.e., it is nonsingular. In particular, when there are fewer than $p+1$ subjects in the risk set, the matrix is singular. Other data configurations can also yield a singular matrix. For example, if the model contains a single dichotomous covariate and all subjects who remain at risk have the same value for the covariate, the matrix will be singular. The stlh program checks for this, and estimation stops when $(\mathbf{X}_j'\mathbf{X}_j)$ becomes singular.
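    The unweighted estimator described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the stlh implementation; the function names `solve_linear` and `aalen_cumulative` and the tolerance `1e-12` are assumptions made for this example. Each event contributes a least-squares-like increment computed from the subjects still at risk, the increments are accumulated, and the loop stops at the first singular matrix.

```python
def solve_linear(a, b, tol=1e-12):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Returns None when the matrix is (numerically) singular.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def aalen_cumulative(times, events, covariates):
    """Return [(t_j, B(t_j)), ...] at each event time, as in equations (3) and (4).

    times:      observed times t_i
    events:     1 for an event, 0 for a censored observation (c_i)
    covariates: list of covariate lists; a leading 1 is added for the baseline
    """
    n = len(times)
    rows = [[1.0] + [float(v) for v in x] for x in covariates]
    p1 = len(rows[0])                      # p + 1 columns
    cum = [0.0] * p1
    steps = []
    for j in sorted(range(n), key=lambda i: times[i]):
        if events[j] != 1:
            continue                       # increments occur only at event times
        at_risk = [rows[i] for i in range(n) if times[i] >= times[j]]
        # X_j' X_j uses only the at-risk rows (the other rows of X_j are zero).
        xtx = [[sum(r[a] * r[b] for r in at_risk) for b in range(p1)]
               for a in range(p1)]
        # y_j has a single 1 in position j, so X_j' y_j is subject j's row.
        xty = rows[j][:]
        inc = solve_linear(xtx, xty)
        if inc is None:
            break                          # stop once X_j' X_j is singular
        cum = [c + d for c, d in zip(cum, inc)]
        steps.append((times[j], cum[:]))
    return steps


if __name__ == "__main__":
    # Six uncensored subjects with an alternating binary covariate.
    for t, coef in aalen_cumulative([1, 2, 3, 4, 5, 6], [1] * 6,
                                    [[0], [1], [0], [1], [0], [1]]):
        print(t, coef)
```

    In the small dataset at the bottom, only one subject remains at risk at the last event time, so the matrix is singular there and the sketch returns five increments rather than six. With no covariates at all, each increment reduces to one over the number of subjects at risk.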

    If we use as an estimator of the variance of $\mathbf{b}(t_j)$ the expression



    $$\mathrm{Var}[\mathbf{b}(t_j)] = (\mathbf{X}_j'\mathbf{X}_j)^{-1} \tag{5}$$

    then Aalen's (1989) estimator of the covariance matrix of the estimated cumulative regression coefficients at time t m