
Survival models for developmental genetic data: Age of onset of puberty and antisocial behavior in twins

Apr 23, 2023


Genetic Epidemiology 11:155-170 (1994)

Survival Models for Developmental Genetic Data: Age of Onset of Puberty and Antisocial Behavior in Twins

Andrew Pickles, Robert Crouchley, Emily Simonoff, Lindon Eaves, Joanne Meyer, Michael Rutter, John Hewitt, and Judy Silberg

MRC Child Psychiatry Unit and Department of Child and Adolescent Psychiatry (A.P., E.S., M.R.) and Department of Biostatistics and Computing (A.P.), Institute of Psychiatry, and Department of Statistics and Mathematical Sciences, London School of Economics (R.C.), London, England; Department of Human Genetics, Medical College of Virginia (L.E., J.M., J.S.), Richmond; Institute for Behavioral Genetics, University of Colorado, Boulder (J.H.)

The use of survival analysis for developmental genetic data is discussed. The main requirements for models based on the decomposition of frailty distributions into shared and unshared components are outlined for the simple case of twins. Extending the earlier work of Clayton, Oakes, and Hougaard, among others, three forms of hazard model are presented, all of which can be applied to pedigree data with flexible baseline hazards without the use of numerical integration. The first two models use an additive decomposition of frailty, with either gamma or positive stable law (PSL) distributed components. The third model, previously described by Hougaard, involves a multiplicative PSL decomposition. The models are applied to data on the onset of puberty in male twins and illustrate the importance of correct specification of the baseline hazard for correct inference about genetic effects. The difficulty of assessing model specification using information only on the margins is also noted. Overall, the new model with additive PSL components appeared to fit these data best. A second application illustrates the use of a time-varying covariate in examining the impact of puberty on the onset of conduct disorder symptomatology. © 1994 Wiley-Liss, Inc.

Key words: pedigree data, frailty, conduct disorder

Received for publication July 12, 1993; accepted December 2, 1993.

Address reprint requests to Andrew Pickles, MRC Child Psychiatry Unit, Institute of Psychiatry, London SE5 8AF, England.

© 1994 Wiley-Liss, Inc.


INTRODUCTION

In comparison to the extensive genetic analysis of quantitative traits, the age of attainment of normal developmental milestones has received little attention. This is also true of the age of onset of behavioral disorders when compared to the extensive analysis of the presence or absence of behavioral disorder. By contrast, in other areas of biomedical research, notably cancer, age of onset or survival time is a commonly analyzed outcome variable. For several reasons survival analysis may have considerable unexploited potential for developmental behavior genetics.

First, in the case of many developmental milestones, what varies between individuals is not whether they attain the milestone, for almost all do, but when they do. Analyses that explore the possible changing impact of both environmental factors and genes on the age of attainment thus offer scope for examining the mechanisms underlying the developmental process.

Second, the ubiquitous use of survival analysis in clinical treatment trials of fatal diseases is evidence of the fact that it provides a natural framework for dealing with censored observations. Censored observations refer to survival times that were incomplete in that the individuals were still alive when last recorded. Such censored observations arise in the case of a developmental milestone in samples where the younger subjects have yet to attain it, and in epidemiological studies of disorder for the majority of subjects who never show the disorder even though all may be at risk. Survival analysis efficiently combines the information on occurrence with that on survival through periods of risk prior to occurrence.

Third, survival analysis allows the impact of measured risk factors to be estimated, usually by means of their inclusion as regressors in some relative risk function. In clinical trials such risk factors are often assumed to be fixed over time and to have a uniform impact at all ages. Such assumptions would rarely seem plausible for behavioral research, which emphasizes developmental change, varying exposure, and possibly varying sensitivity to risk factors. However, both time variation in risk factors and age variation in their effects can be accommodated within the survival analysis framework [e.g., Kalbfleisch and Prentice, 1980].

Constructing computationally tractable survival models that allow for correlations between relatives in the age of onset is straightforward, but to do this in a way that is consistent with the expectations of a multifactorial genetic and environmental process is not. The approach of Wickramaratne et al. [1986], who included the age of onset of the proband as an additional regressor for the risk among relatives, does not formally integrate this information with either that available on the risks, exposure, and onset of all relatives or with that on any possible control families. Such information can be integrated into models more formally based on distributions of frailty, but many models [e.g., Self and Prentice, 1986; Mack et al., 1990; Anderson et al., 1992; Hougaard et al., 1992] have tackled this problem with a single shared component of frailty. Though such models allow for varying levels of correlation in ages of onset of related individuals by increasing or decreasing the variance of this shared component, this variation also leads to varying distributions for the univariate or marginal outcomes of relatives when considered as individuals. In the context of twin analysis, in allowing the ages of onset of monozygotic (MZ) twins to be more correlated than those of dizygotic (DZ) twins, the model requires MZ twins to have a distribution of frailty, and thus of age of onset, with greater variance than that of DZs. Such differences in variance are typically not appropriate and are at odds with standard genetic theory.

This paper describes and illustrates the use of a set of survival models suitable for genetic data that, while meeting the marginal distributional constraints of the multifactorial model, are computationally fast even in the presence of censoring and time-varying covariates. We illustrate their application using data on male twins from the Virginia Twin Study of Adolescent Behavioral Development (Hewitt et al., in preparation), first for a developmental milestone, a measure of the age of puberty, and then for a behavioral outcome, the onset of symptomatology of childhood conduct disorder.

MODELS FOR AGE-OF-ONSET OR SURVIVAL DATA

Basic Models

There are two main approaches to the analysis of related events [Aalen and Husbye, 1991]. In the so-called accelerated failure-time (AFT) approach the distribution of age of onset $a$, or some simple transformation of it, is modeled directly. With $p$ time-constant explanatory variables $x_1, x_2, \ldots, x_p$ measuring risk factors, with effects estimated by coefficients $\alpha_1, \alpha_2, \ldots, \alpha_p$, such models can be estimated as (censored) regressions of the form

Different assumptions about the distribution of the error term $\varepsilon$ correspond to different age-of-onset distributions. In the case of data from twins and other kinships and in the absence of censoring, the regression formulation of the AFT model allows analysis using the standard variance components decomposition of the twin method, for example using LISREL [Jöreskog and Sörbom, 1989]. In this approach the error $\varepsilon$ of equation (1) is decomposed into shared and unshared effects owing to genes and environment, and analysis proceeds as for any other quantitative trait [e.g., Neale and Cardon, 1992]. The use of standard software typically requires all the components to be jointly multivariate normal. This effectively restricts the age-of-onset distribution (adjusted for the effects of covariates) to be log-normal.
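Equation (1) is not legible in this copy. One form consistent with the description above, assuming a logarithmic transformation of age of onset and an intercept $\alpha_0$ (both are assumptions, not taken from the source), would be:

```latex
\log a = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_p x_p + \varepsilon
```

Under this reading, a normally distributed $\varepsilon$ gives the log-normal age-of-onset distribution mentioned above.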

The much greater difficulty posed by censored data for the AFT approach encouraged a view of the onset process as one involving two components. The first was concerned with the liability of individuals ever to present with the disorder at all, while the second was concerned with the age-of-onset process only for those who were liable and thus also provided an uncensored time of onset. On the assumption that the components were independent, each could be analyzed separately using standard software. For some disorders this might be a plausible assumption [Kendler et al., 1987]. Models that allow liability and age of onset to be correlated [Neale et al., 1989] are computationally more difficult.

The second approach to the analysis of related events is based not on the age-of-onset distribution itself but on the estimation of a time-varying risk of onset, $\lambda(t)$, for those subjects who have not yet experienced onset. It allows the probability of survival up to age $a$ to be expressed as a simple function of the cumulative exposure to risk of onset up to that time, commonly referred to as the integrated hazard or integrated intensity function $\Lambda(a)$, i.e.,

$$S(a) = \exp\{-\Lambda(a)\} = \exp\left\{-\int_0^a \lambda(t)\,dt\right\}.$$

The intensity function $\lambda(t)$ is usually decomposed into a “baseline” hazard, $\lambda_0(t)$, that specifies the form of variation in the rate with age, and an expression associated with the effects of measured risk factors and covariates. In proportional hazards (PH) models the effects of explanatory covariates act on an exponential scale, implying that their effects combine multiplicatively but also ensuring that all predicted hazards are positive. Models with additive effects can also be estimated, but care is required that predicted hazards remain positive. The values of covariates may vary over time and so are represented by time-varying functions, here denoted $x_1(t), x_2(t), \ldots, x_p(t)$ for $p$ such variables. With covariate effects estimated by regression coefficients $\beta_1, \beta_2, \ldots, \beta_p$, a PH model thus takes the form

$$\lambda(t) = \lambda_0(t)\exp\{\beta_1 x_1(t) + \beta_2 x_2(t) + \cdots + \beta_p x_p(t)\} = \lambda_0(t)\exp\{\eta(t)\}, \qquad (2)$$

where $\eta(t)$ is referred to as the linear predictor.

In many cases the data available for the onset of behavior cannot provide an exact age of onset but only an interval within which it occurred. The probability of onset within the interval $a$ to $a + m$ is then obtained as the difference between the probability of surviving until time $a$, $S(a)$, and the probability of surviving until time $a + m$, $S(a + m)$, i.e.,

$$\Pr(a < T \le a + m) = S(a) - S(a + m) = \exp[-\Lambda(a)] - \exp[-\Lambda(a + m)].$$

Since the data of the examples in this paper are of this form, and both the equations and the computation are simpler, we have used this form throughout.
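The interval form of the onset probability can be illustrated with a short sketch. It assumes a baseline hazard that is constant within yearly intervals and a linear predictor `eta` that is fixed over time; the function names, the log-rate parametrization, and the numerical values are illustrative choices, not taken from the paper.

```python
import math


def integrated_hazard(age, log_rates, eta=0.0):
    """Lambda(age) when the hazard on [k, k+1) is exp(log_rates[k] + eta)."""
    total = 0.0
    for k, log_rate in enumerate(log_rates):
        # Time spent in interval k before `age`, between 0 and 1 year.
        exposure = min(max(age - k, 0.0), 1.0)
        total += exposure * math.exp(log_rate + eta)
    return total


def interval_onset_probability(a, m, log_rates, eta=0.0):
    """Pr(a < T <= a + m) = S(a) - S(a + m), with S = exp(-Lambda)."""
    s_start = math.exp(-integrated_hazard(a, log_rates, eta))
    s_end = math.exp(-integrated_hazard(a + m, log_rates, eta))
    return s_start - s_end


if __name__ == "__main__":
    rates = [math.log(0.1)] * 3  # hypothetical yearly rates
    print(interval_onset_probability(1.0, 1.0, rates))
```

The onset probabilities over successive intervals, together with the probability of surviving past the last interval, sum to one, which is a convenient check on any implementation of this kind.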

Frailty Models Based on the Hougaard Family of Distributions

In survival models, unobserved or unmeasured explanatory variables, some of which may be genetic, are often referred to as frailties. Denoted by $\tau(t)$, the frailties take values restricted to the positive line and are also usually assumed to act multiplicatively on the hazard or intensity,

$$\lambda(t) = \tau(t)\,\lambda_0(t)\exp\{\eta(t)\}.$$

Models based on this factorization, and that assume $\tau(t)$ constant with $t$, have been the subject of considerable research [Clayton, 1978; Pickles, 1983; Clayton and Cuzick, 1985; Self and Prentice, 1986; Hougaard, 1986a,b; Oakes, 1982]. Although a variety of simple closed-form likelihoods can be obtained, these do not include those involving normally distributed frailty terms. Various alternative distributions could be considered for the joint distribution of $\tau$ [Anderson et al., 1992], but because of their convenient decompositional properties we restrict ourselves to a restricted form of the Hougaard [1986a] family of distributions, denoted by $P(\gamma, \phi^{1-\gamma}, \phi)$ and having the density function

In particular, we will consider two restricted forms of this density: $P(0, \phi, \phi)$, which corresponds to a gamma density with mean 1 and variance $1/\phi$ ($\phi > 0$), and $P(\gamma, \gamma, 0)$, which for $0 < \gamma < 1$ corresponds to a positive stable law density.
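Both restricted forms are most easily handled through their Laplace transforms rather than their densities. The sketch below uses the standard transforms of a gamma distribution with shape and rate $\phi$ and of a positive stable law with index $\gamma$; the function names and the finite-difference check of the gamma moments are illustrative additions, not part of the paper.

```python
import math


def gamma_laplace(s, phi):
    """E[exp(-s * tau)] for gamma frailty with shape phi and rate phi."""
    return (1.0 + s / phi) ** (-phi)


def psl_laplace(s, gamma):
    """E[exp(-s * tau)] for positive stable frailty, 0 < gamma < 1."""
    return math.exp(-(s ** gamma))


def moments_from_laplace(laplace, h=1e-3):
    """Approximate mean and variance from one-sided differences at s = 0."""
    f0, f1, f2, f3 = (laplace(k * h) for k in range(4))
    first = (-3.0 * f0 + 4.0 * f1 - f2) / (2.0 * h)         # L'(0)  = -E[tau]
    second = (2.0 * f0 - 5.0 * f1 + 4.0 * f2 - f3) / h ** 2  # L''(0) = E[tau^2]
    mean = -first
    return mean, second - mean ** 2


if __name__ == "__main__":
    # For phi = 2 the gamma frailty should have mean 1 and variance 1/2.
    print(moments_from_laplace(lambda s: gamma_laplace(s, 2.0)))
```

The same check is not meaningful for the positive stable law, whose transform has an unbounded derivative at zero, reflecting the absence of a finite mean. With a constant frailty, the population survivor function is the transform evaluated at the integrated hazard.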

Models for Twin Data: Additive Decompositions of Gamma Densities

For each twin we observe $a_{ij}$, their age of onset or, where onset has not (yet) occurred, their current age. The subscript $i$ distinguishes twin pairs and $j$ distinguishes between the twins of a twin pair. A censoring indicator is used to distinguish between an uncensored time where onset has been observed, $d_{ij} = 1$, and a censored observation where it has not, $d_{ij} = 0$.

The survival probability conditional on the value of the frailty $\tau_{ij}$, $S_{ij}(a_{ij} \mid \tau_{ij})$, is assumed to be of the form

$$S_{ij}(a_{ij} \mid \tau_{ij}) = \exp\{-\tau_{ij}\Lambda_{ij}(a_{ij})\}.$$

In subsequent equations the twin pair subscript $i$ will often be dropped for simplicity.

For a model involving no frailty the probability of observed ages (if censored) or ages of onset (if uncensored) $a_1$ and $a_2$ is given by an expression involving four terms $T_{00}$ (both censored), $T_{10}$ (twin 1 censored), $T_{01}$ (twin 2 censored), and $T_{11}$ (neither censored).

To examine the consequences of frailty effects potentially shared by twins, only one of the four terms in the above expression need be illustrated. Inclusion of frailty into the first term gives

$$\exp\{-\tau_1\Lambda_1(a_1) - \tau_2\Lambda_2(a_2)\}. \qquad (3)$$

We begin by considering the case in which “specific environment” effects are restricted to those arising from differences in values of included covariates or from the intrinsically stochastic nature of the risk process. Then $\tau_1$ and $\tau_2$ represent common environment and genes. In the case of MZ twins these are entirely shared and will thus be identical random variables. If the linear predictors $\eta_j$ contain the usual regression-type intercept then without loss of generality $E(\tau)$ can be assumed to be 1. Integrating equation (3) with $\tau$ distributed as $P(0, \phi, \phi)$, a gamma distribution, gives

$$\left[1 + \frac{\Lambda_1(a_1) + \Lambda_2(a_2)}{\phi}\right]^{-\phi}. \qquad (4)$$

This is the distribution function of the bivariate Oakes model [Oakes, 1982]. This model has to be generalized in order to obtain an expression for DZ twins.

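The frailty effect $\tau$ can be decomposed into a component $\tau_a$, concerned with the common environment and the half of the genes that are shared by DZ twins, and independent components $\tau_b$ and $\tau_c$, concerned with the unshared genes, such that

$$\tau_1 = \tau_a + \tau_b$$

and

$$\tau_2 = \tau_a + \tau_c.$$

For this member of the Hougaard family, if each component is distributed as $P(0, \rho_j\phi, \phi)$ and $E[\tau_a + \tau_b] = \rho_a + \rho_b = E[\tau_a + \tau_c] = \rho_a + \rho_c = 1$, then $\mathrm{var}(\tau_1) = \mathrm{var}(\tau_2) = \phi^{-1}$ and $\mathrm{corr}(\tau_1, \tau_2) = \rho_a$. Using the constraints on the sums of the $\rho$'s to write just $\rho$ for $\rho_a$ and $(1 - \rho)$ for both $\rho_b$ and $\rho_c$, we obtain

$$\left[1 + \frac{\Lambda_1(a_1) + \Lambda_2(a_2)}{\phi}\right]^{-\rho\phi}\left[1 + \frac{\Lambda_1(a_1)}{\phi}\right]^{-(1-\rho)\phi}\left[1 + \frac{\Lambda_2(a_2)}{\phi}\right]^{-(1-\rho)\phi}.$$

Setting $\rho = 1$ gives the expression for MZ twins shown in equation (4). For both single MZ and single DZ twins the marginal probabilities are the same and are given by

$$\Pr[a_j] = \left[1 + \frac{\Lambda_j(a_j)}{\phi}\right]^{-\phi} - d_j\left[1 + \frac{\Lambda_j(a_j + 1)}{\phi}\right]^{-\phi}.$$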

In this previous model the assumptions about specific environment effects are equivalent to assuming that, having accounted for differences in covariates, information on an MZ co-twin is essentially identical with (perhaps hypothetical) repeat measure data on that same twin. There are, however, invariably some covariates, for example differential birth complications, that we will fail to include, and these may contribute to stable systematic differences between one twin and the other. The potential equivalence of co-twin and repeat measure data cannot, therefore, be assumed. However, the model can easily be extended by estimating a $\rho$ parameter for MZ twins in addition to that for DZ twins. This allows their frailty terms to be less than perfectly correlated. For such a model, quantities that correspond to the standard twin variance decomposition can be calculated. Thus the proportion of the frailty variance corresponding to unique or specific environment is given by $(1 - \rho_{MZ})$, the proportion of genetic variance by $2(\rho_{MZ} - \rho_{DZ})$, and the proportion of shared or common environment variance by $2\rho_{DZ} - \rho_{MZ}$.
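The additive gamma construction can be checked numerically. The sketch below assumes independent gamma components with shapes $\rho\phi$ and $(1-\rho)\phi$ and a common rate $\phi$, so that the joint survivor function is a product of three gamma Laplace transforms. The function names, the yearly interval width, and the four-term difference used for a pair's contribution are illustrative assumptions rather than the authors' code.

```python
def joint_survivor_gamma(cum1, cum2, phi, rho):
    """Joint survivor function given the two integrated hazards.

    rho is the share of frailty common to both twins: rho = 1 gives the
    fully shared case, smaller rho gives less than perfectly shared frailty.
    """
    shared = (1.0 + (cum1 + cum2) / phi) ** (-rho * phi)
    own1 = (1.0 + cum1 / phi) ** (-(1.0 - rho) * phi)
    own2 = (1.0 + cum2 / phi) ** (-(1.0 - rho) * phi)
    return shared * own1 * own2


def pair_probability(cum_hazard1, cum_hazard2, a1, a2, d1, d2, phi, rho):
    """Probability of a pair's outcome with yearly interval data.

    cum_hazard1 and cum_hazard2 map an age to an integrated hazard.
    d = 1 means onset in [a, a + 1); d = 0 means censored at age a.
    """
    def surv(x1, x2):
        return joint_survivor_gamma(cum_hazard1(x1), cum_hazard2(x2), phi, rho)

    return (surv(a1, a2)
            - d1 * surv(a1 + 1, a2)
            - d2 * surv(a1, a2 + 1)
            + d1 * d2 * surv(a1 + 1, a2 + 1))
```

Setting one integrated hazard to zero recovers the same marginal survivor function whatever the value of `rho`, which is the marginal constraint the decomposition is designed to satisfy.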


Models for Twin Data: Additive and Multiplicative Decompositions of Positive Stable Law Densities

A similar additive decomposition can be used to derive a model using the PSL distribution $P(\gamma, \gamma, 0)$. This yields for the MZs

and for DZs

As was the case for the additive gamma model, this model can be extended to allow a specific environment component by estimating a parameter for MZ twins as well as that for DZ twins. A similar decomposition is also possible to yield estimates of the proportions of the overall frailty that are unique environment, genetic, and common environment. However, given that the PSL distribution lacks finite moments, caution is required before interpreting these as entirely analogous to proportions of variance.
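The displayed PSL expressions are not legible in this copy, so the following sketch rests on an assumption: by analogy with the gamma case, the shared and unshared components are taken to have Laplace transforms $\exp(-\rho s^{\gamma})$ and $\exp(-(1-\rho) s^{\gamma})$, so that each twin's total frailty has transform $\exp(-s^{\gamma})$. The function name is hypothetical.

```python
import math


def joint_survivor_psl(cum1, cum2, gamma, rho):
    """Joint survivor function under an assumed additive PSL decomposition.

    cum1, cum2: integrated hazards of the two twins.
    gamma: index of the positive stable law, 0 < gamma < 1.
    rho: share of frailty common to both twins (1 for fully shared).
    """
    shared = rho * (cum1 + cum2) ** gamma
    unshared = (1.0 - rho) * (cum1 ** gamma + cum2 ** gamma)
    return math.exp(-shared - unshared)
```

Under this parametrization, `rho = 1` gives the fully shared form $\exp\{-(\Lambda_1 + \Lambda_2)^{\gamma}\}$, and setting one integrated hazard to zero gives the marginal $\exp\{-\Lambda_j^{\gamma}\}$ for any `rho`.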

Using a positive stable law distribution, Hougaard [1986a,b] previously suggested a multiplicative decomposition in which $\tau_1 = \tau_a\tau_b$ and $\tau_2 = \tau_a\tau_c$, where $\tau_b$ is $P(\phi_b, \phi_b, 0)$, $\tau_c$ is $P(\phi_b, \phi_b, 0)$, and $\tau_a$ is $P(\phi_a, \phi_a, 0)$. For MZ twins this gives

and for DZ twins

In the case of the multiplicative decomposition, an extended model can be written down algebraically, but it is not identified. There are also difficulties in attempting a decomposition to obtain proportions of frailty that are genetic or environmental for Hougaard’s multiplicative model. However, simple expressions are available for the correlations in the log integrated hazards, these being given by $(1 - \phi_a\phi_b)$ for MZ twins and $(1 - \phi_a)$ for DZ twins.

The univariate marginal distribution for both the additive and multiplicative decompositions of the PSL models reduces to the same distribution with

$$\Pr[a_j] = \exp\{-\Lambda_j(a_j)^{\phi_a\phi_b}\} - d_j\exp\{-\Lambda_j(a_j + 1)^{\phi_a\phi_b}\} = \exp\{-\Lambda_j(a_j)^{\gamma}\} - d_j\exp\{-\Lambda_j(a_j + 1)^{\gamma}\}.$$


ILLUSTRATIVE RESULTS

Inference About Genetic Effects and Assumptions About the Age-of-Onset Distribution: Age of Onset of Puberty

Several of the standard parametric hazard functions can be derived from formal mathematical processes that could plausibly be hypothesized as appropriate to a wide variety of developmental behavior. For example, the Weibull hazard can be derived from a process involving passage through several intermediate stages before onset is observable, while the gamma hazard, which has been proposed for use in behavior genetics [Meyer and Eaves, 1988], can be derived from a process where onset requires multiple “hits” or “insults”. However, methods allowing very flexible baseline hazards are also available [Cox, 1972; Prentice and Gloeckler, 1978]. We typically have too little firm knowledge of the actual biological, social, and psychological processes to make a confident choice. Is the choice of distribution relatively unimportant for genetic analyses or should we follow the current practice in most routine biomedical survival analyses and choose the most flexible hazard?

To examine this question, we selected data from the Virginia Twin Study of Adolescent Behavioral Development (VTSABD), in which an epidemiological sample of twins aged 8-16 years was assessed in a variety of areas (Hewitt et al., in preparation). This longitudinal multiple cohort study is following children initially aged between 8 and 16. We analyzed data on the onset of puberty in 254 pairs of male twins, as recorded by the mother's report of voice breaking in the CAPA maternal interview [Angold et al., 1989]. These data, taken from the 1st year of the study, are heavily censored, with most boys yet to experience onset. Table I shows the marginal observed data summed over all 508 boys. Following standard practice, censored observations are assumed withdrawn from the “at risk” sample at the beginning of the interval. Of the 132 MZ pairs, there were 41 pairs in which onset was recorded for both boys and a further nine in which it was recorded for one boy. Of the 122 DZ pairs, there were 28 with onset for both boys and a further 25 in which it was recorded for one boy. Casual inspection of these data confirms the findings elsewhere [e.g., Fishbein, 1977] of genetic effects. How readily are such effects identified when the data on the year of onset are analyzed using models from the previous section?

TABLE I. The Numbers of Boys at “Risk” Whose Voice Broke and Who Were Censored at Each Age

Age   At risk   Voice broke   Censored
  1       508             0          0
  2       508             0          0
  3       508             0          0
  4       508             0          0
  5       508             0          0
  6       508             0          0
  7       508             0          0
  8       472             0         36
  9       404             0         68
 10       353             6         51
 11       290            10         57
 12       229            31         51
 13       165            50         33
 14        95            52         20
 15        35            21          8
 16         2             2         12
 17         0             0          0
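The observed marginal survivor function can be computed directly from the counts in Table I. The sketch below uses a product-limit calculation over yearly intervals and checks the stated convention that censored boys leave the risk set at the start of an interval; the function names are illustrative.

```python
# (age, at risk, voice broke, censored), copied from Table I.
TABLE_I = [(age, 508, 0, 0) for age in range(1, 8)] + [
    (8, 472, 0, 36), (9, 404, 0, 68), (10, 353, 6, 51), (11, 290, 10, 57),
    (12, 229, 31, 51), (13, 165, 50, 33), (14, 95, 52, 20), (15, 35, 21, 8),
    (16, 2, 2, 12), (17, 0, 0, 0),
]


def survivor_curve(rows):
    """Survivor probability after each age: product of (1 - onsets / at risk)."""
    surv, curve = 1.0, {}
    for age, at_risk, onsets, _censored in rows:
        if at_risk > 0:
            surv *= 1.0 - onsets / at_risk
        curve[age] = surv
    return curve


def at_risk_is_consistent(rows):
    """Each risk set = previous risk set - previous onsets - current censored."""
    return all(cur[1] == prev[1] - prev[2] - cur[3]
               for prev, cur in zip(rows, rows[1:]))


if __name__ == "__main__":
    print(at_risk_is_consistent(TABLE_I))
    print(sum(row[2] for row in TABLE_I))  # total number of onsets
```

The total number of onsets in the table agrees with the pair counts given in the text, since 2 × 41 + 9 + 2 × 28 + 25 = 172.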

Table II gives the log-likelihoods for a series of pairs of models with increasingly general hazard functions. Each model was of the form of equation (2) with gamma distributed random effects and was fitted by maximum likelihood using a Fortran 90 program and the NAG Mark 15 optimisation routine E04JBF [NAG, 1991]. In the first model of each pair, the parameter $\rho$ was fixed at 1, resulting in models that excluded any possible genetic effects. In the second model of each pair, this parameter was free. The first pair of models to be fitted used a standard fully parametric intensity function, a Weibull function of the form $\lambda_0(t) = \delta t^{\delta-1}\exp(\beta_0)$. With this model no significant genetic effects could be identified. The models of the second row used a piecewise exponential intensity function, in which there is a free parameter for each distinct interval of time in which an onset occurs. This results in an estimated survivor function that follows a series of steps, requiring the estimation of five parameters to account for the age variation in rate. Using this hazard function, one that should now match that in the data more closely (it would only match exactly in the absence of any random effects), genetic effects are readily identified, with the model that restricts the correlation to be due to common environment fitting very significantly worse than the model with genetic effects included.

The fit of the piecewise exponential model can be considered as a benchmark against which the performance of more restrictive parametric models can be compared. A model was fitted that assumed a zero rate until age 10, the first observed age of onset, and then assumed the rate varied according to the function $\exp\{\beta_0 + \beta_1(t - 10) + \beta_2(t - 10)^2\}$, a lagged quadratic function or generalized Gompertz model. This fitted almost as well and gave similar inference as to the presence of genetic effects.
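The three baseline intensity functions compared here can be written as short functions. This is a sketch under stated assumptions: the parameter names are illustrative, the piecewise rates are parametrized on the log scale, and the piecewise hazard is taken to be zero outside the listed intervals.

```python
import math


def weibull_hazard(t, delta, beta0):
    """Weibull intensity: delta * t**(delta - 1) * exp(beta0)."""
    return delta * t ** (delta - 1.0) * math.exp(beta0)


def piecewise_exponential_hazard(t, log_rates, first_age=10):
    """Constant rate exp(log_rates[k]) on [first_age + k, first_age + k + 1)."""
    k = int(math.floor(t)) - first_age
    if k < 0 or k >= len(log_rates):
        return 0.0  # assumption: no risk outside the listed intervals
    return math.exp(log_rates[k])


def lagged_quadratic_hazard(t, b0, b1, b2, lag=10.0):
    """Zero rate before `lag`, then exp(b0 + b1*u + b2*u**2) with u = t - lag."""
    if t < lag:
        return 0.0
    u = t - lag
    return math.exp(b0 + b1 * u + b2 * u * u)
```

The Weibull form is either monotone or constant in age, whereas the lagged quadratic form can follow a delayed and then accelerating onset rate with only three parameters.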

For our purposes, the important point is not that this or that particular model fits well but that correct inference about the genetic effects depended crucially on fitting a model with an appropriate age-of-onset distribution. Figure 1 gives the observed marginal survivor function and those obtained from the Weibull, piecewise exponential, and generalized Gompertz models. The Weibull model, in terms of the log-likelihood, clearly fitted the bivariate data worse than the generalized Gompertz model and gave markedly different inference regarding the presence of genetic effects. Yet this poor performance is not reflected in the marginal distribution. This suggests that checking the fit to the marginal distributions alone is not a sufficient check of the appropriateness of the model. For data on behavioral milestones or disorders for which we have little knowledge as to the likely form of intensity function, the use of flexible baseline intensity functions would seem to be essential.

TABLE II. The Relative Fit and Significance of Genetic Effects in the Onset of Puberty for Different Assumed Forms of Intensity Function and Gamma Random Effects

                        Log-likelihood for model with
Intensity function      Common environment only   Common environment and genes   Test of genetic effects, $\chi^2(1)$
Weibull                 -330.25                   -328.71                         3.08
Piecewise exponential   -323.31                   -307.77                        31.08
Generalized Gompertz    -324.07                   -309.51                        29.12

Fig. 1. Marginal survivor function: onset of voice breaking. (Horizontal axis: age.)

Random Effects Distributions and Decompositions: Age of Onset of Puberty

In Table III results are shown for the two simple additive models (those with only a single $\rho$ parameter for DZ twins) and the multiplicative model. All three were estimated using a piecewise exponential intensity function. The three models thus had gamma distributed effects that were additively decomposed, positive stable law effects that were additively decomposed, and positive stable law effects that were multiplicatively decomposed. Inference about the apparent significance of genetic effects from these three models was very similar, but they did not all fit equally well. The three models are not nested so formal likelihood ratio tests are not possible. However, a direct comparison of the log-likelihoods would, at this point, suggest the models with positive stable law random effects fit better than the one with gamma random effects.

TABLE III. The Relative Fit and Significance of Genetic Effects in the Onset of Puberty for Different Assumed Distributions of Random Effects and Their Decomposition Using Piecewise Exponential Intensity Function

Random effects                   Log-likelihood for model with
distribution and decomposition   Common environment only   Common environment and genes   Test of genetic effects, $\chi^2(1)$
Gamma additive                   -323.31                   -307.77                        31.08
PSL additive                     -311.77                   -297.66                        28.22
PSL multiplicative               -311.77                   -298.79                        25.96

Fitting the extended versions of the two models with additive effects, which allowed for specific environment effects that distinguished MZ twins from one another, gave further significant improvement in fit ($\chi^2(1) = 7.80$, P = .005 and 4.28, P = .04 for the gamma and PSL models, respectively). Comparing the log-likelihoods for these two models and that obtained from the unextendable multiplicative PSL random effects model suggests that our new additive positive stable law structure was the most appropriate.
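The test statistics and P values quoted for nested pairs of models can be reproduced from the log-likelihoods with a chi-square reference distribution on one degree of freedom. The sketch below uses only the standard library, relying on the identity that the upper tail of a chi-square(1) variable at $x$ equals $\mathrm{erfc}(\sqrt{x/2})$; the function names are illustrative.

```python
import math


def likelihood_ratio_statistic(loglik_restricted, loglik_full):
    """Twice the gain in log-likelihood from freeing one parameter."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Weibull pair from Table II: common environment only vs. with genes.
    stat = likelihood_ratio_statistic(-330.25, -328.71)
    print(round(stat, 2), chi2_1df_pvalue(stat))
    # Statistics quoted for the extended gamma and PSL models.
    print(chi2_1df_pvalue(7.80), chi2_1df_pvalue(4.28))
```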

Table IV gives the parameter estimates for these three models. In both the additive models the parameter $\rho_{MZ}$ estimates the correlation in random effects for MZ twins. In both cases it was estimated as close to 1, suggesting only minor proportions of specific environment effects of 0.09 for the gamma model and 0.04 for the PSL. The substantial differences between the estimates of $\rho_{MZ}$ and $\rho_{DZ}$ indicated marked genetic effects. Estimates of the proportion of genetic effects were 0.81 and 0.95, respectively. Estimates of the proportion of shared environment effects were both small at 0.10 and 0.01, respectively. The broad picture is thus similar from these two models, but further work is required to understand the implications of the differences.
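These proportions follow from the decomposition $(1 - \rho_{MZ})$, $2(\rho_{MZ} - \rho_{DZ})$, and $2\rho_{DZ} - \rho_{MZ}$ given earlier. The sketch below applies it to the estimates as read from Table IV (0.913 and 0.506 for the gamma model, 0.959 and 0.482 for the additive PSL model); assigning those table entries to $\rho_{MZ}$ and $\rho_{DZ}$ is an assumption about a table that is hard to read in this copy.

```python
def frailty_proportions(rho_mz, rho_dz):
    """Return (specific environment, genetic, common environment) proportions."""
    specific = 1.0 - rho_mz
    genetic = 2.0 * (rho_mz - rho_dz)
    common = 2.0 * rho_dz - rho_mz
    return specific, genetic, common


if __name__ == "__main__":
    for label, rho_mz, rho_dz in [("gamma additive", 0.913, 0.506),
                                  ("PSL additive", 0.959, 0.482)]:
        e, g, c = frailty_proportions(rho_mz, rho_dz)
        print(f"{label}: specific={e:.3f} genetic={g:.3f} common={c:.3f}")
```

The three proportions always sum to one, and for both models they agree with the values quoted in the text to within rounding.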

In the multiplicative PSL model $(1 - \phi^2)$ gives the correlation in the log integrated hazards, estimated as 0.97 for MZ twins and 0.63 for DZ twins. Though direct comparison is difficult, these results imply broadly similar inferences about the relative importance of genetic and environmental effects to those from the additive models. The estimates of the piecewise hazards, $\theta_0$ to $\theta_6$, although differing in absolute magnitude from model to model, showed a similar progressively increasing pattern.

Time Varying Risk Factors: Age of Onset of Antisocial Behavior

Antisocial behavior that forms the basis of the psychiatric diagnosis of conduct disorder first shows itself in childhood and has substantial continuity into adulthood either as antisocial personality disorder or in the form of less severe but broader social difficulties [Robins, 1966; Zoccolillo et al., 19921. The criterion used in the same

TABLE IV. Parameter Estimates for Different Random Effects Models of Age of Onset of Pubertv in Male Twins

Parameter

9

80 6, 82 83 84 85

PMZ

POZ

86

Gamma additive

estimates

PSL additive

estimates

PSL multiplicative

estimates

0.048 0.9 13 0.506

-3.794 -2.219

1.818 9.600

25.950 44.485 60.000

y 0.110 0.959 0.482

- 36.926 -25.488 - 13.838 -4.912

3.151 7.834

27.650

-22.106 -15.200 -8.071 -3.018

1.894 4.849

20.000

- lorL 303.79 295.57 298.79


TABLE V. Numbers of Subjects at Risk, Numbers Who Experienced Onset of Second CD Symptom, and Number Who Were Censored at Each Age

Age    At risk    Onset    Censored

  1      508        0          0
  2      508        0          0
  3      508        3          0
  4      505        4          0
  5      501        6          0
  6      495        4          0
  7      491        7          0
  8      454        3         30
  9      390       11         61
 10      331        4         48
 11      274        6         53
 12      213       10         55
 13      153        7         50
 14       90        6         56
 15       47        2         37
 16        3        1         42
 17        0        0          2
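The counts in Table V can be checked for internal consistency and turned into an observed marginal survivor function. The sketch below assumes a simple life-table (product-limit type) estimator; how the observed curve was actually computed is not stated, so this is illustrative only, and all names are hypothetical. The counts used are Table V read so that each at-risk figure equals the previous at-risk figure minus the previous onsets and minus those censored at the current age, and so that onsets total 74, the number implied by the concordant and discordant pair counts (2 × 7 + 25 + 2 × 5 + 25).

```python
# Illustrative life-table calculation from the Table V counts.
# Assumption: subjects censored at an age are already excluded from
# that age's at-risk count, which is the reading under which the
# columns are arithmetically consistent.

AGES = list(range(1, 18))
AT_RISK = [508, 508, 508, 505, 501, 495, 491, 454, 390,
           331, 274, 213, 153, 90, 47, 3, 0]
ONSET = [0, 0, 3, 4, 6, 4, 7, 3, 11, 4, 6, 10, 7, 6, 2, 1, 0]
CENSORED = [0, 0, 0, 0, 0, 0, 0, 30, 61, 48, 53, 55, 50, 56, 37, 42, 2]


def is_consistent(at_risk, onset, censored):
    """Check at_risk[t] == at_risk[t-1] - onset[t-1] - censored[t] for all t."""
    return all(
        at_risk[i] == at_risk[i - 1] - onset[i - 1] - censored[i]
        for i in range(1, len(at_risk))
    )


def survivor_function(at_risk, onset):
    """Cumulative product of (1 - onset / at risk) over successive ages."""
    surviving = 1.0
    curve = []
    for n, d in zip(at_risk, onset):
        if n > 0:                      # skip ages with nobody left at risk
            surviving *= 1.0 - d / n
        curve.append(surviving)
    return curve


observed = survivor_function(AT_RISK, ONSET)
for age, s in zip(AGES, observed):
    print(f"age {age:2d}: S = {s:.3f}")
```

The large step at age 16, where one onset occurs among only three boys at risk, shows why the final intervals carry substantial sampling error.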

sample of boys for whom puberty was analyzed was the age at which each boy had acquired two symptoms of conduct disorder according to the CAPA interview with each child. Again, there was heavy censoring. Table V gives the observed marginal data summed over all 508 boys. Of the 132 MZ pairs, there were seven in which both boys had shown onset according to this criterion, with a further 25 pairs with onset in only one boy. Of the 122 DZ pairs, there were five in which both boys had shown onset, and a further 25 pairs with onset in only one boy. For these data there was a much wider range of ages of onset than occurred in the puberty data, the earliest being at just 3 years of age. This made the fitting of the piecewise exponential hazard more demanding in terms of the number of parameters required. The generalized Gompertz function, that had proved more than adequate with the puberty data, was thus fitted. In addition to modeling shared and unshared random effects using the additive gamma distributed decomposition, this analysis also illustrates the impact of a time varying explanatory variable, pubertal status. This dummy variable took the value 0 for intervals prior to puberty, and the value 1 for intervals at and after puberty. This allowed a test of whether early physical maturity was a risk factor for the onset of conduct disorder symptomatology. Olweus [1986], from a path analysis of longitudinal correlation data, reported a link between male hormone levels and aggressive response to provocation that would suggest that such an effect might be present.
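With yearly grouped data, one common way to represent such a time-varying dummy is to expand each boy's history into one record per age interval, with pubertal status recomputed in every interval. The sketch below illustrates that layout only; it is an assumption that the analysis was organized this way, and the function and field names are hypothetical.

```python
# Illustrative person-period expansion for a time-varying covariate.
# Assumption: one record per year of age; names are hypothetical.

def person_period_records(last_age, onset, puberty_age=None, first_age=1):
    """Return one record per age interval for a single boy.

    last_age    -- age at onset, or age at censoring if no onset occurred
    onset       -- True if the second CD symptom appeared at last_age
    puberty_age -- age at puberty, or None if not reached by last_age
    """
    records = []
    for age in range(first_age, last_age + 1):
        records.append({
            "age": age,
            # 0 for intervals prior to puberty, 1 at and after puberty
            "pubertal": int(puberty_age is not None and age >= puberty_age),
            # the event indicator is 1 only in the interval of onset
            "event": int(bool(onset) and age == last_age),
        })
    return records


# A boy reaching puberty at 12 with onset at 14, and a boy censored at 10.
example = person_period_records(last_age=14, onset=True, puberty_age=12)
censored_example = person_period_records(last_age=10, onset=False)
```

In this layout the coefficient on `pubertal` compares hazards within the same age interval between boys who have and have not yet reached puberty, which is what the test of early physical maturity as a risk factor requires.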

Fig. 2. Marginal survivor function: onset of second symptom of conduct disorder. (Horizontal axis: age; fitted curve: generalized Gompertz.)

Figure 2 shows the observed and expected marginal survivor functions from this model. The model clearly fits the marginal data very well, the only discrepancies occurring in the last intervals where the numbers at risk are very small and thus sampling errors substantial. Table VI shows the parameter estimates obtained from this model. The value of φ far from 1 shows that MZ twins share much of the risk for conduct disorder. The value of ρ less than 1 suggests there to be a genetic component, since the effects are shared to a much lesser extent by DZ twins. However, fitting the model with ρ restricted to 1 gave little loss in fit (χ²(1) = 0.56, P = .5). There is thus little information on the extent to which the shared risks are genetic or not. The positive estimate for the β coefficient for pubertal status suggested that physical maturity does coincide with increased risk of conduct disorder. However, here again the fitting of a restricted model with the coefficient set to 0 gave a model that fitted little worse (χ²(1) = 0.82, P = .4).
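The tail probabilities attached to the two likelihood ratio statistics quoted above (0.56 and 0.82, each on 1 degree of freedom) can be checked with the standard library alone, using the identity that a chi-square variate with 1 degree of freedom is the square of a standard normal variate. The helper name is hypothetical.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    Uses P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


for label, statistic in [("rho restricted to 1", 0.56),
                         ("puberty coefficient set to 0", 0.82)]:
    print(f"{label}: chi2(1) = {statistic:.2f}, P = {chi2_sf_1df(statistic):.3f}")
```

Both probabilities are far above conventional significance levels, consistent with the conclusion that neither restriction is rejected by these data.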

DISCUSSION

The essential assumption of the basic twin method is that twins, whether MZ or DZ, are drawn from identical populations of genes and environments. Thus, a randomly drawn MZ twin should share exactly the same univariate or marginal distribution of ages of onset as a randomly drawn DZ twin. It is only in the bivariate distribution of the two ages of onset of a twin pair that MZs might differ from DZs. Models based on the bivariate normal distribution can be easily obtained that meet this criterion, but in addition, such models have achieved a special status through their derivation from the multifactorial genetic model for simple additive effects.

TABLE VI. Parameter Estimates for Additive Gamma Random Effects Model of Age of Onset of Conduct Disorder Symptomatology in Twins (Parameters Defined in Text)

Parameter        Estimate

φ                  2.398
ρ                  0.629
β (puberty)        0.391
θ_0               -4.952
θ_1                0.119
θ_2                0.010


In practice, this status may be less deserved for non-linear models than is usually recognized.

First, it assumes that we know, or can easily determine, the scale upon which the process is additive. This can be difficult to do, particularly with heavily censored data typical of most studies of the onset of behavioral disorder. Additive effects and normally distributed components of variance in one part of the model (say as components of the linear predictor η) imply non-additive effects and non-normal components when modeled in another part (say as a component of 7).

Second, although the normal distribution has been found to fit well to data covering the central part of the distribution of scores on many quantitative traits, for example IQ, it often does less well in the pathological extreme, where the more catastrophic effects of a few particular genes or chromosomal anomalies may be becoming evident.

Third, in common with other finite mean frailty distributions in non-linear models, some issues arise concerning what should be and what can be identified from information from marginal data, i.e., data from just one member of each twin pair. With a parametric baseline hazard any such frailty distribution with just a single component can be identified from such data provided a measured risk covariate is also included in the model. Clearly it would be undesirable for such a component of frailty to contaminate estimates of shared frailty. Thus, when using such models for twin analysis the use of extended models that included a “specific environment” component of variance would seem sensible. With infinite mean distributions such as the Positive Stable Law family proposed by Hougaard [1986a,b] the frailty cannot be identified from marginal data alone.

Anderson et al. [1992] and Hougaard et al. [1992] have illustrated how the choice of distribution also has implications as to how the association between twins varies with age, although the pattern of change also depends on the measure of association used. Overall the PSL models appeared to show decreasing association with age as compared to the gamma model. Although such differences raise interesting scientific hypotheses, the power to discriminate between them will clearly depend substantially on the extent of censoring in the data.
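The contrast can be illustrated for the simplest case of a single shared frailty, which is not the additive decomposition fitted here. Taking the cross-ratio as the measure of association, standard results give a constant value of 1 + variance for gamma frailty, and 1 + (1 - α) / (α × (-log S)) for PSL frailty with index α, where S is the joint survivor probability. The sketch below simply evaluates these two expressions; the function names and the example parameter values are hypothetical.

```python
import math


def cross_ratio_gamma(variance, joint_survival):
    """Cross-ratio under a single shared gamma frailty.

    It does not depend on joint_survival, so the association is the
    same at every age; the argument is kept only for a uniform interface.
    """
    return 1.0 + variance


def cross_ratio_psl(alpha, joint_survival):
    """Cross-ratio under a single shared positive stable frailty, 0 < alpha < 1."""
    return 1.0 + (1.0 - alpha) / (alpha * -math.log(joint_survival))


# As age increases the joint survivor probability falls from 1 towards 0.
for joint_survival in (0.9, 0.7, 0.5, 0.3):
    print(
        f"S = {joint_survival:.1f}: "
        f"gamma = {cross_ratio_gamma(1.0, joint_survival):.2f}, "
        f"PSL = {cross_ratio_psl(0.5, joint_survival):.2f}"
    )
```

Under these assumptions the PSL cross-ratio is largest at early ages and declines towards 1 as the joint survivor probability falls, whereas the gamma cross-ratio stays fixed.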

The empirical results obtained from the models proposed in this paper appear to be consistent with the few other available reports. A finding of substantial genetic effects for puberty concurs with the more limited findings of Fishbein [1977], while the largely environmental cause of most differences in childhood antisocial behavior agrees with weak genetic influences found by McGuffin and Gottesman [1985]. The absence of significant effects of early puberty on the onset of such behaviour is in line with the rather slight evidence for such effects found in the review by Archer [1991]. Overall, the application of the proposed survival models to developmental behaviour genetic data shows them as capable of yielding plausible findings.

The computational speed of these models allows scope for proper model exploration to be undertaken. This is of particular importance where a variety of hazard functions might seem plausible or where effort also has to be expended in selecting the right model for the effects of covariates. Our results suggest that the focus of genetic studies on the estimation of shared and unshared random effects should not be allowed to distract the attention of the analyst from the importance of getting the hazard function right. Misspecification in the hazard function creates problems in the estimation of the random effects. We have also shown that identifying such misspecification from an examination of the observed and expected marginal distributions can be difficult. Extensive model exploration and the use of flexible hazard functions would appear to be necessary.

The proposed models would appear to offer a practical alternative to those based on the multivariate normal or other distributions that require estimation using numerical integration [Neale et al., 1989] or those that require Monte Carlo procedures [Clayton, 1991; Guo and Thompson, 1992].

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their helpful comments on an earlier draft. The VTSABD study is supported by grants MH45268 and MH48064. Thanks are due to Julie Taylor for help in data management.

REFERENCES

Aalen OO, Husbye E (1991): Statistical analysis of repeated events forming renewal processes. Stat Med 10:1227-1240.

Anderson JE, Louis TA, Holm VA, Harvald B (1992): Time-dependent association measures for bivariate survival data. J Am Stat Assoc 87:641-650.

Angold A, Cox A, Prendergast M, Rutter M, Simonoff E (1989): The child and adolescent psychiatric assessment (CAPA). Unpublished manuscript.

Archer J (1991): The influence of testosterone on human aggression. Br J Psychiatry 82:1-28.

Clayton D (1978): A model for association in bivariate life-table and its application in epidemiological studies of familial tendency in chronic disease. Biometrika 65:141-151.

Clayton D (1991): A Monte Carlo method for Bayesian inference in frailty models. Biometrics 47:467-485.

Clayton D, Cuzick J (1985): Multivariate generalizations of the proportional hazards model (with discussion). J R Stat Soc A 148:82-117.

Cox DR (1972): Regression models and life-tables (with discussion). J R Stat Soc B 34:187-220.

Fishbein S (1977): Onset of puberty in MZ and DZ twins. Acta Genet Med Gemellol (Rome) 26:151-158.

Guo SW, Thompson EA (1992): Monte Carlo estimation of variance components models for large complex pedigrees. IMA J Appl Med Biol 51:1111-1126.

Hougaard PA (1986a): Survival models for heterogeneous populations derived from stable law distributions. Biometrika 73:387-396.

Hougaard PA (1986b): A class of multivariate failure time distributions. Biometrika 73:671-678.

Hougaard PA, Harvald B, Holm NV (1992): Measuring the similarities between the lifetimes of adult Danish born twins born between 1881-1930. J Am Stat Assoc 87:17-24.

Joreskog KG, Sorbum D (1989): “LISREL 7: A Guide to the Program and Applications,” 2nd Ed. Chicago: SPSS Inc.

Kalbfleisch JD, Prentice RL (1980): “The Statistical Analysis of Failure Time Data.” New York: John Wiley and Sons.

Kendler KS, Tsuang MT, Hays P (1987): Age at onset in schizophrenia: a familial perspective. Arch Gen Psychiatry 44:881-890.

Mack W, Langholz B, Thomas DC (1990): Survival models for familial aggregation of cancer. Environ Health Perspect 44:27-35.

McGuffin P, Gottesman II (1985): Genetic influences on normal and abnormal development. In Rutter M, Hersov L (eds): “Child and Adolescent Psychiatry: Modern Approaches,” 2nd Ed. Oxford: Blackwell Scientific Publications, pp 17-33.

Meyer JM, Eaves LJ (1988): Estimating genetic parameters of survival distributions: a multifactorial approach. Genet Epidemiol 5:265-275.

NAG (1991): “NAG Fortran Library, Mk 15.” Oxford: Numerical Algorithms Group Ltd.

Neale MC, Cardon LR (1992): “Methodology for Genetic Studies of Twins and Families.” Dordrecht: Kluwer.

Neale MC, Eaves LJ, Hewitt JK, MacLean CJ, Meyer JM, Kendler KS (1989): Analyzing the relationship between age of onset and risk to relatives. Am J Hum Genet 45:226-239.

Oakes D (1982): A model for association in bivariate survival data. J R Stat Soc B 44:414-422.

Olweus D (1986): Aggression and hormones: behavioral relationships with testosterone and adrenaline. In Olweus D, Block J, Radke-Yarrow M (eds): “Development of Anti-social and Prosocial Behavior: Research, Theories and Issues.” New York: Academic Press, pp 51-74.

Pickles A (1983): The analysis of residence histories and other longitudinal panel data: a semi-Markov model incorporating exogeneous variables. Reg Sci Urban Econ 13:271-285.

Prentice R, Gloeckler LA (1978): Regression analysis of grouped survival data with application to breast cancer data. Biometrics 34:57-67.

Robins LN (1966): “Deviant Children Grown Up.” Baltimore: Williams and Wilkins.

Self SG, Prentice RL (1986): Incorporating random effects into multivariate relative risk regression models. In Moolgavkar SH, Prentice RL (eds): “Modern Statistical Methods in Chronic Disease Epidemiology.” New York: John Wiley and Sons, pp 167-178.

Wickmaratne PJ, Prusoff BA, Merikangas KR, Weissman MM (1986): The use of survival models with non-proportional hazard functions to investigate age of onset in family studies. J Chronic Dis 39:389-397.

Zoccolillo M, Pickles A, Quinton D, Rutter M (1992): The outcome of conduct disorder: implications for defining adult personality disorder. Psychol Med 22:971-986.