Copula functions and bivariate distributions for survival analysis: An application to political survival

Alejandro Quiroz Flores
Wilf Department of Politics, New York University
19 West 4th St, Second Floor, New York, NY 10012-1119

July 20, 2008

Abstract

Event-history analysis often focuses on the survival time of a single subject. However, recent research in the social sciences demands estimation of the joint survival time of different subjects. This paper presents a method to estimate the interdependence between two different subjects. Analogous to seemingly unrelated regressions (SUR) or bivariate probit models, this paper begins with the assumption that the two different survival processes are not independent. The interdependence between processes is modelled as part of a bivariate distribution suitable for survival analysis, such as the bivariate exponential and the bivariate Weibull. These bivariate distributions are derived from copula functions. To test the performance of these distributions, the paper presents a simulation experiment. In order to illustrate these methods, the paper presents an application to a new data set on the tenure of leaders and foreign ministers.
The second characteristic of a copula implies that, viewed as a surface in three dimensions, the function is non-decreasing. A two-dimensional function is 2-increasing if the volume it assigns to any rectangle (a Cartesian product of intervals) in its domain is greater than or equal to 0; for a copula $C$ and any $u_1 \le u_2$, $v_1 \le v_2$, this means $C(u_2, v_2) - C(u_2, v_1) - C(u_1, v_2) + C(u_1, v_1) \ge 0$. In other words, if the CDF of the bivariate distribution has a second derivative, then the mixed partial derivative with respect to the two margins is greater than or equal to 0. The function is grounded if it equals 0 at the minimum value of either margin, for all possible values of the other margin: $C(0, v) = C(u, 0) = 0$. This means that if either marginal probability is 0, then the joint probability is 0 as well.
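Both properties can be checked numerically. The minimal sketch below assumes Gumbel's copula $C(u, v) = uv[1 + \theta(1 - u)(1 - v)]$ from footnote 3 as the test case; it checks groundedness on the edges of the unit square and 2-increasingness cell by cell on a grid. The function names and the grid size are illustrative choices, and a grid check can reveal a violation but cannot prove that a property holds everywhere.

```python
def fgm_copula(u: float, v: float, theta: float) -> float:
    """Gumbel (Farlie-Gumbel-Morgenstern) copula C(u, v) = uv[1 + theta(1-u)(1-v)]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def c_volume(C, u1, u2, v1, v2, theta):
    """Volume that C assigns to the rectangle [u1, u2] x [v1, v2]."""
    return C(u2, v2, theta) - C(u2, v1, theta) - C(u1, v2, theta) + C(u1, v1, theta)


def check_copula_properties(C, theta, n=50, tol=1e-12):
    """Return (is_grounded, is_2_increasing) evaluated on an n x n grid of [0, 1]^2."""
    grid = [i / n for i in range(n + 1)]
    # Grounded: C(0, t) = C(t, 0) = 0 for every value t of the other margin.
    grounded = all(
        abs(C(0.0, t, theta)) < tol and abs(C(t, 0.0, theta)) < tol for t in grid
    )
    # 2-increasing: every grid cell receives a non-negative volume.
    increasing = all(
        c_volume(C, grid[i], grid[i + 1], grid[j], grid[j + 1], theta) >= -tol
        for i in range(n)
        for j in range(n)
    )
    return grounded, increasing


if __name__ == "__main__":
    for theta in (-1.0, 0.5, 1.0, 1.5):
        print(theta, check_copula_properties(fgm_copula, theta))
```

For θ = 1.5 the 2-increasing check fails near the corners (0, 1) and (1, 0), where the copula density $1 + \theta(1 - 2u)(1 - 2v)$ becomes negative; this is why θ is restricted to [−1, 1] for this copula.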
Copula functions do not focus on correlation coefficients but on scale-invariant measures of association. It is important to highlight that these measures of association are functions of a measure of dependence between the marginals. This measure of dependence, also known as an association parameter, is denoted θ. Depending on the copula, θ can take on a wide range of values, whereas measures of association such as Pearson's correlation coefficient are bounded. In many cases, the admissible range of θ further constrains the attainable correlation coefficient. This is a serious problem for some distributions, like Gumbel's bivariate exponential, which can only handle a correlation within the [−.25, .25] interval. However, other distributions like the bivariate Weibull allow for larger correlation coefficients. The next section presents an illustration of the relationship between the association parameter and the correlation coefficient of a bivariate Weibull.
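The [−.25, .25] bound can be derived in a few lines. The derivation below assumes, as footnote 3 indicates, that Gumbel's bivariate exponential combines the copula $uv[1 + \theta(1 - u)(1 - v)]$, with θ ∈ [−1, 1], and exponential margins; unit scale is used because rescaling the margins does not change a correlation coefficient.

$$
\begin{aligned}
h(x, y) &= f(x)g(y)\,[1 + \theta\{1 - 2F(x)\}\{1 - 2G(y)\}], \qquad F(x) = 1 - e^{-x},\; G(y) = 1 - e^{-y}\\
E[XY] &= E[X]E[Y] + \theta\,E[X\{1 - 2F(X)\}]\,E[Y\{1 - 2G(Y)\}]\\
E[X\{1 - 2F(X)\}] &= \int_0^\infty x\,(2e^{-x} - 1)\,e^{-x}\,dx = 2 \cdot \tfrac{1}{4} - 1 = -\tfrac{1}{2}\\
\operatorname{Cov}(X, Y) &= \theta\left(-\tfrac{1}{2}\right)^2 = \tfrac{\theta}{4}, \qquad \operatorname{Var}(X) = \operatorname{Var}(Y) = 1\\
\rho_{X,Y} &= \tfrac{\theta}{4} \in \left[-\tfrac{1}{4}, \tfrac{1}{4}\right]
\end{aligned}
$$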
The best-known invariant measures of association are Kendall's Tau and Spearman's Rho. Both are based on the concept of concordance. Two variables are concordant if large values of one variable are associated with large values of the other variable, and small values with small values. Measures of association like Kendall's Tau and Spearman's Rho are defined in terms of the probability of concordance minus the probability of discordance. Equation (3) presents Kendall's Tau, whereas Equation (4) presents Spearman's Rho.
$$\tau_{X,Y} = 4\iint_{I^2} C(u, v)\,dC(u, v) - 1 \qquad (3)$$

$$\rho_{X,Y} = 12\iint_{I^2} C(u, v)\,du\,dv - 3 \qquad (4)$$

where $I^2 = [0, 1]^2$ is the unit square.
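As a numerical check of Equations (3) and (4), the sketch below evaluates both integrals with a midpoint rule for Gumbel's copula $uv[1 + \theta(1 - u)(1 - v)]$ from footnote 3, whose closed forms are τ = 2θ/9 and ρ = θ/3. The choice of copula, the grid size, and the function names are illustrative assumptions; for Kendall's Tau, $dC(u, v)$ is written as the copula density times $du\,dv$.

```python
def fgm_cdf(u, v, theta):
    """Gumbel (FGM) copula C(u, v) = uv[1 + theta(1-u)(1-v)]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_density(u, v, theta):
    """Copula density c(u, v) = d^2 C / (du dv) = 1 + theta(1-2u)(1-2v)."""
    return 1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)


def kendall_tau(theta, n=200):
    """Equation (3): tau = 4 * int int C(u, v) c(u, v) du dv - 1 (midpoint rule)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += fgm_cdf(u, v, theta) * fgm_density(u, v, theta)
    return 4.0 * total * h * h - 1.0


def spearman_rho(theta, n=200):
    """Equation (4): rho = 12 * int int C(u, v) du dv - 3 (midpoint rule)."""
    h = 1.0 / n
    total = sum(
        fgm_cdf((i + 0.5) * h, (j + 0.5) * h, theta)
        for i in range(n)
        for j in range(n)
    )
    return 12.0 * total * h * h - 3.0


if __name__ == "__main__":
    for theta in (-1.0, 0.5, 1.0):
        print(theta, kendall_tau(theta), 2 * theta / 9, spearman_rho(theta), theta / 3)
```

Even at θ = ±1, this copula only reaches τ = ±2/9 and ρ = ±1/3, which illustrates how the range of θ bounds the attainable association.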
These measures of association have several advantages over conventional correlation coefficients such as Pearson's. As suggested by Trivedi and Zimmer (2005), linear correlation coefficients cannot measure dependence for non-linear functions of random variables. In addition, they are not invariant under strictly increasing non-linear transformations of the variables, and they are not defined for heavy-tailed distributions whose variances are infinite. Given these limitations, copula functions focus on other measures of association such as Kendall's Tau and Spearman's Rho.
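The invariance point can be illustrated with simulated data. In the sketch below, a strictly increasing transformation (here `exp`) of both variables leaves the sample Kendall's Tau unchanged, because every pair keeps its concordance status, while Pearson's coefficient changes. The normal data-generating process and the function names are assumptions made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall(x, y):
    """Sample Kendall's tau-a: (concordant - discordant pairs) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(300)]
    y = [a + rng.gauss(0.0, 1.0) for a in x]
    # exp is strictly increasing, so the ordering of each variable is preserved.
    ex, ey = [math.exp(a) for a in x], [math.exp(b) for b in y]
    print("Pearson:", pearson(x, y), "->", pearson(ex, ey))
    print("Kendall:", kendall(x, y), "->", kendall(ex, ey))
```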
4 Bivariate Weibull distributions
The previous section has shown that it is possible to construct a bivariate distribution with a copula function. There are several methods that produce copula functions. The simplest is the equivalent of the inversion method for univariate distributions: if $F^{(-1)}$ and $G^{(-1)}$ are quasi-inverses² of the margins $F$ and $G$ of a joint distribution $H$, then $C(u, v) = H(F^{(-1)}(u), G^{(-1)}(v))$.
The following equations present two bivariate Weibull distributions.
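$$F(x, y \mid \lambda_x, \lambda_y, p_x, p_y, \theta) = 1 - e^{-(x/\lambda_x)^{p_x}} - e^{-(y/\lambda_y)^{p_y}} + e^{-(x/\lambda_x)^{p_x} - (y/\lambda_y)^{p_y} - \theta (x/\lambda_x)^{p_x} (y/\lambda_y)^{p_y}} \qquad (5)$$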
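$$F(x, y \mid \lambda_x, \lambda_y, p_x, p_y, \theta) = \left[1 - e^{-(x/\lambda_x)^{p_x}}\right]\left[1 - e^{-(y/\lambda_y)^{p_y}}\right]\left[1 + \theta e^{-(x/\lambda_x)^{p_x} - (y/\lambda_y)^{p_y}}\right] \qquad (6)$$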
The functions above are based on the univariate Weibull distributions $F(x) = 1 - e^{-(x/\lambda_x)^{p_x}}$ and $G(y) = 1 - e^{-(y/\lambda_y)^{p_y}}$. Based on $C(u, v) = H(F^{(-1)}(u), G^{(-1)}(v))$, it is easy to show that the following are the copula functions for Equations (5) and (6), respectively.

$$C(u, v) = u + v - 1 + (1 - u)(1 - v)\,e^{-\theta \ln(1 - u)\ln(1 - v)} \qquad (7)$$

$$C(u, v) = uv\,[1 + \theta(1 - u)(1 - v)] \qquad (8)$$

² Not all cumulative distribution functions are strictly increasing. When this is the case, they do not have the usual inverse, hence the need for a quasi-inverse function. For practical purposes, when the function is strictly increasing, its quasi-inverse is unique and equal to the ordinary inverse.

If we set $p_i = 1$ for $i \in \{x, y\}$, we obtain two bivariate exponential distributions (Gumbel 1960).³ Moreover, the bivariate Weibull of Equation (6), and therefore the corresponding bivariate exponential, is closely related to the following Ali-Mikhail-Haq distribution: expanding the Ali-Mikhail-Haq copula to first order in θ yields the copula of Equation (8).

$$C(u, v) = \frac{uv}{1 - \theta(1 - u)(1 - v)}$$
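$$F(x, y \mid \lambda_x, \lambda_y, p_x, p_y, \theta) = \frac{\left[1 - e^{-(x/\lambda_x)^{p_x}}\right]\left[1 - e^{-(y/\lambda_y)^{p_y}}\right]}{1 - \theta e^{-(x/\lambda_x)^{p_x} - (y/\lambda_y)^{p_y}}} \qquad (9)$$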
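The inversion method can be checked numerically. The sketch below plugs Weibull quantile functions into the joint CDFs of Equations (5), (6), and (9) and compares the result with the copula of Equation (7), Gumbel's copula $uv[1 + \theta(1 - u)(1 - v)]$ (footnote 3), and the Ali-Mikhail-Haq copula. The parameter values are arbitrary assumptions, with θ = 0.4 chosen inside the admissible range of all three families; the derivation predicts agreement up to rounding error.

```python
import math


def weibull_quantile(u, lam, p):
    """Inverse of the Weibull CDF F(x) = 1 - exp(-(x/lam)^p), for 0 <= u < 1."""
    return lam * (-math.log1p(-u)) ** (1.0 / p)


def joint_eq5(x, y, lx, ly, px, py, theta):
    a, b = (x / lx) ** px, (y / ly) ** py
    return 1.0 - math.exp(-a) - math.exp(-b) + math.exp(-a - b - theta * a * b)


def joint_eq6(x, y, lx, ly, px, py, theta):
    a, b = (x / lx) ** px, (y / ly) ** py
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-b)) * (1.0 + theta * math.exp(-a - b))


def joint_eq9(x, y, lx, ly, px, py, theta):
    a, b = (x / lx) ** px, (y / ly) ** py
    return (1.0 - math.exp(-a)) * (1.0 - math.exp(-b)) / (1.0 - theta * math.exp(-a - b))


def copula_by_inversion(joint, u, v, lx, ly, px, py, theta):
    """Inversion method: C(u, v) = H(F^{-1}(u), G^{-1}(v)) with Weibull margins."""
    x, y = weibull_quantile(u, lx, px), weibull_quantile(v, ly, py)
    return joint(x, y, lx, ly, px, py, theta)


def copula_eq7(u, v, theta):
    return u + v - 1.0 + (1 - u) * (1 - v) * math.exp(-theta * math.log(1 - u) * math.log(1 - v))


def copula_gumbel(u, v, theta):
    return u * v * (1.0 + theta * (1 - u) * (1 - v))


def copula_amh(u, v, theta):
    return u * v / (1.0 - theta * (1 - u) * (1 - v))


if __name__ == "__main__":
    pairs = [(joint_eq5, copula_eq7), (joint_eq6, copula_gumbel), (joint_eq9, copula_amh)]
    grid = (0.1, 0.3, 0.5, 0.7, 0.9)
    for joint, copula in pairs:
        worst = max(
            abs(copula_by_inversion(joint, u, v, 2.0, 0.5, 1.5, 0.8, 0.4) - copula(u, v, 0.4))
            for u in grid
            for v in grid
        )
        print(joint.__name__, copula.__name__, worst)  # rounding-level differences
```

The scale and shape parameters drop out entirely, which is the point of the construction: the copula depends only on θ.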
Note the role of the association parameter θ. In the SUR and bivariate probit models, interdependence is captured by the covariance between the disturbances of the different processes. This covariance, or some other measure of association, is usually reported by statistical software. In the copula approach, however, the covariance and other measures of association are functions of θ. This association parameter is central to the estimation of bivariate distributions, and it usually bounds the linear correlation between marginals. As mentioned before, the correlation in Gumbel's bivariate exponential is severely limited. This is not a significant problem for the bivariate Weibull. Figure 1 presents the relationship between the association parameter and Pearson's correlation coefficient. Clearly, the bivariate Weibull allows for a larger correlation between survival processes, which makes it far superior to the bivariate exponential. For this reason, the remainder of the paper focuses on the bivariate Weibull.
³ The bivariate exponential version of Equation (6) is also known as the Farlie-Gumbel-Morgenstern distribution. Gumbel's copula is $C(u, v) = uv[1 + \theta(1 - u)(1 - v)]$.
Figure 1: Association and Correlation Parameters of a Bivariate Weibull (horizontal axis: association parameter, from −10 to 10; vertical axis: Pearson correlation, from −0.5 to 1.0)
5 Maximum likelihood estimation
Suppose that a subject has duration time $t_1$, whereas a second subject has duration time $t_2$. These are the equivalents of $x$ and $y$ as used in the previous sections. Now consider $n$ pairs of subjects with duration times $(t_{11}, t_{21}), (t_{12}, t_{22}), \ldots, (t_{1n}, t_{2n})$. The first subscript denotes the subject $j \in \{1, 2\}$; the second subscript denotes the $i$th pair or observation, where $i \in \{1, 2, \ldots, n\}$. Furthermore, assume that, conditional on their covariates, these $n$ pairs (or $2n$ duration times) are independent and identically distributed realizations of the random vector $(T_1, T_2)$; the two durations within a pair need not be independent of each other.
There are several types of observations. First, there are observations whose entire duration times are observed. Second, there are observations where the duration time of one subject is right-censored, but the duration time of the other subject is not. This is called univariate censoring (Lin and Ying 1993; Tsai, Leurgans, and Crowley 1986; Tsai and Crowley 1998). Third, there are observations in which both duration times are right-censored. Having said this, define censoring points $t_{1,0}$ for subject 1 and $t_{2,0}$ for subject 2. Thus, the likelihood has the following components: P(T1 =
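Observations in which both durations are right-censored contribute to the likelihood through the survivor function $S(t_{1,0}, t_{2,0}) = P(T_1 > t_{1,0}, T_2 > t_{2,0})$. Writing $F_j$ for the marginal CDF of $T_j$ and $S_j = 1 - F_j$ for its marginal survivor function, the survivor function for the bivariate distribution of Equation (6) is the following.

$$
\begin{aligned}
P(T_1 > t_{1,0}, T_2 > t_{2,0}) &= 1 - F_1(t_{1,0}) - F_2(t_{2,0}) + F(t_{1,0}, t_{2,0})\\
&= 1 - F_1(t_{1,0}) - F_2(t_{2,0}) + F_1(t_{1,0})F_2(t_{2,0})\,[1 + \theta\{1 - F_1(t_{1,0})\}\{1 - F_2(t_{2,0})\}]\\
&= 1 - F_1(t_{1,0}) - F_2(t_{2,0}) + F_1(t_{1,0})F_2(t_{2,0})\,[1 + \theta S_1(t_{1,0})S_2(t_{2,0})]\\
&= S_1(t_{1,0})S_2(t_{2,0})\,[1 + \theta F_1(t_{1,0})F_2(t_{2,0})] \qquad (11)
\end{aligned}
$$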
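To make the likelihood concrete, the sketch below writes the log-likelihood of the bivariate Weibull of Equation (6) for the three types of observations. The joint density follows from differentiating Equation (6), and the doubly censored term is Equation (11). The univariate-censoring terms are derived here as $-\partial S(t_1, t_2)/\partial t_j$ under the same model; they are an assumption of this sketch, since the paper's own expressions for them are not reproduced in this section. Covariates are omitted, and the data layout `(t1, t2, d1, d2)` is hypothetical.

```python
import math


def weibull_parts(t, lam, p):
    """Return (log f(t), F(t), log S(t)) for F(t) = 1 - exp(-(t/lam)^p), t > 0."""
    z = (t / lam) ** p
    log_f = math.log(p / lam) + (p - 1.0) * math.log(t / lam) - z
    return log_f, -math.expm1(-z), -z


def loglik_bivariate_weibull(data, lam1, p1, lam2, p2, theta):
    """Log-likelihood of the bivariate Weibull of Equation (6) with right-censoring.

    data: iterable of (t1, t2, d1, d2), where d = 1 marks an observed failure
    and d = 0 a right-censored duration. theta is assumed to lie in [-1, 1].
    """
    ll = 0.0
    for t1, t2, d1, d2 in data:
        lf1, F1, lS1 = weibull_parts(t1, lam1, p1)
        lf2, F2, lS2 = weibull_parts(t2, lam2, p2)
        if d1 and d2:
            # Both observed: joint density f1 f2 [1 + theta (1 - 2F1)(1 - 2F2)].
            ll += lf1 + lf2 + math.log1p(theta * (1 - 2 * F1) * (1 - 2 * F2))
        elif d1:
            # t2 censored: -dS/dt1 = f1 S2 [1 - theta F2 (1 - 2F1)].
            ll += lf1 + lS2 + math.log1p(-theta * F2 * (1 - 2 * F1))
        elif d2:
            # t1 censored: -dS/dt2 = f2 S1 [1 - theta F1 (1 - 2F2)].
            ll += lf2 + lS1 + math.log1p(-theta * F1 * (1 - 2 * F2))
        else:
            # Both censored: survivor function of Equation (11).
            ll += lS1 + lS2 + math.log1p(theta * F1 * F2)
    return ll


if __name__ == "__main__":
    sample = [(1.2, 0.7, 1, 1), (0.4, 2.5, 1, 0), (3.0, 0.9, 0, 1), (2.2, 1.8, 0, 0)]
    print(loglik_bivariate_weibull(sample, 1.5, 2.0, 1.0, 1.3, 0.7))
```

At θ = 0 every bracketed term equals 1, so the function reduces to the sum of two independent univariate Weibull log-likelihoods, which is the restricted model of the LR test discussed next.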
Now we need to specify the probability distributions. From the copula function of Equation (8) we can derive a bivariate Weibull and a bivariate exponential. The former was already presented in Equation (6). As a reminder, the probability distributions are the following.
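$$F(x, y \mid \lambda_x, \lambda_y, p_x, p_y, \theta) = \left[1 - e^{-(x/\lambda_x)^{p_x}}\right]\left[1 - e^{-(y/\lambda_y)^{p_y}}\right]\left[1 + \theta e^{-(x/\lambda_x)^{p_x} - (y/\lambda_y)^{p_y}}\right]$$

$$F(x, y \mid \lambda_x, \lambda_y, \theta) = \left[1 - e^{-x/\lambda_x}\right]\left[1 - e^{-y/\lambda_y}\right]\left[1 + \theta e^{-x/\lambda_x - y/\lambda_y}\right]$$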
The first function is a bivariate Weibull, whereas the second is a bivariate exponential, which is nested in the Weibull: it is obtained by setting $p_x = p_y = 1$. Figure 2 presents the probability density function of a bivariate Weibull.
With these elements it is now possible to maximize the log-likelihoods of the marginal and the bivariate distributions. The marginal distributions yield estimates of the shape and scale parameters, but not of the association parameter. The bivariate distribution yields estimates of all parameters, which are asymptotically normal, thus simplifying the task of testing a null hypothesis.⁴ When the association parameter is not significant, the data provide no evidence of interdependence between the components. In addition, we can test for zero association between the survival times of the components with a likelihood ratio (LR) test or a Lagrange multiplier test. Under the null of $\theta_{t_1,t_2} = 0$, the model consists of independent distributions, which can be estimated separately. For the LR test, we know that $\ln L_{UR} \ge \ln L_R$, as a restricted optimum is never superior to an unrestricted one. In this case, the sum of the log-likelihoods of the marginals must be less than or equal to the log-likelihood of the bivariate model. Thus, the LR statistic, which is asymptotically distributed chi-squared with degrees of freedom equal to the number of restrictions, is given by the following.

$$LR = -2(\ln L_R - \ln L_{UR}) = -2\left[(\ln L_{t_1} + \ln L_{t_2}) - \ln L_{t_1,t_2}\right] \qquad (12)$$

⁴ Most empirical applications of copula functions assume that the estimator of the association parameter θ is asymptotically normal. However, the range of the parameter depends on the particular copula: in some cases it is unrestricted, in other cases it must be positive, and in others it must lie in a bounded interval. In this paper, the estimate of the parameter is treated as asymptotically normal.

Figure 2: Probability Density Function of a Bivariate Weibull (panel "Bivariate Weibull"; axes $t_1$, $t_2$, and density $z$)
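Equation (12) is simple to compute once the three models are fitted. The sketch below takes the maximized log-likelihoods as inputs and returns the LR statistic with a chi-squared(1) p-value, using the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ for the single restriction θ = 0. The function name and the example log-likelihood values are hypothetical. The chi-squared reference distribution assumes θ = 0 is an interior point of the parameter space; if θ is restricted to [0, 1], the null lies on the boundary and the usual chi-squared approximation no longer applies directly.

```python
import math


def lr_test(loglik_t1, loglik_t2, loglik_bivariate):
    """LR statistic of Equation (12) for H0: theta = 0, with a chi-squared(1) p-value.

    Uses P(chi2_1 > x) = erfc(sqrt(x / 2)); valid for one restriction only.
    """
    lr = -2.0 * ((loglik_t1 + loglik_t2) - loglik_bivariate)
    # Guard against tiny negative values caused by an imperfect optimizer.
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods of the two marginals and the bivariate model.
    lr, p = lr_test(-1200.4, -980.1, -2176.2)
    print(f"LR = {lr:.2f}, p = {p:.4f}")
```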
6 Simulation
The two survival processes $t_1$ and $t_2$ depend on some covariates and disturbances. In survival analysis it is incorrect to assume that these disturbances come from a normal distribution, because durations cannot be negative and are often censored. In fact, in the event-history models presented in this paper, the central methodological issue lies in the development of non-normal multivariate distributions and the generation of random numbers from those distributions.
The generation of non-normal random numbers is of paramount importance because, in practice, different algorithms produce different maximum likelihood estimates. Indeed, there are several techniques to generate numbers from multivariate distributions (Devroye 1986; Johnson 1986; Johnson, Evans, and Green 1999). For instance, Devroye describes more than five different algorithms that generate numbers from a bivariate exponential. Two of Devroye's procedures, based on mixtures of univariate exponentials, usually create maximization problems. Another procedure, based on multivariate normal random variables, does not present many maximization problems, but it makes the association parameter difficult to control for simulation purposes. The method used here to generate numbers from a bivariate Weibull does not present maximization problems. The procedure, which is also based on a mixture, is described in Johnson, Evans, and Green (1999).
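The mixture procedure of Johnson, Evans, and Green (1999) is not reproduced here. As a swapped-in alternative, the sketch below draws from the bivariate Weibull of Equation (6) by conditional inversion: draw $u$, solve $\partial C/\partial u = v[1 + \theta(1 - 2u)(1 - v)] = w$ for $v$ under Gumbel's copula, and map both uniforms through the Weibull quantile function. Under this particular copula the population Spearman's Rho equals θ/3, which the check below uses. Function names and parameter values are illustrative assumptions.

```python
import math
import random


def fgm_conditional_inverse(u, w, theta):
    """Solve v[1 + a(1 - v)] = w for v in [0, 1], where a = theta(1 - 2u).

    Uses the stable root 2w / ((1 + a) + sqrt((1 + a)^2 - 4aw)),
    which also covers a = 0 (independence) without a special case.
    """
    a = theta * (1.0 - 2.0 * u)
    return 2.0 * w / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * w))


def weibull_quantile(u, lam, p):
    """Inverse of F(x) = 1 - exp(-(x/lam)^p)."""
    return lam * (-math.log1p(-u)) ** (1.0 / p)


def sample_bivariate_weibull(n, lam1, p1, lam2, p2, theta, rng):
    """Draw n pairs (t1, t2) from Equation (6) by conditional inversion."""
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        v = fgm_conditional_inverse(u, w, theta)
        pairs.append((weibull_quantile(u, lam1, p1), weibull_quantile(v, lam2, p2)))
    return pairs


def spearman(pairs):
    """Spearman's Rho as the Pearson correlation of ranks (assumes no ties)."""
    n = len(pairs)

    def ranks(values):
        r = [0] * n
        for rank, idx in enumerate(sorted(range(n), key=values.__getitem__)):
            r[idx] = rank
        return r

    r1 = ranks([a for a, _ in pairs])
    r2 = ranks([b for _, b in pairs])
    mean = (n - 1) / 2.0
    num = sum((a - mean) * (b - mean) for a, b in zip(r1, r2))
    den = sum((a - mean) ** 2 for a in r1)  # both rank vectors have equal variance
    return num / den


if __name__ == "__main__":
    data = sample_bivariate_weibull(20000, 1.0, 2.0, 1.0, 2.0, 0.9, random.Random(42))
    print(spearman(data))  # should be close to 0.9 / 3 = 0.3
```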
The simulations consist of 1,000 replications. The sample sizes resemble those typically found in single-record, non-censored survival data: N varies from 100 to 1,000 in increments of 100. For each replication the experiment generates numbers from a bivariate Weibull with specific shape, scale, and association parameters. A brief note on parameterizations is in order.
In most event-history models, the scale parameter $\lambda_i$ is parameterized as $\lambda_i = \exp(-\mathbf{x}_i\boldsymbol{\beta})$, where $\boldsymbol{\beta}$ is a vector of parameters to be estimated. This parameterization is also used in the simulations in this paper. For two different survival processes $t_1$ and $t_2$, the experiment simulates data from the following scale parameters:

$$\lambda_1 = \exp[-(\beta_{0,1} + \beta_{1,1}X)] = \exp[-(1 + .2X)] \qquad (13)$$

$$\lambda_2 = \exp[-(\beta_{0,2} + \beta_{1,2}Z)] = \exp[-(1 + .3Z)] \qquad (14)$$

where $X$ and $Z$ are independent random variables. Moreover, the shape parameters are $p_i = 2$ for $i = 1, 2$. There are two sets of simulations per bivariate distribution. Each set is performed for a different value of the association parameter. The first set of simulations sets θ = .1, whereas the second set assumes θ = .9. When θ = .1, the survival processes are highly interdependent, and when θ = .9 the processes are close to being independent.
The software used for simulation and estimation also matters for the maximization process. The simulations in this paper were conducted in R 2.6.0, as this software has powerful and flexible algorithms to maximize what is a very rough likelihood surface. Full maximum likelihood estimates were found using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. This and other algorithms are described in Greene (2003). The paper uses this algorithm because the Newton and Nelder-Mead algorithms fail to find the parameters that maximize the log-likelihood.⁵ With this algorithm, each simulation took about 5 hours to complete. The most demanding simulations are those in which the interdependence between subjects is high.
The procedure for the estimation of the parameters is the following. First, numbers $t_1$ and $t_2$ are generated from a bivariate Weibull as described above. The second step finds provisional estimates $p_{1.prov}$ and $\lambda_{1.prov}$ via full maximum likelihood from the marginal distribution of $t_1$. Remember that $\lambda_1 = \exp[-(\beta_{0,1} + \beta_{1,1}X)]$; thus, when $\lambda_1$ is estimated, the algorithm actually finds estimates of $\beta_{0,1}$ and $\beta_{1,1}$. The same is true for $\lambda_2$ in the third step of the procedure, where the algorithm finds provisional estimates $p_{2.prov}$ and $\lambda_{2.prov}$ from the marginal distribution of $t_2$. The starting point for a shape parameter $p_{i.prov}$ is 1 for $i = 1, 2$, whereas the starting point for the scale parameter $\lambda_{i.prov}$ is the mean of $X$ and $Z$ for $i = 1$ and $i = 2$, respectively. Fourth, having found the provisional parameters $p_{1.prov}$, $\lambda_{1.prov}$, $p_{2.prov}$, and $\lambda_{2.prov}$ from the marginals, the procedure plugs those values into the target function and finds the maximum likelihood estimate (MLE) of a provisional association parameter called $\theta_{prov}$. This is a one-dimensional search whose starting point is Spearman's correlation coefficient between $t_1$ and $t_2$. Finally, all these provisional parameters are used as starting points for the final estimation of all five parameters of the bivariate distribution, that is, $p_1$, $\lambda_1$, $p_2$, $\lambda_2$, and θ.

⁵ The likelihood of the bivariate Weibull presents a rough surface. This feature of the distribution and a high interdependence between subjects complicate the maximization even further. Thus, in some specific cases, the maximization had to be modified by restricting the association parameter to the interval [0, 1]. In R, this type of constrained optimization is done with the "L-BFGS-B" algorithm. Programming details of the simulation are available upon request.
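The fourth step can be sketched in isolation. Assuming the bivariate Weibull of Equation (6) and fully observed pairs, once the marginal parameters are fixed the log-likelihood depends on θ only through $\sum_i \ln[1 + \theta(1 - 2\hat F_1(t_{1i}))(1 - 2\hat F_2(t_{2i}))]$, which is concave in θ, so a bracketing search suffices. The sketch below uses golden-section search over [−1, 1] instead of the start-point-based search from Spearman's coefficient described above; the interval, the tolerance, the function name, and the neglect of censoring are all assumptions.

```python
import math


def theta_given_margins(F1, F2, lo=-1.0, hi=1.0, tol=1e-8):
    """Maximize sum(log(1 + theta (1 - 2 F1_i)(1 - 2 F2_i))) over theta in [lo, hi].

    F1, F2: fitted marginal CDF values F1_hat(t1_i) and F2_hat(t2_i).
    Golden-section search; valid because the objective is concave in theta.
    """
    prods = [(1.0 - 2.0 * a) * (1.0 - 2.0 * b) for a, b in zip(F1, F2)]

    def objective(theta):
        return sum(math.log1p(theta * c) for c in prods)

    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = objective(c), objective(d)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = objective(c)
        else:  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = objective(d)
    return (a + b) / 2.0
```

Only interior points of the bracket are evaluated, so the logarithm never reaches the endpoints where $1 + \theta c_i$ could vanish.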
The following figures present simulation results. Figure 3 presents the root mean squared error (RMSE) of the estimates of the first component of the scale parameter $\lambda_1$, that is, $\beta_{0,1}$, for θ = .1 and θ = .9. In other words, the left panel of Figure 3 compares $\beta_{0,1.prov}$ with $\beta_{0,1}$ for a strong level of interdependence between the survival processes, whereas the right panel compares $\beta_{0,1.prov}$ with $\beta_{0,1}$ for a weak level of interdependence. Likewise, Figure 4 presents the RMSE of the estimates of the other component of $\lambda_1$, that is, $\beta_{1,1}$. The results are analogous for $\beta_{0,2}$ and $\beta_{1,2}$.
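For reference, the RMSE over $R$ replications is taken here to be $\sqrt{(1/R)\sum_r (\hat\beta_r - \beta)^2}$; the paper does not spell out the formula, so this is an assumption. Splitting the squared RMSE into squared bias plus variance separates accuracy from efficiency, which is the distinction drawn in the discussion of the figures. The estimates in the example are hypothetical.

```python
import math


def rmse(estimates, true_value):
    """Root mean squared error of replicated estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def bias_and_variance(estimates, true_value):
    """Split the mean squared error into squared bias plus variance (divisor R)."""
    r = len(estimates)
    mean = sum(estimates) / r
    variance = sum((e - mean) ** 2 for e in estimates) / r
    return mean - true_value, variance


if __name__ == "__main__":
    # Hypothetical estimates of beta_{0,1} (true value 1 in Equation (13)).
    est = [0.95, 1.10, 1.02, 0.91]
    bias, var = bias_and_variance(est, 1.0)
    print(rmse(est, 1.0) ** 2, bias ** 2 + var)  # equal up to rounding
```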
The results from the simulations are enlightening. First, for cases of strong interdependence between survival processes, the RMSE of the parameters from the bivariate distribution is smaller than the RMSE of the parameters from the univariate distribution. In addition, the parameters from the bivariate distribution are more efficient than the parameters from the univariate distributions. In cases of weak interdependence between processes, the RMSEs of the parameters from the bivariate and univariate distributions are practically identical, and in some cases the RMSE of the parameters from the bivariate Weibull is slightly smaller. Given the simplicity of estimating a bivariate Weibull, and regardless of the degree of interdependence, it is recommended that the parameters from this distribution be chosen over the parameters of a univariate Weibull. This recommendation does not change as the sample size gets larger: the RMSEs of both the univariate and the bivariate estimates decrease with sample size, and the RMSE of the parameters from the bivariate distribution remains smaller than or equal to the RMSE of the parameters from the univariate distribution.

Figure 3: RMSE of $\beta_{0,1}$ for θ = .1 and θ = .9

Figure 4: RMSE of $\beta_{1,1}$ for θ = .1 and θ = .9
The improvement in estimating the parameters of the bivariate distribution probably comes from the better use of information regarding the association parameter. Indeed, only by estimating a bivariate distribution is it possible to know the strength of the interdependence between two survival processes. This is a key finding, as the calculation of estimated probabilities and of mean and median duration times depends on the value of the association parameter. Moreover, as mentioned previously in this paper, measures of association such as Pearson's correlation coefficient, Kendall's Tau, and Spearman's Rho are also functions of this association parameter.
The derivation of the moments of the bivariate Weibull presented above is not the focus of this paper. However, Gumbel (1960) has presented the moments of a bivariate exponential, whereas Johnson, Evans, and Green (1999), as well as Hays and Kachi (2008), have described the moments of a particular bivariate Weibull. The real methodological challenge lies in the estimation of the association parameter. This paper shows that the estimation of the association parameter does not present significant problems if the appropriate algorithm and software are used. The next section presents an application of these methods to a real data set.
7 Application: The joint survival time of leaders and foreign ministers
In order to illustrate the use of a bivariate Weibull distribution, this paper analyzes the joint survival of leaders and foreign ministers. During the last decade, the survival of leaders has been the focus of extensive investigation (Bueno de Mesquita and Siverson 1995; Bueno de Mesquita, Siverson, and Woller 1992; Bueno de Mesquita et al. 2003; Chiozza and Goemans 2003 and 2004; Goemans 2000a and 2000b). However, not much research has been conducted on the survival of other politicians in government, and even less on how the survival of one affects the survival of the others (Berlinski, Dewan, and Dowding 2007; Dewan and Myatt 2005 and 2007).
In previous papers I contributed to this research agenda by developing and testing hypotheses on the determinants of the tenure of foreign ministers. The evidence shows that although political institutions have a significant impact on the tenure of foreign ministers, internal coalition dynamics such as affinity and loyalty towards a leader, uncertainty, and time dependence are better predictors of their political survival. Indeed, that research demonstrates that the survival of a leader has a very significant impact on the survival of a foreign minister.
Nevertheless, it could be the case that the survival of a minister also has an impact on the
survival of a leader. Berlinski, Dewan, and Dowding (2007) and Dewan and Myatt (2005 and
2007) show that ministerial resignations in democratic, parliamentary systems do have a corrective
effect on the survival of a government. In addition, it is possible that external shocks could affect
the tenure of both leaders and ministers at the same time. This suggests that the survival times
of leaders and ministers are interdependent. Testing this hypothesis presents an ideal case for the
application of the methods developed in previous sections of this paper.
In order to test this hypothesis, this paper uses data on the tenure of leaders and foreign ministers.⁶ The data set on foreign ministers constitutes the first systematic and entirely functional coding of the tenure of most foreign ministers over the last three centuries. The data set identifies 7,428 foreign ministers in 181 countries spanning the years 1696-2004, and includes the specific day, month, and year in which 4,926 ministers took and left office. For the remaining 2,502 ministers, only the years in which they took and left office were recorded. Ministers holding office up to 2004, as well as ministers from countries that disappeared, were recorded as right-censored.⁷ The specific data used in estimation are for the ministers whose day, month, and year of taking and leaving office are known. These data include 4,420 foreign ministers in 156 countries from 1785 to 2000. To create this data set, all the ministers whose specific day, month, and year of taking and leaving office are not known were dropped from the initial data set. In spite of this, the sample used in estimation is still quite large.

⁶ The database on leaders is used by Bueno de Mesquita et al. (2003) and is publicly available at http://www.nyu.edu/gsas/dept/politics/data/bdm2s2/Logic.htm.

⁷ Up to this point there is no reliable information about the resignations of these foreign ministers. Thus, it is assumed that, if the ministers are not right-censored, they fail. Although ministers do resign from their positions, it is reasonable to assume that in general they try to stay in office for as long as possible. I believe it is better to test hypotheses with crude data than not to test them at all.
In general, the database would be organized as multiple-record data. In other words, there would be a line of data for each year a leader and a minister hold office. This would capture many time-varying covariates. However, this type of organization presents important challenges for estimation. Therefore, this paper organizes the data as single-record data. There are two dependent variables: the total survival time of a leader and the median survival time of the ministers who held office with that particular leader. For instance, if a leader lasted 9 years in office and had 3 ministers who held office for 2, 3, and 4 years respectively, the first dependent variable would be equal to 9, whereas the second dependent variable would be equal to 3.⁸ Table 1 presents summary statistics of the survival time of a leader, the median survival time of ministers, and the mean failure of ministers by leader (Change in minister). This last variable is the main covariate used in estimation. For instance, in the case of the leader who lasted 9 years in office, 3 ministers occupied office as well. In those 9 years, 3 ministers failed, so the variable would be equal to 3/9 ≈ .33. This means that Change in minister captures the rate of ministerial change per year. The larger this variable is, the more ministers have occupied office during the tenure of a particular leader.
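The worked example above can be reproduced in a few lines. The helper below is hypothetical: its name and inputs (the leader's tenure, the tenures of the ministers who served under that leader, and the number of those ministers who failed) are assumptions about how a single-record row could be built, not the paper's code.

```python
from statistics import median


def leader_record(leader_years, minister_years, ministers_failed):
    """Hypothetical helper: build the two dependent variables and Change in minister.

    leader_years: total tenure of the leader (years)
    minister_years: tenures of the ministers who served under the leader (years)
    ministers_failed: number of those ministers who left office during the tenure
    """
    duration_leader = leader_years
    duration_median_minister = median(minister_years)
    change_in_minister = ministers_failed / leader_years  # failures per year of tenure
    return duration_leader, duration_median_minister, change_in_minister


if __name__ == "__main__":
    print(leader_record(9, [2, 3, 4], 3))  # → (9, 3, 0.3333333333333333)
```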
Table 1: Summary statistics (years)

Variable                     N      Mean    Variance
Duration Leaders             1966   3.835   34.079
Duration Median Ministers    1966   2.023    8.791
Change in minister           1966   .3583    .0955
Table 2 presents estimation results for the duration times of leaders and foreign ministers, respectively. The survival time of leaders depends on ministerial change, whereas the survival time of ministers depends only on an intercept.⁹ In the bivariate models, the survival times of both leaders and ministers also depend on the association parameter θ. The table displays full maximum likelihood estimates from the univariate and the bivariate Weibull distributions. The results are presented in accelerated failure time form. This means that a positive coefficient reflects an increase in survival time, whereas a negative coefficient reflects a decrease in survival time. Standard errors are presented in parentheses below the coefficients.

⁸ There are other alternatives for data organization. Yet, given the current technology, this format is probably the best way of analyzing the survival times of these two actors. Once the likelihood includes time-varying covariates, there will be no need to organize data according to arbitrary decisions.
Table 2: The Joint Tenure of Leaders and Foreign Ministers

Model              Marginal Leaders   Bivariate Leaders   Marginal Ministers   Bivariate Ministers
Change minister    -1.627***          -.6725***
                   (.1222)            (.1358)