Maximum Likelihood Estimation for
Type I Censored Weibull Data
Including Covariates

Fritz Scholz*
Mathematics & Computing Technology, Boeing Phantom Works

August 22, 1996
Revised January 5, 2001 (1)

* The author wishes to thank James King for several inspiring conversations concerning the uniqueness theorem presented in Section 2, Siavash Shahshahani for showing him more than 30 years ago that it follows from Morse Theory, John Betts for overcoming programming difficulties in using HDNLPR, and Bill Meeker for helpful comments on computational aspects.

(1) The revision mainly corrects misprints and some notational inconsistencies, and indicates the new calling sequence for compiling the Fortran program.
Abstract

This report deals with the specific problem of deriving maximum likelihood estimates of the regression model parameters when the residual errors are governed by a Gumbel distribution. As an additional complication the observed responses are permitted to be type I or multiply censored. Since the log-transform of a 2-parameter Weibull random variable has a Gumbel distribution, the results extend to Weibull regression models, where the log of the Weibull scale parameter is modeled linearly as a function of covariates. In the Weibull regression model the covariates thus act as multiplicative modifiers of the underlying scale parameter.

A general theorem for establishing a unique global maximum of a smooth function is presented. The theorem was previously published by Mäkeläinen et al. (1981) with a sketch of a proof. The proof presented here is much shorter than their unpublished proof.

Next, the Gumbel/Weibull regression model is introduced together with its censoring mechanism. Using the above theorem, the existence and uniqueness of maximum likelihood estimates for the posed Weibull/Gumbel regression problem with type I censored responses is characterized in terms of sufficient and easily verifiable conditions, which are conjectured to be necessary as well.

As part of an efficient optimization algorithm for finding these maximum likelihood estimates it is useful to have good starting values. These are found by adapting the iterative least squares algorithm of Schmee and Hahn (1979) to the Gumbel/Weibull case. FORTRAN code for computing the maximum likelihood estimates was developed using the optimization routine HDNLPR. Some limited experience with this algorithm on simulated data is presented, as well as the results for a specific example from Gertsbakh (1989).
1 Introduction

In the theory of maximum likelihood estimation it is shown, subject to regularity conditions, that the likelihood equations have a consistent root. The problems that arise in identifying the consistent root among possibly several roots were discussed by Lehmann (1980). It is therefore of interest to establish, whenever possible, that the likelihood equations have a unique root. For example, for exponential family distributions it is easily shown, subject to mild regularity conditions, that the log-likelihood function is strictly concave, which in turn entails that the log-likelihood equations have at most one root. However, such global concavity cannot always be established. Thus one may ask to what extent the weaker property of local concavity of the log-likelihood function at all roots of the likelihood equations implies that there can be at most one root. Uniqueness arguments along these lines, although incomplete, may be found in Kendall and Stuart (1973, p. 56), Turnbull (1974), and Copas (1975), for example.

However, it was also pointed out by Tarone and Gruenhage (1975) that a function of two variables may have an infinity of strict local maxima and no other critical points, i.e. no saddle points or minima. To resolve this issue, a theorem is presented which is well known to mathematicians as a special application of Morse Theory, cf. Milnor (1963) and also Arnold (1978), p. 262. Namely, on an island the number of minima minus the number of saddle points plus the number of maxima is always one. The specialization of the theorem establishing conditions for a unique global maximum was first presented to the statistical community by Mäkeläinen et al. (1981). Since Morse Theory is rather deep and since Mäkeläinen et al. only give an outline of a proof, leaving the lengthy details to a technical report, a short (one page) and more accessible proof is given here. It is based on the elementary theory of ordinary differential equations.

It is noted here that although Mäkeläinen et al. have priority in publishing the theorem presented here, a previous version of this paper had been submitted for publication, but was withdrawn and issued as a technical report (Scholz, 1981) when the impending publication of Mäkeläinen et al. became known. Aside from these two independent efforts there was a third by Barndorff-Nielsen and Blæsild (1980), similarly preempted, which remained a technical report. Their proof of the result appears to depend on Morse Theory. Similar results under weaker assumptions may be found in Gabrielsen (1982, 1986). Other approaches, via a multivariate version of Rolle's theorem, were examined in Rai and van Ryzin (1982).
2 The Uniqueness Theorem

In addition to the essential strict concavity at all critical points, the uniqueness theorem invokes a compactness condition which avoids the problems pointed out by Tarone and Gruenhage (1975) and which are illustrated in Figure 1. The theorem can be stated as follows:

Theorem 1. Let G be an open, connected subset of R^n and let f : G → R be twice continuously differentiable on G with the following two properties:

i) For any x ∈ G with grad f(x) = 0 the Hessian D²f(x) is negative definite, i.e. all critical points are strict local maxima.

ii) For any x ∈ G the set {y ∈ G : f(y) ≥ f(x)} is compact.

Then f has exactly one critical point, hence one global maximum and no other local maxima on G.
Proof: By i) all critical points are isolated, i.e. for each critical point x ∈ G of f there exists an ε(x) > 0 such that

    B_ε(x)(x) = {y ∈ R^n : |y − x| < ε(x)} ⊂ G

contains no other critical point besides x, and such that

    g(x) := sup{ f(y) : y ∈ ∂B_ε(x)(x) } < f(x) .

Let

    U_d(x)(x) = {y ∈ B_ε(x)(x) : f(y) > f(x) − d(x)}

with 0 < d(x) < f(x) − g(x); then ∂U_d(x)(x) ⊂ B_ε(x)(x) (note that f(y) = f(x) − d(x) for y ∈ ∂U_d(x)(x)). Consider now the following vector function

    h(z) = grad f(z) · |grad f(z)|^(−2) ,

which is well defined and continuously differentiable on G − C, where C is the set of critical points of f in G. Hence the differential equation ż(t) = h(z(t)) with initial condition z(0) = z_0 ∈ G − C has a unique right maximal solution z(t; 0, z_0) on some interval [0, t_0), t_0 > 0, see Hartman (1964), pp. 8-13. Note that f(z(t; 0, z_0)) = f(z_0) + t for t ∈ [0, t_0). It follows from ii) that t_0 must be finite. Consider now the following compact set:

    K = {y ∈ G : f(y) ≥ f(z_0)} − ∪_{x∈C} U_d(x)(x) .

Then z(t; 0, z_0) ∉ K for t near t_0, see Hartman pp. 12-13. Hence for t near t_0, z(t; 0, z_0) ∈ U_d(x)(x) for some x ∈ C. From the construction of U_d(x)(x) it is clear that once such a solution enters U_d(x)(x) it will never leave it. For x ∈ C let P(x) be the set containing x and all those points z_0 ∈ G − C whose solutions z(t; 0, z_0) will wind up in U_d(x)(x). It has been shown that {P(x) : x ∈ C} forms a partition of G. Since z(t; 0, z_0) is a continuous function of z_0 ∈ G − C, see Hartman p. 94, it follows that each P(x), x ∈ C, is open. Since G is assumed to be connected, i.e., G cannot be the disjoint union of nonempty open sets, one concludes that all but one of the P(x), x ∈ C, must be empty. Q.E.D.

Remark: It is clear that a disconnected set G allows for easy counterexamples to the theorem. Assumption ii) is violated in the example presented by Tarone and Gruenhage: f(x, y) = −exp(−2y) − exp(−y) sin(x). Figure 1 shows the contour lines of f(x, y) in the upper plot and the corresponding perspective in the lower plot. Indicated in thicker line width is the contour f(x, y) = 0, given by y = −log(−sin(x)) over the intervals where sin(x) < 0. This latter contour is unbounded since y → ∞ as sin(x) → 0 at those interval endpoints. Thus the level set {(x, y) : f(x, y) ≥ 0} is unbounded. What is happening in this example is that there are saddle points at infinity which act as the connecting agent between the local maxima.

Assumption ii) may possibly be replaced by weaker assumptions; however, it appears difficult to formulate such assumptions without impinging on the simplicity of theorem and proof. The following section will illustrate the utility of the theorem in the context of censored Weibull data with covariates. However, it should be noted that many other examples exist.
Figure 1: Contour and Perspective Plots of f(x, y) = −exp(−2y) − exp(−y) sin(x)
[Plots not reproduced in this text version; the two dots in the contour plot mark local maxima, with contour levels running from −3 up to 0.24.]
3 Weibull Regression Model Involving Censored Data

Consider the following linear model:

    y_i = Σ_{j=1}^p u_ij β_j + σ ε_i = u_i′β + σ ε_i ,   i = 1, ..., n ,

where ε_1, ..., ε_n are independent random errors, identically distributed according to the extreme value or Gumbel distribution with density f(x) = exp[x − exp(x)] and cumulative distribution function F(x) = 1 − exp[−exp(x)]. The n × p matrix U = (u_ij) of constant regression covariates is assumed to be of full rank p, with n > p. The unknown parameters σ, β_1, ..., β_p will be estimated by the method of maximum likelihood, which here is taken to be the solution to the likelihood equations.

The above model can also arise from the following Weibull regression model:

    P(T_i ≤ t) = 1 − exp( −[t/α(u_i)]^γ )

which, after using the log transformation Y_i = log(T_i), results in

    P(Y_i ≤ y) = 1 − exp[ −exp( (y − log[α(u_i)]) / (1/γ) ) ] = 1 − exp[ −exp( (y − µ(u_i)) / σ ) ] .

Using the identifications σ = 1/γ and µ(u_i) = log[α(u_i)] = u_i′β this reduces to the previous linear model with the density f.
Rather than observing the responses y_i completely, the data are allowed to be censored, i.e., for each observation y_i one either observes it or some censoring time c_i. The response y_i is observed whenever c_i ≥ y_i and otherwise one observes c_i, and one knows whether the observation is a y_i or a c_i. One will also always know the corresponding covariates u_ij, j = 1, ..., p, for i = 1, ..., n. Such censoring is called type I censoring or, since the censoring time points c_i can take on multiple values, one also speaks of multiply censored data. Thus the data consist of

    S = {(x_1, δ_1, u_1), ..., (x_n, δ_n, u_n)} ,

where x_i = y_i and δ_i = 1 when y_i ≤ c_i, and x_i = c_i and δ_i = 0 when y_i > c_i. The number of uncensored observations is denoted by r = Σ_{i=1}^n δ_i and the index sets of uncensored and censored observations by D and C, respectively, i.e.,

    D = {i : δ_i = 1, i = 1, ..., n} = {i_1, ..., i_r}   and   C = {i : δ_i = 0, i = 1, ..., n} .

Furthermore, denote the uncensored observations and corresponding covariates by

    y_D = (y_{i_1}, ..., y_{i_r})′   and   U_D = (u_{i_1}, ..., u_{i_r})′ .

The likelihood function of the data S is

    L(β, σ) = Π_{i∈D} (1/σ) exp[ (x_i − u_i′β)/σ − exp((x_i − u_i′β)/σ) ] × Π_{i∈C} exp[ −exp((x_i − u_i′β)/σ) ]

and the corresponding log-likelihood is

    ℓ(β, σ) = log[L(β, σ)]
            = Σ_{i∈D} [ (x_i − u_i′β)/σ − exp((x_i − u_i′β)/σ) ] − r log σ − Σ_{i∈C} exp((x_i − u_i′β)/σ) .
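The log-likelihood above is straightforward to evaluate in code. The following Python sketch (an illustrative reimplementation, not the report's Fortran code; the function and variable names are chosen here) computes ℓ(β, σ) for censored data:

```python
import numpy as np

def gumbel_loglik(beta, sigma, x, delta, U):
    """Censored Gumbel log-likelihood l(beta, sigma).

    x     : observed values (y_i if uncensored, c_i if censored)
    delta : 1 for uncensored, 0 for censored observations
    U     : n x p covariate matrix
    """
    z = (x - U @ beta) / sigma          # z_i = (x_i - u_i' beta) / sigma
    d = np.asarray(delta, dtype=bool)
    # uncensored terms z_i - exp(z_i), minus r*log(sigma), minus censored exp(z_i)
    return (np.sum(z[d] - np.exp(z[d]))
            - d.sum() * np.log(sigma)
            - np.sum(np.exp(z[~d])))

# tiny check: one uncensored and one censored point, both at z = 0,
# contribute (0 - 1) and (-1) respectively, so l = -2
U = np.array([[1.0], [1.0]])
ll = gumbel_loglik(np.array([0.0]), 1.0, np.array([0.0, 0.0]),
                   np.array([1, 0]), U)
assert abs(ll - (-2.0)) < 1e-12
```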
3.1 Conditions for Unique Maximum Likelihood Estimates

Here conditions will be stated under which the maximum likelihood estimates of β and σ exist and are unique. It seems that this issue has not yet been addressed in the literature, although software for finding the maximum likelihood estimates exists and is routinely used. Some problems with such software have been encountered, and situations have been discovered in which the maximum likelihood estimates, understood as roots of the likelihood equations

    ∂ℓ(β, σ)/∂σ = 0   and   ∂ℓ(β, σ)/∂β_j = 0 ,   for j = 1, ..., p ,   (1)

do not exist. Thus it seems worthwhile to explicitly present the conditions which guarantee unique solutions to the likelihood equations. These conditions appear to be reasonable and not unduly restrictive. In fact, it is conjectured that these conditions are also necessary, but this has not been pursued.
Theorem 2. Let r ≥ 1 and the columns of U_D be linearly independent. Then for y_D not in the column space of U_D, or for y_D = U_D β̂ for some β̂ with x_i > u_i′β̂ for some i ∈ C, the likelihood equations (1) have a unique solution which represents the location of the global maximum of ℓ(β, σ) over R^p × (0, ∞).

Comments: The above assumption concerning U_D is stronger than assuming that the columns of U be linearly independent. Also, the event that y_D is in the column space of U_D technically has probability zero if r > p, but may occur due to rounding or data granularity problems.

When r ≥ p and y_D = U_D β̂ with x_i ≤ u_i′β̂ for all i ∈ C, it is easily seen that ℓ(β̂, σ) → ∞ as σ → 0. From the point of view of likelihood maximization this would point to (β̂, σ̂) = (β̂, 0) as the maximum likelihood estimates, provided one extends the permissible range of σ from (0, ∞) to [0, ∞). However, the conventional large sample normality theory does not apply here, since it is concerned with the roots of the likelihood equations.

The additional requirement x_i > u_i′β̂ for some i ∈ C gives the extra information that is needed to get out of the degenerate case, namely the linear pattern y_D = U_D β̂, because the actual observation y_i implied by the censored case x_i > u_i′β̂ will also satisfy that inequality (since y_i > x_i), and thus break the linear pattern and yield a σ̂ > 0. This appears to have been overlooked by Nelson (1982) when on page 392 he suggests that when estimating k parameters one should have at least k distinct failure times, since otherwise the estimates do not exist. Although his recommendation was made in a more general context, it seems that the conditions of Theorem 2 may have some bearing on other situations as well.
Proof: First it is shown that any critical point (β, σ) of ℓ is a strict local maximum. In the process, the equations resulting from grad ℓ(β, σ) = 0 are used to simplify the Hessian or matrix of second derivatives of ℓ at those critical points. This simplified Hessian is then shown to be negative definite. The condition grad ℓ(β, σ) = 0 results in the following equations:

    ∂ℓ/∂σ = −r/σ − Σ_{i∈D} (x_i − u_i′β)/σ² + Σ_{i=1}^n [(x_i − u_i′β)/σ²] exp((x_i − u_i′β)/σ)

          = −r/σ − (1/σ)[ Σ_{i∈D} z_i − Σ_{i=1}^n z_i exp(z_i) ] = 0   (2)

with z_i = (x_i − u_i′β)/σ, and

    ∂ℓ/∂β_j = −(1/σ)[ Σ_{i∈D} u_ij − Σ_{i=1}^n u_ij exp((x_i − u_i′β)/σ) ]

            = −(1/σ)[ Σ_{i∈D} u_ij − Σ_{i=1}^n u_ij exp(z_i) ] = 0   for j = 1, ..., p .   (3)
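Equations (2) and (3) can be validated by comparing them with numerical derivatives of the log-likelihood. The Python sketch below (illustrative code, not from the report) does exactly that on random inputs:

```python
import numpy as np

def loglik(beta, sigma, x, delta, U):
    # censored Gumbel log-likelihood, used to check the score below
    z = (x - U @ beta) / sigma
    d = np.asarray(delta, dtype=bool)
    return (np.sum(z[d] - np.exp(z[d]))
            - d.sum() * np.log(sigma) - np.sum(np.exp(z[~d])))

def score(beta, sigma, x, delta, U):
    # gradient from equations (2) and (3), with z_i = (x_i - u_i' beta)/sigma
    z = (x - U @ beta) / sigma
    d = np.asarray(delta, dtype=bool)
    r = d.sum()
    dl_dsigma = -r / sigma - (z[d].sum() - np.sum(z * np.exp(z))) / sigma
    dl_dbeta = -(U[d].sum(axis=0) - U.T @ np.exp(z)) / sigma
    return dl_dbeta, dl_dsigma

# central-difference check on a small synthetic data set
rng = np.random.default_rng(0)
n, p = 8, 2
U = np.column_stack([np.ones(n), rng.uniform(size=n)])
x = rng.normal(size=n)
delta = np.array([1, 1, 1, 1, 0, 0, 1, 0])
beta, sigma, h = np.array([0.3, -0.2]), 0.7, 1e-6

gb, gs = score(beta, sigma, x, delta, U)
for j in range(p):
    e = np.zeros(p); e[j] = h
    num = (loglik(beta + e, sigma, x, delta, U)
           - loglik(beta - e, sigma, x, delta, U)) / (2 * h)
    assert abs(num - gb[j]) < 1e-4
num = (loglik(beta, sigma + h, x, delta, U)
       - loglik(beta, sigma - h, x, delta, U)) / (2 * h)
assert abs(num - gs) < 1e-4
```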
The Hessian or matrix H of second partial derivatives of ℓ with respect to (β, σ) is made up of the following terms, for 1 ≤ j, k ≤ p:

    ∂²ℓ(β, σ)/∂β_j∂β_k = −(1/σ²) Σ_{i=1}^n u_ij u_ik exp(z_i)   (4)

    ∂²ℓ(β, σ)/∂β_j∂σ = (1/σ²)[ Σ_{i∈D} u_ij − Σ_{i=1}^n u_ij exp(z_i) − Σ_{i=1}^n u_ij z_i exp(z_i) ]   (5)

    ∂²ℓ(β, σ)/∂σ² = (1/σ²)[ r + 2 Σ_{i∈D} z_i − 2 Σ_{i=1}^n z_i exp(z_i) − Σ_{i=1}^n z_i² exp(z_i) ]   (6)

From (2) one gets

    Σ_{i=1}^n z_i exp(z_i) − Σ_{i∈D} z_i = r

and one can simplify (6) to

    ∂²ℓ(β, σ)/∂σ² = r/σ² − 2r/σ² − (1/σ²) Σ_{i=1}^n z_i² exp(z_i) = −(1/σ²)[ r + Σ_{i=1}^n z_i² exp(z_i) ] .
Using (3) one can simplify (5) to

    ∂²ℓ(β, σ)/∂β_j∂σ = −(1/σ²) Σ_{i=1}^n z_i u_ij exp(z_i) .

Thus the matrix H of second partial derivatives of ℓ at any critical point is

    H = −(1/σ²) [ Σ_{i=1}^n exp(z_i) u_i u_i′     Σ_{i=1}^n z_i exp(z_i) u_i  ]
                [ Σ_{i=1}^n z_i exp(z_i) u_i′    r + Σ_{i=1}^n z_i² exp(z_i) ]   = −(1/σ²) B .

Letting w_i = exp(z_i)/Σ_{j=1}^n exp(z_j) and W = Σ_{j=1}^n exp(z_j) one can write

    B = W [ Σ_{i=1}^n w_i u_i u_i′     Σ_{i=1}^n w_i z_i u_i   ]
          [ Σ_{i=1}^n w_i z_i u_i′    r/W + Σ_{i=1}^n w_i z_i² ] .

In this matrix the upper left p × p diagonal submatrix Σ_{i=1}^n w_i u_i u_i′ is positive definite. This follows from

    a′ ( Σ_{i=1}^n w_i u_i u_i′ ) a = Σ_{i=1}^n w_i (a′u_i)² > 0

for every a ∈ R^p − {0}, provided the columns of U are linearly independent, which follows from our assumption about U_D. The lower right diagonal element r + W Σ_{i=1}^n w_i z_i² of B is positive since r ≥ 1.

The last step in showing B to be positive definite is to verify that det(B) > 0. To this end let

    V = ( Σ_{i=1}^n w_i u_i u_i′ )^(−1)

and note that for r > 0 one has

    det(B) = W^(p+1) det( Σ_{i=1}^n w_i u_i u_i′ )
             × [ r/W + Σ_{i=1}^n w_i z_i² − ( Σ_{i=1}^n w_i z_i u_i′ ) V ( Σ_{i=1}^n w_i z_i u_i ) ] > 0
since

    0 ≤ Σ_{i=1}^n w_i [ z_i − u_i′ V Σ_{j=1}^n w_j z_j u_j ]²

      = Σ_{i=1}^n w_i z_i² − 2 Σ_{i=1}^n w_i z_i u_i′ V Σ_{j=1}^n w_j z_j u_j
          + Σ_{i=1}^n w_i [ u_i′ V Σ_{j=1}^n w_j z_j u_j ]²

      = Σ_{i=1}^n w_i z_i² − 2 Σ_{i=1}^n w_i z_i u_i′ V Σ_{j=1}^n w_j z_j u_j
          + ( Σ_{j=1}^n w_j z_j u_j′ ) V ( Σ_{i=1}^n w_i u_i u_i′ ) V ( Σ_{j=1}^n w_j z_j u_j )

      = Σ_{i=1}^n w_i z_i² − ( Σ_{i=1}^n w_i z_i u_i′ ) V ( Σ_{i=1}^n w_i z_i u_i ) ,

where the last step uses V ( Σ_{i=1}^n w_i u_i u_i′ ) = I.
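The chain of equalities above can be spot-checked numerically. In the following sketch (illustrative, with randomly generated w_i, z_i, u_i standing in for the quantities defined above), the Schur-complement term is computed both ways and its nonnegativity confirmed:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
u = rng.normal(size=(n, p))    # rows play the role of the u_i'
z = rng.normal(size=n)
w = rng.uniform(size=n)
w /= w.sum()                   # weights w_i summing to one

M = (w[:, None] * u).T @ u     # sum_i w_i u_i u_i'
V = np.linalg.inv(M)
s = (w * z) @ u                # sum_i w_i z_i u_i

# Schur complement term and the equivalent weighted sum of squares
lhs = np.sum(w * z**2) - s @ V @ s
rhs = np.sum(w * (z - u @ (V @ s)) ** 2)
assert abs(lhs - rhs) < 1e-10  # the two expressions agree
assert lhs >= -1e-12           # and are nonnegative, so det(B) > 0
```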
To claim the existence of unique maximum likelihood estimates it remains to demonstrate the compactness condition ii) of Theorem 1. It will be shown that

a) ℓ(β, σ) → −∞ uniformly in β ∈ R^p as σ → 0 or σ → ∞, and

b) for any ε > 0 and ε ≤ σ ≤ 1/ε one has

    sup{ ℓ(β, σ) : |β| ≥ ρ } → −∞   as ρ → ∞.

Compact sets in R^(p+1) are characterized by being bounded and closed. Using the continuous mapping ψ(β, σ) = (β, log(σ)), map the half space K⁺ = R^p × (0, ∞) onto R^(p+1). According to Theorem 4.14 of Rudin (1976), ψ maps compact subsets of K⁺ into compact subsets of R^(p+1), the latter being characterized as closed and bounded. This allows the characterization of compact subsets of K⁺ as those that are closed and for which |β| and σ are bounded above and for which σ is bounded away from zero.

Because of the continuity of ℓ(β, σ) the set

    Q_0 = {(β, σ) : ℓ(β, σ) ≥ ℓ(β_0, σ_0)}
is closed and bounded and bounded away from the hyperplane σ = 0. These boundedness properties of Q_0 are seen by contradiction. If Q_0 did not have these properties, then there would be a sequence (β_n, σ_n) with either σ_n → 0 or σ_n → ∞, or with 0 < ε < σ_n < 1/ε and |β_n| → ∞. In either of these cases the claims a) and b) above state that ℓ(β_n, σ_n) → −∞, which violates ℓ(β_n, σ_n) ≥ ℓ(β_0, σ_0). This completes the main argument of the proof of Theorem 2, subject to demonstrating the claims a) and b).

To see a), first deal with the case in which y_D is not in the column space of U_D. This entails that for all β ∈ R^p

    |y_D − U_D β| ≥ |y_D − U_D β̂| = κ > 0 ,   where β̂ = [U_D′ U_D]^(−1) U_D′ y_D .

Thus max{ |x_i − u_i′β| : i ∈ D } ≥ κ̃ > 0 uniformly in β ∈ R^p and, using the inequality x − exp(x) ≤ −|x| for all x ∈ R, one has

    Σ_{i∈D} [ (x_i − u_i′β)/σ − exp((x_i − u_i′β)/σ) ] ≤ − Σ_{i∈D} |x_i − u_i′β|/σ   (7)

        ≤ − max{ |x_i − u_i′β| : i ∈ D }/σ ≤ − κ̃/σ → −∞

as σ → 0, and thus, uniformly in β ∈ R^p,

    ℓ(β, σ) ≤ −r log(σ) − κ̃/σ → −∞   as σ → 0 .

To deal with the other case, where U_D β̂ = y_D and x_i > u_i′β̂ for some i ∈ C, take a neighborhood of β̂

    B_ρ(β̂) = { β : |β − β̂| ≤ ρ }

with ρ > 0 chosen sufficiently small so that

    |u_i′β − u_i′β̂| ≤ (x_i − u_i′β̂)/2   for all β ∈ B_ρ(β̂).

This in turn implies
    x_i − u_i′β = x_i − u_i′β̂ + u_i′β̂ − u_i′β ≥ x_i − u_i′β̂ − (x_i − u_i′β̂)/2 = (x_i − u_i′β̂)/2

for all β ∈ B_ρ(β̂). For some κ′ > 0 and for all β ∉ B_ρ(β̂) one has |y_D − U_D β| ≥ κ′. Bounding the first term of the log-likelihood ℓ(β, σ) as in (7) for all β ∉ B_ρ(β̂), and bounding the last term of the log-likelihood by

    −exp( (x_i − u_i′β)/σ ) ≤ −exp( (x_i − u_i′β̂)/(2σ) )   for all β ∈ B_ρ(β̂),

one finds again that either of these bounding terms will dominate the middle term, −r log σ, of ℓ(β, σ) as σ → 0. Thus again uniformly in β ∈ R^p one has ℓ(β, σ) → −∞ as σ → 0.

As for σ → ∞, note x − exp(x) ≤ −1, so that

    ℓ(β, σ) ≤ −r log(σ) − r → −∞   as σ → ∞

uniformly in β ∈ R^p. This establishes a).

Now let ε ≤ σ ≤ 1/ε. From our assumption that the columns of U_D are linearly independent it follows that

    inf { |U_D β| : |β| = 1 } = m > 0 ,

where m² is the smallest eigenvalue of U_D′ U_D. Thus for all β ∈ R^p

    |U_D β − y_D| ≥ |U_D β| − |y_D| ≥ m|β| − |y_D| ,

and using the inequality Σ |x_i| ≥ √(Σ x_i²) one has

    ℓ(β, σ) ≤ −r log(σ) − Σ_{i∈D} |x_i − u_i′β|/σ ≤ −r log(σ) − |U_D β − y_D|/σ

            ≤ −r log(σ) − (m|β| − |y_D|)/σ → −∞   as |β| → ∞ ,

uniformly in ε ≤ σ ≤ 1/ε. This establishes b) and concludes the proof of Theorem 2.
4 Solving the Likelihood Equations

The previous section showed that the solution to the likelihood equations is unique and coincides with the unique global maximum of the likelihood function. This section discusses some computational issues that arise in solving for these maximum likelihood estimates. One can either use a multidimensional root finding algorithm to solve the likelihood equations or one can use an optimization algorithm on the likelihood or log-likelihood function. It appears that in either case one can run into difficulties when trying to evaluate the exponential terms exp([x_i − u_i′β]/σ). Depending on the choice of σ and β this term could easily overflow and terminate all further calculation. Such overflow arises where the likelihood is practically zero, indicating that σ and β are far away from the optimum. It seems that this problem is what troubles the algorithm survreg in S-PLUS. In some strongly censored data situations survreg simply crashes with overflow messages. One such data set is given in Table 1, with a dagger indicating the three failure times. The histogram of this data set is given in Figure 2, with the three failure cases indicated by dots below the histogram.
Table 1: Heavily Censored Sample († marks the three failure times)

  626.1  651.7  684.7  686.3  698.2  707.7  709.8  714.7  718.0  719.6
  720.9  721.9  726.7  740.3  752.9  760.3  764.0  764.8  768.3  773.6
  774.4  784.1  785.3  788.9  790.3  793.2  794.0  806.1  816.2  825.0
  826.5  829.8† 832.3  839.4  840.5  843.1  845.2  849.1  849.2† 856.2
  856.8  859.1  868.9† 869.3  881.1  887.8  890.5  898.2  921.5  934.8
In the case of simple Weibull parameter estimation without covariates this overflow problem can be finessed in the likelihood equations by rewriting these equations so that the exponential terms only appear simultaneously in numerator and denominator of some ratio, see equation (4.2.2) in Lawless (1982). One can then use a common scaling factor so that none of the exponential terms overflow.

In the current case with covariates it appears that this same trick will not work. Thus it is proposed to proceed as follows. Find a starting value (β_0, σ_0) by way of the Schmee-Hahn regression algorithm presented below. It is assumed that the starting value will not suffer from the overflow problems mentioned before.

Next, employ an optimization algorithm that allows for the possibility that the function to be optimized may not be able to return a function value, gradient or Hessian
Figure 2: Histogram for the Data of Table 1
[Histogram not reproduced in this text version; horizontal axis in cycles (650-950), with the three failure cases marked by dots below the histogram.]
at a desired location. In that case the optimization algorithm should reduce its step size and try again. The function box which calculates the function value, gradient and Hessian should take care in trapping exponential overflow problems, i.e., state when they cannot be resolved. These problems typically happen only far away from the function optimum, where the log-likelihood drops off to −∞.

Another precaution is to switch from σ to α = log(σ) in the optimization process. Furthermore, it was found that occasionally it was useful to rescale σ, x_i and u_ij by a common scale factor so that σ is in the vicinity of one. This is easily done using the preliminary Schmee-Hahn estimates.

Optimization algorithms usually check convergence based on the gradient (among other criteria), and the gradient is proportional to the scale of the function to be optimized. Thus it is useful to rescale the log-likelihood to get its minimum value into a proper range, near one. This can be done approximately by evaluating the absolute value of the log-likelihood at the initial estimate and rescaling the log-likelihood function by dividing by that absolute value.
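The safeguards just described can be sketched in a few lines. The following Python example (an illustrative reimplementation, not the report's Fortran/HDNLPR code; the simulated data and tuning choices are assumptions made here) optimizes over (β, α) with α = log(σ), and traps overflow by returning a huge penalty value instead of crashing:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, x, delta, U):
    # theta = (beta_1, ..., beta_p, alpha) with alpha = log(sigma),
    # so sigma > 0 is automatic during unconstrained optimization
    beta, sigma = theta[:-1], np.exp(theta[-1])
    z = (x - U @ beta) / sigma
    if np.any(z > 500.0):
        # exp(z) would overflow: we are far from the optimum, where the
        # log-likelihood drops off to -infinity; return a large penalty
        return 1e300
    d = np.asarray(delta, dtype=bool)
    ll = (np.sum(z[d] - np.exp(z[d]))
          - d.sum() * np.log(sigma) - np.sum(np.exp(z[~d])))
    return -ll

# simulated type I censored Gumbel regression data, beta = (1, 2), sigma = 0.5;
# the log of an Exp(1) variable has the minimum-type Gumbel distribution used here
rng = np.random.default_rng(2)
n = 200
U = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = U @ np.array([1.0, 2.0]) + 0.5 * np.log(rng.exponential(size=n))
c = rng.uniform(1.0, 4.0, size=n)
x, delta = np.minimum(y, c), (y <= c).astype(int)

res = minimize(neg_loglik, x0=np.zeros(3), args=(x, delta, U),
               method="Nelder-Mead", options={"maxiter": 5000})
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
```

With n = 200 the estimates land close to the generating values, although the exact numbers depend on the random seed.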
4.1 Schmee-Hahn Regression Estimates with Censored Data

This method was proposed by Schmee and Hahn (1979) as a simple estimation method for dealing with type I censored data with covariates. It can be implemented by using a least squares algorithm in iterative fashion.

We assume the following regression model

    Y_i = β_1 X_i1 + ... + β_p X_ip + σ e_i ,   i = 1, ..., n ,

or in vector/matrix notation

    (Y_1, ..., Y_n)′ = (X_ij) (β_1, ..., β_p)′ + σ (e_1, ..., e_n)′ ,

or more compactly

    Y = X β + σ e .

Here Y is the vector of observations, X is the matrix of covariates corresponding to Y, β is the vector of regression coefficients, and σe is the vector of independent and identically distributed error terms with E(e_i) = 0 and var(e_i) = 1. We denote the density of e by g_0(z). Often one has X_i1 = 1 for i = 1, ..., n. In that case the model has an intercept.

Rather than observing this full data set (Y, X) one observes the Y_i in partially censored form, i.e., there are censoring values c′ = (c_1, ..., c_n) such that Y_i is observed whenever Y_i ≤ c_i; otherwise the value c_i is observed. Also, it is always known whether the observed value is a Y_i or a c_i. This is indicated by δ_i = 1 and δ_i = 0, respectively. Thus the observed censored data consist of

    D = (Ỹ, X, δ) ,

where δ′ = (δ_1, ..., δ_n) and Ỹ′ = (Ỹ_1, ..., Ỹ_n) with

    Ỹ_i = Y_i if δ_i = 1 (i.e. when Y_i ≤ c_i),   Ỹ_i = c_i if δ_i = 0 (i.e. when Y_i > c_i).

Based on these data, the basic algorithm consists in treating the observations initially as though they are not censored and applying the least squares method to (Ỹ, X) to find initial estimates (σ̂_0, β̂_0′) of (σ, β′).
Next, replace the censored values by their expected values, i.e., replace Ỹ_i by

    Ỹ_i,1 = E(Y_i | Y_i > c_i ; σ, β′)   whenever δ_i = 0 ,

computed by setting (σ, β′) = (σ̂_0, β̂_0′). Denote this modified Ỹ vector by Ỹ_1. Again treat this modified data set as though it is not censored and apply the least squares method to (Ỹ_1, X) to find new estimates (σ̂_1, β̂_1′) of (σ, β′). Repeat the above step of replacing censored Ỹ_i values by estimated expected values

    Ỹ_i,2 = E(Y_i | Y_i > c_i ; σ, β′)   whenever δ_i = 0 ,

this time using (σ, β′) = (σ̂_1, β̂_1′). This process can be iterated until some stopping criterion is satisfied: either the iterated regression estimates (σ̂_j, β̂_j′) do not change much any more, or the residual sum of squares has stabilized.
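A compact sketch of this iteration for the normal-errors case, using the conditional mean formula E(Y | Y > c) = µ + σϕ(t)/[1 − Φ(t)] with t = (c − µ)/σ derived below, might look as follows. The function name, loop structure, and fixed iteration count are illustrative choices made here, not Schmee and Hahn's original implementation:

```python
import numpy as np
from scipy.stats import norm

def schmee_hahn(x, delta, X, n_iter=100):
    """Iterative least squares for type I censored data, normal-errors case.
    x: observed values (Y_i or c_i), delta: 1 = uncensored, X: n x p design."""
    y = np.asarray(x, dtype=float).copy()
    d = np.asarray(delta, dtype=bool)
    n, p = X.shape
    beta, sigma = np.zeros(p), 1.0
    for _ in range(n_iter):
        # least squares on the current (partially imputed) responses
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        sigma = np.sqrt(np.sum((y - X @ beta) ** 2) / (n - p))
        # replace censored values by E(Y | Y > c) = mu + sigma*phi(t)/(1 - Phi(t))
        t = (x[~d] - X[~d] @ beta) / sigma
        y[~d] = X[~d] @ beta + sigma * norm.pdf(t) / norm.sf(t)
    return beta, sigma

# usage on simulated data: Y = 1 + 2u + 0.5e with normal e, censored at c
rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y_full = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=n)
c = rng.uniform(1.5, 4.0, size=n)
x_obs, delta = np.minimum(y_full, c), (y_full <= c).astype(int)
beta_sh, sigma_sh = schmee_hahn(x_obs, delta, X)
```

The resulting estimates are useful as starting values; they are not maximum likelihood estimates and can be somewhat biased under heavy censoring.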
In order to carry out the above algorithm one needs to have a computational expression for

    E(Y | Y > c ; σ, β′) ,

where

    Y = β_1 x_1 + ... + β_p x_p + σ e = µ(x) + σ e

and the error term e has density g_0(z). Then Y has density

    g(y) = (1/σ) g_0( (y − µ(x))/σ ) .

The conditional density of Y, given that Y > c, is

    g_c(y) = g(y)/[1 − G(c)]   for y > c ,   and   g_c(y) = 0   for y ≤ c .

The formula for E(Y | Y > c ; σ, β′) is derived for two special cases, namely for g_0(z) = ϕ(z), the standard normal density with distribution function Φ(z), and for

    g_0(z) = δ g̃_0(δz − γ) = δ exp[ δz − γ − exp(δz − γ) ] ,

where δ = π/√6 ≈ 1.28255 and γ ≈ 0.57721566 is Euler's constant. Here g̃_0(z) = exp[z − exp(z)] is the standard form of the Gumbel density, with mean −γ and standard deviation δ. Thus g_0(z) is the standardized density, with mean zero and variance one. The distribution function of g_0(z) is denoted by G_0(z) = G̃_0(δz − γ) and is given by

    G_0(z) = 1 − exp(−exp[δz − γ]) .

The Gumbel distribution is covered here for its intimate connection to the Weibull distribution.
When g_0(z) = ϕ(z), utilizing ϕ′(z) = −z ϕ(z) one finds

    E(Y | Y > c ; σ, β′) = ∫_c^∞ y g_c(y) dy

        = [1 − Φ((c − µ(x))/σ)]^(−1) ∫_c^∞ y (1/σ) ϕ((y − µ(x))/σ) dy

        = [1 − Φ((c − µ(x))/σ)]^(−1) ∫_{[c−µ(x)]/σ}^∞ [µ(x) + σz] ϕ(z) dz

        = µ(x) − σ [1 − Φ((c − µ(x))/σ)]^(−1) ∫_{[c−µ(x)]/σ}^∞ ϕ′(z) dz

        = µ(x) + σ [1 − Φ((c − µ(x))/σ)]^(−1) ϕ((c − µ(x))/σ) ,

which is simple enough to evaluate for given σ and µ(x).
For g_0(z) = δ exp[δz − γ − exp(δz − γ)] one obtains in similar fashion

    E(Y | Y > c ; σ, β′) = ∫_c^∞ y g_c(y) dy

        = [1 − G_0((c − µ(x))/σ)]^(−1) ∫_c^∞ y (1/σ) g_0((y − µ(x))/σ) dy

        = [1 − G_0((c − µ(x))/σ)]^(−1) ∫_{[c−µ(x)]/σ}^∞ [µ(x) + σz] g_0(z) dz

        = µ(x) + σ [1 − G_0((c − µ(x))/σ)]^(−1) ∫_{[c−µ(x)]/σ}^∞ z g_0(z) dz .
Here, substituting t = exp(δz − γ) and integrating by parts, one has

    ∫_a^∞ z g_0(z) dz = ∫_a^∞ [δz − γ + γ] exp[δz − γ − exp(δz − γ)] dz

        = δ^(−1) ∫_{exp(δa−γ)}^∞ [log(t) + γ] exp(−t) dt

        = δ^(−1) ( δa exp[−exp(δa − γ)] + ∫_{exp(δa−γ)}^∞ exp(−t) t^(−1) dt )

        = a exp[−exp(δa − γ)] + δ^(−1) E_1[exp(δa − γ)] .

Here E_1(z) is the exponential integral function, see Abramowitz and Stegun (1972). There one also finds various approximation formulas for

    E_1(z) = ∫_z^∞ exp(−t) t^(−1) dt ,

namely for 0 ≤ z ≤ 1 and coefficients a_i given in Table 2 one has

    E_1(z) = −log(z) + a_0 + a_1 z + a_2 z² + a_3 z³ + a_4 z⁴ + a_5 z⁵ + ε(z)

with |ε(z)| < 2 × 10^(−7), and for 1 ≤ z
Combining the above one obtains the following formula for E(Y | Y > c ; σ, β′):

    E(Y | Y > c ; σ, β′) = µ(x) + δ^(−1) σ exp[ exp( (c − µ(x))/(σ/δ) − γ ) ]
        × ( [(c − µ(x))/(σ/δ)] exp[ −exp( (c − µ(x))/(σ/δ) − γ ) ] + E_1[ exp( (c − µ(x))/(σ/δ) − γ ) ] )

        = c + δ^(−1) σ exp[ exp( (c − µ(x))/(σ/δ) − γ ) ] E_1[ exp( (c − µ(x))/(σ/δ) − γ ) ] .

Note that for ε = exp(δ[c − µ(x)]/σ − γ) ≈ 0 one has

    E(Y | Y > c ; σ, β′) = µ(x) + δ^(−1) σ exp(ε) [ (γ + log(ε)) exp(−ε) + E_1(ε) ]

        = µ(x) + δ^(−1) σ exp(ε) ( [γ + log(ε)][1 − ε + O(ε²)] − log(ε) − γ + a_1 ε + O(ε²) )

        = µ(x) + δ^(−1) σ exp(ε) [ (a_1 − γ)ε − ε log(ε) ] + O(ε² log(ε)) ,

where a_1 is as in Table 2. In particular, in the limiting case as ε → 0, one has

    E(Y | Y > c ; σ, β′) → µ(x) .

This makes intuitive sense, since in that case the censoring time is so low as to provide no information about the actual failure time. In that case it is reasonable to replace a "completely missing" observation by its mean value.

For λ = exp(δ[c − µ(x)]/σ − γ) very large one has

    E(Y | Y > c ; σ, β′) = c + δ^(−1) σ exp(λ) E_1(λ) = c + δ^(−1) σ ( 1/λ + O(1/λ²) ) ≈ c .
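The closed form above is easy to evaluate with a library exponential integral. The following Python sketch (illustrative code written here, using scipy.special.exp1 for E_1 and an asymptotic branch, added as a safeguard, for very large λ) also checks the two limiting cases just derived:

```python
import numpy as np
from scipy.special import exp1

GAMMA = 0.57721566490153286      # Euler's constant gamma
DELTA = np.pi / np.sqrt(6.0)     # delta = pi/sqrt(6) ~ 1.28255

def gumbel_cond_mean(c, mu, sigma):
    """E(Y | Y > c) for Y = mu + sigma*e, e the standardized Gumbel error,
    via E(Y | Y > c) = c + (sigma/delta) * exp(lam) * E1(lam),
    lam = exp(delta*(c - mu)/sigma - gamma)."""
    lam = np.exp(DELTA * (c - mu) / sigma - GAMMA)
    if lam > 500.0:
        # avoid overflow of exp(lam): exp(lam)*E1(lam) = 1/lam - 1/lam^2 + ...
        return c + (sigma / DELTA) * (1.0 / lam - 1.0 / lam**2)
    return c + (sigma / DELTA) * np.exp(lam) * exp1(lam)

# limiting behaviour: far-left censoring gives mu, far-right gives c
assert abs(gumbel_cond_mean(-10.0, 0.0, 1.0) - 0.0) < 1e-3
assert abs(gumbel_cond_mean(10.0, 0.0, 1.0) - 10.0) < 1e-3
assert gumbel_cond_mean(0.0, 0.0, 1.0) > 0.0   # always exceeds c
```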
4.2 Some Specific Examples and Simulation Experiences

The data set of Table 4 is taken from Gertsbakh (1989) for illustrative and comparative purposes. It gives the log-lifetimes for 40 tested motors under different temperature and load conditions. The failure indicator is one when the motor failed and zero when it was still running at the termination of the test. The maximum likelihood estimates for the regression coefficients and scale parameter were given by Gertsbakh as the entries in the first row of Table 5. The corresponding estimates as computed by our algorithm are given to the same number of digits in the second row of that table. The results are reasonably close to each other.

The data in Table 1 can be taken as another example, although here there are no covariates. This however provides an independent way of gauging the accuracy of our algorithm, since in that case we have an independent double precision algorithm based on root solving. The answers by these two methods are given in Table 6 to the relevant number of digits for comparison. The agreement is very good (at least nine digits) in this particular example.

As another check on the algorithm, various simulations were performed, either with noncensored samples or with various degrees of censoring. In all cases only one covariate was used. For the noncensored case 1000 samples each were generated at sample sizes n = 5, 20, 50, 100. The data were generated according to the Gumbel model with a linear model β_1 + β_2 u_i, with β_1 = 1 and β_2 = 2. The u_i were randomly generated from a uniform distribution over (0, 1). The scale parameter was σ = .5. Figures 3 and 4 illustrate the results. The dashed vertical line in the histogram for σ̂ is located at σ√((n − 2)/n). It appears to be a better indication of the mean of the σ̂. Equivalently one should compare σ̂√(n/(n − 2)) against σ = .5. The n − 2 "accounts" for the two degrees of freedom lost in estimating β_1 and β_2. Judging from these limited simulation results it appears that the factor √(n/(n − 2)) corrects for the small sample bias reasonably well.

Figures 5-7 illustrate the statistical properties of the maximum likelihood estimates for medium and heavily censored samples of size n = 50, 500 and 1000. The censoring was done as follows. For each lifetime Y_i in the sample a random censoring time V_i = .5 + 3γW_i was generated, with W_i taken from a uniform (0, 1) distribution. The smaller of Y_i and V_i was then taken as the i-th observation and the censoring indicator was set appropriately. The parameter γ controls the censoring: a small value of γ means heavy censoring and a larger γ means medium to light censoring. In
-
this simulation γ = .2 and γ = 1 were used.
The plots in Figures 5-7 show, for each sample, the estimate versus the corresponding censoring fraction. Originally N = 1000 samples were generated under each censoring scenario, but under n = 50 and heavy censoring two samples did not permit a solution, since at least 49 lifetimes were censored in those cases. The percentages given in these plots indicate the proportion of estimates above the respective target line. The percentages given in parentheses use the dashed target line, which, as in Figures 3-4, is an attempt at bias correction. Note how increasing the sample size reduces the scatter of the estimates. Also note how the scatter increases with increasing censoring fraction.
Also shown in each plot of Figures 5-7 is the least squares regression line, to indicate trends in the estimates against the censoring fraction. It appears that for heavy censoring there is a definite trend in the intercept estimates β̂1: as the censoring fraction increases, so does the intercept estimate. We do not know whether this effect has been discussed in the literature. The usefulness of this relationship is questionable, since one usually does not know whether the regression line is above or below the target line, the latter being unknown. Note that the median of the estimates β̂1 is close to target.
4.3 The Fortran Code GMLE
The file with the Fortran subroutine GMLE, developed out of the above considerations, is called gmle.f and is documented in Appendix A. Although the source code for it could easily be made available, it still requires linking with three BCSLIB subroutine libraries, namely optlib, bcsext, and bcslib. Once one has written an appropriate driver for GMLE (which may be contained in the file gmledrv.f, also available), one compiles these as follows on a Sun workstation:

f77 gmledrv.f gmle.f -loptlib -lbcsext -lbcslib
Table 4: Motor Failure Data, Two Factors (from Gertsbakh, p. 206)

log       rescaled  rescaled             log       rescaled  rescaled
failure   load      temper.   failure    failure   load      temper.   failure
time      index     index     indicator  time      index     index     indicator
5.45       1         1        1          5.15      -1         1        1
5.74       1         1        1          6.11      -1         1        1
5.80       1         1        1          6.11      -1         1        1
6.37       1         1        1          6.23      -1         1        1
6.49       1         1        1          6.28      -1         1        1
6.91       1         1        1          6.32      -1         1        1
7.02       1         1        1          6.41      -1         1        1
7.10       1         1        0          6.56      -1         1        1
7.10       1         1        0          6.61      -1         1        1
7.10       1         1        0          6.90      -1         1        0
5.07       1        -1        1          3.53      -1        -1        1
5.19       1        -1        1          4.22      -1        -1        1
5.22       1        -1        1          4.73      -1        -1        1
5.58       1        -1        1          5.22      -1        -1        1
5.83       1        -1        1          5.46      -1        -1        1
6.09       1        -1        1          5.58      -1        -1        1
6.25       1        -1        1          5.61      -1        -1        1
6.30       1        -1        0          5.97      -1        -1        1
6.30       1        -1        0          6.02      -1        -1        1
6.30       1        -1        0          6.10      -1        -1        0
Table 5: Comparison of MLE’s for Data in Table 4
Source load temperatureintercept coefficient coefficient
scale
Gertsbakh 6.318 0.253 0.391 0.539our code 6.317 0.253 0.391
0.538
Table 6: Comparison of MLE's for Data in Table 1

                   scale        shape
Source             parameter    parameter
root solver        952.3774020  23.90139575
optimization code  952.3774021  23.90139576
Figure 3: 1000 Simulations at n = 5 and n = 20 (uncensored)
[Histograms of the estimates beta_1, beta_2, and sigma for n = 5 and for n = 20.]
Figure 4: 1000 Simulations at n = 50 and n = 100 (uncensored)
[Histograms of the estimates beta_1, beta_2, and sigma for n = 50 and for n = 100.]
Figure 5: 1000 Simulations at n = 50, medium and heavy censoring
[Six scatter plots of the estimates beta_1, beta_2, and sigma versus the censoring fraction, under medium censoring (censoring fractions roughly 0.2-0.6; 48.9 %, 49.5 %, and 40.4 % (45 %) of estimates above the respective target lines) and heavy censoring (censoring fractions roughly 0.65-0.95; 42.1 %, 49.1 %, and 42.6 % (44.9 %)). Each panel also shows the least squares regression line of the estimates on the censoring fraction.]
Figure 6: 1000 Simulations at n = 500, medium and heavy censoring
[Scatter plots of the estimates beta_1, beta_2, and sigma versus the censoring fraction, in the same layout as Figure 5.]