Thesis Defense

EMPIRICAL LIKELIHOOD INFERENCE FOR THE

ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION

By Yinghua Lu

June 29th 2009

Georgia State University

Contents

Introduction

Main Procedure

Simulation Study

Real Application

Conclusion

Introduction – AFT Model

Accelerated Failure Time (AFT) model: very popular, and similar to classical linear regression:

$Y = \beta^{T} Z + \varepsilon$, where $Y = \ln(T)$.

Several estimation methods have been developed: OLS, non-monotone estimating equations, and monotone estimating equations with normal approximation.

Introduction – Kendall’s Tau

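Let (X1, Y1) and (X2, Y2) be two observations of a pair of variables (X, Y).

For n observations, Kendall's tau coefficient is defined as

$\tau = \dfrac{n_c - n_d}{n(n-1)/2}$,

where $n_c$ is the number of concordant pairs (i, j), with sign(X_i - X_j) = sign(Y_i - Y_j), and $n_d$ is the number of discordant pairs, with sign(X_i - X_j) = -sign(Y_i - Y_j).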
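The definition can be checked with a brute-force count over all n(n-1)/2 pairs. This is a minimal Python sketch (the function names are illustrative, not from the thesis); it computes the version without a tie correction, so a pair tied in X or Y counts as neither concordant nor discordant.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau without tie correction: (n_c - n_d) / (n(n-1)/2)."""
    n = len(x)
    n_c = n_d = 0
    for i, j in combinations(range(n), 2):
        s = sign(x[i] - x[j]) * sign(y[i] - y[j])
        if s > 0:
            n_c += 1  # concordant pair
        elif s < 0:
            n_d += 1  # discordant pair; tied pairs fall through
    return (n_c - n_d) / (n * (n - 1) / 2)


# One swapped pair out of six: n_c = 5, n_d = 1, so tau = 4/6.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```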

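Sen (1968) proposed estimating the slope b as the root of

$U(b) = \sum_{i<j} \operatorname{sign}(X_i - X_j)\,\operatorname{sign}\{\varepsilon_i(b) - \varepsilon_j(b)\}$, where $\varepsilon(b) = Y - bX$,

i.e. the value of b that makes Kendall's tau between X and the residuals ε(b) equal to zero.

U(b) is non-increasing in b.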
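A minimal sketch of Sen's estimator for one covariate without censoring (function names are illustrative). For a pair with X_i ≠ X_j, the pair's term in U(b) equals sign(s_ij - b), where s_ij = (Y_i - Y_j)/(X_i - X_j) is the pairwise slope. That is why U(b) is non-increasing, and why the median of the pairwise slopes is a point where U(b) changes sign.

```python
from itertools import combinations
from statistics import median


def sign(v):
    return (v > 0) - (v < 0)


def U(b, x, y):
    """Sum over pairs i < j of sign(X_i - X_j) * sign(eps_i(b) - eps_j(b))."""
    eps = [yi - b * xi for xi, yi in zip(x, y)]  # residuals eps(b) = Y - bX
    return sum(sign(x[i] - x[j]) * sign(eps[i] - eps[j])
               for i, j in combinations(range(len(x)), 2))


def sen_slope(x, y):
    """Median of the pairwise slopes over pairs with distinct X values."""
    slopes = [(y[i] - y[j]) / (x[i] - x[j])
              for i, j in combinations(range(len(x)), 2) if x[i] != x[j]]
    return median(slopes)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1]
b_hat = sen_slope(x, y)
grid = [b_hat + k / 10 for k in range(-10, 11)]
values = [U(b, x, y) for b in grid]  # non-increasing along the grid
print(b_hat, U(b_hat, x, y))
```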

Introduction – Empirical Likelihood

A nonparametric method

Based on a data-driven likelihood ratio function

Without specifying a parametric family of distributions for the data.

The shape of confidence regions is determined by the data.

Combines the reliability of nonparametric methods with the efficiency of likelihood methods.


For X1,X2,…,Xn, the likelihood function is defined by

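Let X1, X2, …, Xn be n independent observations. The empirical cumulative distribution function (ECDF) at x is $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$.

The nonparametric likelihood of a CDF F can be defined as $L(F) = \prod_{i=1}^{n} \{F(X_i) - F(X_i-)\}$.

Likelihood ratio: $R(F) = L(F)/L(F_n)$.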
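To make the ratio concrete: when the observations are distinct, a distribution that puts mass p_i on X_i has L(F) = ∏ p_i, while the ECDF puts mass 1/n on each point, so R(F) = ∏ n p_i. The sketch below, assuming distinct observations (function names are illustrative), evaluates log R for a few weight vectors. By the AM-GM inequality it is 0 for uniform weights and negative otherwise, which is the sense in which the ECDF maximizes the nonparametric likelihood.

```python
import math


def ecdf(sample):
    """Empirical CDF: F_n(x) = (1/n) * #{i : X_i <= x}."""
    n = len(sample)
    return lambda x: sum(1 for v in sample if v <= x) / n


def log_likelihood_ratio(p):
    """log R(F) = sum_i log(n * p_i) for weights p_i on n distinct observations."""
    n = len(p)
    if any(pi <= 0 for pi in p) or abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("p must be a strictly positive probability vector")
    return sum(math.log(n * pi) for pi in p)


F = ecdf([3.1, 0.4, 2.2])
print(F(2.5))                                   # two of the three points are <= 2.5
print(log_likelihood_ratio([1 / 3] * 3))         # uniform weights: log R = 0
print(log_likelihood_ratio([0.5, 0.25, 0.25]))   # other weights: log R < 0
```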

Owen (2001) proved

Introduction – Brief History

Traced back to Thomas and Grunkemeier (1975)

Summarized and discussed in Owen (1988, 1990, 1991, 2001)

Qin and Jing (2001) and Li and Wang (2003): the limiting distribution of the EL ratio statistic is a weighted chi-square distribution.

Zhou (2005) and Zhou and Li (2008): EL for the Logrank and Gehan estimators, and for the Buckley-James estimator, respectively.

Main Procedure – Preliminaries

Let T1, …, Tn be positive random variables (failure times), and let Z1, …, Zn be the corresponding covariate vectors.

Z and β are p×1 vectors.

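We observe $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, i = 1, …, n, where $C_i$ is the censoring time.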

Define

We employ the following estimating equation:

We can rewrite it as a U-statistic with a symmetric kernel:

Similar to Fygenson and Ritov (1994),

where R and J are defined as in Fygenson and Ritov (1994).
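The exact kernel in the slides (built from Fygenson and Ritov's R and J) is not recoverable here, so the following is only a sketch under stated assumptions: a scalar covariate, observed times X_i = min(T_i, C_i) with indicators δ_i, residuals e_i(β) = ln X_i - βZ_i, and a hypothetical Kendall-type kernel sign(Z_i - Z_j)[δ_j I{e_i > e_j} - δ_i I{e_j > e_i}], which counts an ordering only when the smaller residual is uncensored. Without censoring it reduces to Sen's U(b); with censoring each pair's contribution is still non-increasing in β, so the root can be found by bisection. The function names and the simulated censoring scheme are illustrative.

```python
import random
from itertools import combinations


def sign(v):
    return (v > 0) - (v < 0)


def kendall_ee(beta, logx, delta, z):
    """Hypothetical censored Kendall estimating function U(beta), scalar covariate."""
    e = [lx - beta * zi for lx, zi in zip(logx, z)]  # residuals e_i(beta)
    total = 0
    for i, j in combinations(range(len(z)), 2):
        # +1/-1 only when the pair's ordering is observable (smaller residual uncensored)
        concordance = delta[j] * (e[i] > e[j]) - delta[i] * (e[j] > e[i])
        total += sign(z[i] - z[j]) * concordance
    return total


def solve_beta(logx, delta, z, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection for the sign change of the non-increasing step function U(beta)."""
    if kendall_ee(lo, logx, delta, z) < 0 or kendall_ee(hi, logx, delta, z) > 0:
        raise ValueError("U(beta) does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kendall_ee(mid, logx, delta, z) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Simulated example: true beta = 1.5, hypothetical normal censoring on the log scale.
rng = random.Random(2009)
z = [rng.uniform(-1, 1) for _ in range(60)]
log_t = [1.5 * zi + rng.gauss(0, 0.5) for zi in z]
log_c = [rng.gauss(0.5, 1.0) for _ in z]
logx = [min(t, c) for t, c in zip(log_t, log_c)]
delta = [int(t <= c) for t, c in zip(log_t, log_c)]
print(solve_beta(logx, delta, z))
```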


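The asymptotic variance of the generalized estimator of β is

The numerator can be estimated by

The denominator can be estimated by

Then a 100(1-α)% normal-approximation (Wald) confidence interval can be constructed as $\hat\beta \pm z_{1-\alpha/2}\,\widehat{\mathrm{SE}}(\hat\beta)$, where $\widehat{\mathrm{SE}}(\hat\beta)$ is the standard error obtained from the variance estimate above.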
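A hedged sketch of this normal-approximation (NA) interval for a scalar β. Because the slide's numerator and denominator formulas are not shown, it assumes the hypothetical censored Kendall kernel sign(Z_i - Z_j)[δ_j I{e_i > e_j} - δ_i I{e_j > e_i}] with e_i = ln X_i - βZ_i, and the standard sandwich form for an estimator defined by a U-statistic estimating equation: the numerator is 4 times the sample variance of Sen's components W_i (the variance of the U-statistic's projection), and the denominator is the squared slope of U_n(β) at β̂. Since U_n is a step function, the slope is a central difference over a step of order n^(-1/2); that step size is an assumption of this sketch, as are all function names.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def sign(v):
    return (v > 0) - (v < 0)


def components(beta, logx, delta, z):
    """Return (U_n(beta), [W_1, ..., W_n]) for the hypothetical Kendall kernel."""
    n = len(z)
    e = [lx - beta * zi for lx, zi in zip(logx, z)]
    w = [0.0] * n
    for i, j in combinations(range(n), 2):
        h = sign(z[i] - z[j]) * (delta[j] * (e[i] > e[j]) - delta[i] * (e[j] > e[i]))
        w[i] += h
        w[j] += h
    w = [wi / (n - 1) for wi in w]   # Sen's components W_i
    return sum(w) / n, w             # their mean is exactly U_n


def solve_beta(logx, delta, z, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection for the sign change of the non-increasing U_n(beta)."""
    U = lambda b: components(b, logx, delta, z)[0]
    if U(lo) < 0 or U(hi) > 0:
        raise ValueError("U_n(beta) does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if U(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wald_ci(logx, delta, z, alpha=0.05):
    """beta_hat +/- z_{1-alpha/2} * sqrt(numerator / (n * denominator))."""
    n = len(z)
    beta_hat = solve_beta(logx, delta, z)
    u_hat, w = components(beta_hat, logx, delta, z)
    numerator = 4.0 * sum((wi - u_hat) ** 2 for wi in w) / (n - 1)
    d = 1.0 / math.sqrt(n)                          # assumed difference step
    u_plus = components(beta_hat + d, logx, delta, z)[0]
    u_minus = components(beta_hat - d, logx, delta, z)[0]
    denominator = ((u_plus - u_minus) / (2 * d)) ** 2
    if denominator == 0:
        raise ValueError("U_n is flat around beta_hat; try a larger step")
    se = math.sqrt(numerator / (n * denominator))
    q = NormalDist().inv_cdf(1 - alpha / 2)
    return beta_hat, (beta_hat - q * se, beta_hat + q * se)


rng = random.Random(29)
z = [rng.uniform(-1, 1) for _ in range(100)]
log_t = [1.0 * zi + rng.gauss(0, 0.5) for zi in z]   # true beta = 1
log_c = [rng.gauss(0.5, 1.0) for _ in z]              # hypothetical censoring
logx = [min(t, c) for t, c in zip(log_t, log_c)]
delta = [int(t <= c) for t, c in zip(log_t, log_c)]
beta_hat, (lower, upper) = wald_ci(logx, delta, z, alpha=0.10)
print(beta_hat, lower, upper)
```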

Main Procedure – Empirical Likelihood

Let and

Applying the idea of Sen (1960), we define

where the W's are treated as independently distributed.
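Since the definition of the W's is not shown, this sketch assumes they are built from Sen's (1960) structural components of a U-statistic with a symmetric kernel h, W_i = (1/(n-1)) Σ_{j≠i} h(X_i, X_j), whose average is exactly the U-statistic. It also computes jackknife pseudo-values V_i = nU_n - (n-1)U_{n-1}^(-i), which have the closed form {2(n-1)W_i - nU_n}/(n-2); unlike the raw components, their sample variance matches n·Var(U_n) asymptotically, which matters for chi-square calibration. The kernel in the example (half the squared difference, whose U-statistic is the sample variance) and all function names are illustrative.

```python
from itertools import combinations


def u_statistic(h, xs):
    """U_n = average of h(X_i, X_j) over all pairs i < j."""
    pairs = list(combinations(range(len(xs)), 2))
    return sum(h(xs[i], xs[j]) for i, j in pairs) / len(pairs)


def sen_components(h, xs):
    """W_i = (1/(n-1)) * sum_{j != i} h(X_i, X_j)."""
    n = len(xs)
    return [sum(h(xs[i], xs[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def jackknife_pseudo_values(h, xs):
    """V_i = n U_n - (n-1) U_{n-1}^(-i), via the closed form in the W_i (n >= 3)."""
    n = len(xs)
    u = u_statistic(h, xs)
    return [(2 * (n - 1) * w - n * u) / (n - 2) for w in sen_components(h, xs)]


def half_sq(a, b):
    """Illustrative kernel whose U-statistic is the sample variance."""
    return 0.5 * (a - b) ** 2


xs = [1.0, 2.0, 4.0, 7.0, 11.0]
w = sen_components(half_sq, xs)
v = jackknife_pseudo_values(half_sq, xs)
print(sum(w) / len(w), u_statistic(half_sq, xs))   # equal: the mean of the W_i is U_n
```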


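Let $\mathbf{p} = (p_1, \dots, p_n)$ be a probability vector. Then the empirical likelihood function at the value β is given by

$L(\beta) = \sup\left\{ \prod_{i=1}^{n} p_i : p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i W_i(\beta) = 0 \right\}$.

Subject only to $\sum_{i=1}^{n} p_i = 1$, the product $\prod_{i=1}^{n} p_i$ reaches its maximum $n^{-n}$ when $p_1 = \dots = p_n = 1/n$.

Thus, the empirical likelihood ratio at β is defined by

$R(\beta) = \frac{L(\beta)}{n^{-n}} = \sup\left\{ \prod_{i=1}^{n} n p_i : p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i W_i(\beta) = 0 \right\}$.

Applying the Lagrange multiplier method to the logarithm of the above expression, we write

$G = \sum_{i=1}^{n} \log(n p_i) - n\lambda^{T} \sum_{i=1}^{n} p_i W_i(\beta) + \gamma\left(\sum_{i=1}^{n} p_i - 1\right)$.

Setting the partial derivative of G with respect to $p_i$ to 0, we have

$\frac{\partial G}{\partial p_i} = \frac{1}{p_i} - n\lambda^{T} W_i(\beta) + \gamma = 0$.

Multiplying by $p_i$ and summing over i, the two constraints give $n + \gamma = 0$, so $\gamma = -n$.

Plugging this into the previous equation, we obtain

$p_i = \frac{1}{n\{1 + \lambda^{T} W_i(\beta)\}}$, i = 1, …, n.

Since the p's must satisfy $\sum_{i=1}^{n} p_i W_i(\beta) = 0$, λ is determined by

$\sum_{i=1}^{n} \frac{W_i(\beta)}{1 + \lambda^{T} W_i(\beta)} = 0$.

We have

$-2\log R(\beta) = 2\sum_{i=1}^{n} \log\{1 + \lambda^{T} W_i(\beta)\}$.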
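For a scalar W_i(β), the Lagrange conditions give p_i = 1/{n(1 + λW_i)}, with λ the root of g(λ) = Σ W_i/(1 + λW_i), and -2 log R = 2 Σ log(1 + λW_i). Because g is strictly decreasing on the interval where every 1 + λW_i > 0, and runs from +∞ to -∞ across it whenever 0 lies strictly between min W_i and max W_i, plain bisection finds λ. A minimal sketch under that scalar assumption (a vector W needs a Newton-type solver instead); `el_ratio` is an illustrative name.

```python
import math


def el_ratio(w, iters=200):
    """Return (-2 log R, lam, p) for the constraint sum_i p_i w_i = 0, scalar w_i."""
    n = len(w)
    if max(w) <= 0 or min(w) >= 0:
        return math.inf, None, None           # 0 is not inside the convex hull
    # Feasible region for lam: 1 + lam * w_i > 0 for every i.
    a, b = -1.0 / max(w), -1.0 / min(w)
    g = lambda lam: sum(wi / (1.0 + lam * wi) for wi in w)
    for _ in range(iters):                     # g is strictly decreasing on (a, b)
        mid = 0.5 * (a + b)
        if g(mid) > 0:
            a = mid
        else:
            b = mid
    lam = 0.5 * (a + b)
    p = [1.0 / (n * (1.0 + lam * wi)) for wi in w]      # p_i = 1 / (n (1 + lam W_i))
    stat = 2.0 * sum(math.log1p(lam * wi) for wi in w)  # -2 log R = 2 sum log(1 + lam W_i)
    return stat, lam, p


stat0, _, p0 = el_ratio([-1.0, 1.0, -2.0, 2.0])   # mean already 0: lam = 0, p_i = 1/n
stat1, lam1, p1 = el_ratio([0.5, 1.5, -0.2, 2.0, 0.7])
print(stat0, stat1, sum(p1))
```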


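Theorem 1. Under the above conditions, $-2\log R(\beta_0)$ converges in distribution to $\chi^2_p$, where $\chi^2_p$ is a chi-square random variable with p degrees of freedom and $\beta_0$ is the true value of β.

A 100(1-α)% confidence region for β is given by $\{\beta : -2\log R(\beta) \le \chi^2_{p,1-\alpha}\}$, where $\chi^2_{p,1-\alpha}$ is the (1-α) quantile of $\chi^2_p$.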
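For a scalar β (p = 1) the EL region is an interval, and the chi-square(1) quantile at level 1-α equals the square of the standard normal quantile z_{1-α/2}, so only the standard library is needed. The sketch below scans a grid of β values and keeps those with -2 log R(β) at or below that quantile. As hypotheses of this sketch rather than the thesis's definitions, it uses the censored Kendall-type kernel sign(Z_i - Z_j)[δ_j I{e_i > e_j} - δ_i I{e_j > e_i}] with e_i = ln X_i - βZ_i, and jackknife pseudo-values built from Sen's components in the role of the W's. The grid range and step are arbitrary, and reporting the smallest and largest accepted grid points is a simplification if the accepted set is not connected.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def sign(v):
    return (v > 0) - (v < 0)


def pseudo_values(beta, logx, delta, z):
    """Jackknife pseudo-values of the hypothetical censored Kendall U-statistic."""
    n = len(z)
    e = [lx - beta * zi for lx, zi in zip(logx, z)]
    w = [0.0] * n
    for i, j in combinations(range(n), 2):
        h = sign(z[i] - z[j]) * (delta[j] * (e[i] > e[j]) - delta[i] * (e[j] > e[i]))
        w[i] += h
        w[j] += h
    w = [wi / (n - 1) for wi in w]                 # Sen's components W_i
    u = sum(w) / n                                 # U_n(beta)
    return [(2 * (n - 1) * wi - n * u) / (n - 2) for wi in w]


def neg2_log_el(v, iters=100):
    """-2 log R for mean zero: solve sum v_i / (1 + lam v_i) = 0 by bisection."""
    if max(v) <= 0 or min(v) >= 0:
        return math.inf                            # 0 outside the convex hull
    a, b = -1.0 / max(v), -1.0 / min(v)
    for _ in range(iters):
        lam = 0.5 * (a + b)
        if sum(vi / (1.0 + lam * vi) for vi in v) > 0:
            a = lam
        else:
            b = lam
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * vi) for vi in v)


def el_interval(logx, delta, z, alpha=0.05, lo=-0.5, hi=2.5, step=0.02):
    """Grid scan of {beta : -2 log R(beta) <= chi2_{1, 1-alpha}}."""
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2     # chi-square(1) quantile = z^2
    grid = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    kept = [b for b in grid
            if neg2_log_el(pseudo_values(b, logx, delta, z)) <= crit]
    return (min(kept), max(kept)) if kept else None


rng = random.Random(7)
z = [rng.uniform(-1, 1) for _ in range(60)]
log_t = [1.0 * zi + rng.gauss(0, 0.5) for zi in z]   # true beta = 1
log_c = [rng.gauss(0.5, 1.0) for _ in z]              # hypothetical censoring
logx = [min(t, c) for t, c in zip(log_t, log_c)]
delta = [int(t <= c) for t, c in zip(log_t, log_c)]
print(el_interval(logx, delta, z))
```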

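EL confidence region for a q-dimensional sub-vector: write $\beta = (\beta_1^{T}, \beta_2^{T})^{T}$, where $\beta_1$ is the q×1 sub-vector of interest, and use the profile ratio $R(\beta_1) = \sup_{\beta_2} R(\beta_1, \beta_2)$.

Theorem 2. Under the above conditions, $-2\log R(\beta_{1,0})$ converges in distribution to $\chi^2_q$, where $\chi^2_q$ is a chi-square random variable with q degrees of freedom and $\beta_{1,0}$ is the true value of $\beta_1$.

A 100(1-α)% confidence region for $\beta_1$ is given by $\{\beta_1 : -2\log R(\beta_1) \le \chi^2_{q,1-\alpha}\}$.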

Simulation Study – EL vs. NA

Consider the AFT model:

Model 1 (skewed error distribution):

Z ~ Uniform distribution in [-1, 1].

The censoring time C ~ Uniform distribution in [0, c], where c controls the censoring rate.

The error term has the standard extreme value distribution, which is skewed to the right.
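A sketch of how data for Model 1 might be generated, and how the censoring bound c can be tuned to a target censoring rate. The model equation is not shown on the slide, so the intercept β0 = 0 and slope β1 = 1 are hypothetical, as is applying the U[0, c] censoring on the original time scale. Following the slide's description of a right-skewed error, ε is drawn as -ln E with E ~ Exp(1), which has the standard Gumbel (largest extreme value) distribution. Function names are illustrative.

```python
import math
import random


def gen_model1(n, c, beta0=0.0, beta1=1.0, rng=random):
    """Simulate (ln X_i, delta_i, Z_i) under a hypothetical version of Model 1."""
    data = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        e = 0.0
        while e == 0.0:                      # guard against log(0)
            e = rng.expovariate(1.0)
        eps = -math.log(e)                   # -ln(Exp(1)) is standard Gumbel (max)
        t = math.exp(beta0 + beta1 * z + eps)
        cens = c * (1.0 - rng.random())      # C ~ Uniform(0, c], avoids C = 0
        data.append((math.log(min(t, cens)), int(t <= cens), z))
    return data


def censoring_rate(c, n=4000, seed=0):
    """Monte Carlo censoring rate; a fixed seed makes it monotone in c."""
    data = gen_model1(n, c, rng=random.Random(seed))
    return 1.0 - sum(d for _, d, _ in data) / n


def calibrate_c(target, lo=1e-3, hi=1e3, iters=25):
    """Bisection on log(c): a larger c means less censoring."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if censoring_rate(mid) > target:
            lo = mid                         # too much censoring: increase c
        else:
            hi = mid
    return math.sqrt(lo * hi)


for rate in (0.15, 0.30, 0.45, 0.60):
    c = calibrate_c(rate)
    print(rate, round(c, 3), censoring_rate(c))
```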


Model 2 (symmetric error distribution):

Z ~ Uniform distribution in [0.5, 1.5].

The censoring time C is defined as 2exp(1)+c.

The error term has the standard Normal distribution N(0,1), which is symmetric.

Settings:

Repetitions: 10,000

Censoring rates: 15%, 30%, 45%, 60%

Sample sizes: 30, 50, 75, 100
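Each simulation cell reports a coverage probability (CP, the fraction of repetitions whose interval contains the true β) and an average interval length (AL). A minimal sketch of that bookkeeping, with a binomial Monte Carlo standard error for CP added as an aid (it is not reported in the slides); function names are illustrative. With 10,000 repetitions and true coverage 0.95, the standard error is about sqrt(0.95 × 0.05 / 10000) ≈ 0.0022.

```python
import math


def coverage_summary(intervals, beta_true):
    """Return (CP, AL) for a list of (lower, upper) intervals."""
    reps = len(intervals)
    hits = sum(1 for lo, hi in intervals if lo <= beta_true <= hi)
    avg_len = sum(hi - lo for lo, hi in intervals) / reps
    return hits / reps, avg_len


def mc_standard_error(cp, reps):
    """Binomial Monte Carlo standard error of an estimated coverage probability."""
    return math.sqrt(cp * (1.0 - cp) / reps)


cp, al = coverage_summary([(0.0, 2.0), (1.0, 3.0), (2.5, 4.0)], beta_true=1.5)
print(cp, al)                                    # 2 of 3 intervals cover 1.5
print(round(mc_standard_error(0.95, 10000), 4))  # 0.0022
```

For example, the EL coverage of 0.9607 reported for model 1 (15% censoring, n = 100, 1-α = 0.95) lies roughly five such standard errors above the nominal 0.95, so the slight over-coverage noted in the summary is unlikely to be Monte Carlo noise.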


Results for model 1 (CP = coverage probability, AL = average length; Wald = normal approximation, NA):

                 1-α=0.90            1-α=0.95
CR    n    Stat  Wald      EL        Wald      EL
15%   30   CP    0.8686    0.8986    0.9221    0.9427
           AL    1.4354    1.5751    1.7102    1.9030
      50   CP    0.8856    0.9084    0.9338    0.9516
           AL    1.0857    1.1840    1.2936    1.4138
      75   CP    0.8880    0.9123    0.9405    0.9589
           AL    0.8730    0.9500    1.0402    1.1412
      100  CP    0.8937    0.9152    0.9425    0.9607
           AL    0.7520    0.8065    0.8960    0.9793
30%   30   CP    0.8669    0.8946    0.9163    0.9366
           AL    1.6870    1.8175    2.0101    2.1739
      50   CP    0.8768    0.8984    0.9272    0.9452
           AL    1.2694    1.3635    1.5124    1.6253
      75   CP    0.8828    0.9057    0.9372    0.9515
           AL    1.0218    1.1044    1.2174    1.3113
      100  CP    0.8911    0.9108    0.9418    0.9594
           AL    0.8810    0.9479    1.0497    1.1352
45%   30   CP    0.8494    0.8720    0.9081    0.9188
           AL    2.0324    2.1770    2.4216    2.5976
      50   CP    0.8699    0.8846    0.9233    0.9333
           AL    1.5241    1.5961    1.8160    1.9005
      75   CP    0.8798    0.8999    0.9336    0.9437
           AL    1.2293    1.2953    1.4647    1.5278
      100  CP    0.8879    0.9041    0.9394    0.9470
           AL    1.0555    1.1255    1.2576    1.3275
60%   30   CP    0.8136    0.8382    0.8760    0.8865
           AL    2.6101    2.7787    3.1099    3.2616
      50   CP    0.8482    0.8492    0.9008    0.9028
           AL    1.9459    1.9870    2.3186    2.3588
      75   CP    0.8700    0.8669    0.9213    0.9162
           AL    1.5701    1.5890    1.8708    1.8744
      100  CP    0.8738    0.8807    0.9284    0.9275
           AL    1.3462    1.3824    1.6040    1.6210



Results for model 2 (CP = coverage probability, AL = average length; Wald = normal approximation, NA):

                 1-α=0.90            1-α=0.95
CR    n    Stat  Wald      EL        Wald      EL
15%   30   CP    0.8539    0.9082    0.9120    0.9504
           AL    2.3067    2.4421    2.7485    2.8872
      50   CP    0.8753    0.9162    0.9293    0.9612
           AL    1.7432    1.8874    2.0770    2.2510
      75   CP    0.8850    0.9181    0.9374    0.9627
           AL    1.4096    1.5134    1.6795    1.8260
      100  CP    0.8880    0.9122    0.9409    0.9626
           AL    1.2158    1.2851    1.4486    1.5596
30%   30   CP    0.8520    0.9002    0.9065    0.9458
           AL    2.4348    2.5429    2.9010    2.9912
      50   CP    0.8760    0.9063    0.9238    0.9514
           AL    1.8430    1.9749    2.1960    2.3563
      75   CP    0.8829    0.9125    0.9377    0.9606
           AL    1.4885    1.5895    1.7735    1.9085
      100  CP    0.8832    0.9091    0.9380    0.9587
           AL    1.2805    1.3528    1.5257    1.6343
45%   30   CP    0.8434    0.8828    0.8985    0.9328
           AL    2.6690    2.8711    3.1801    3.4428
      50   CP    0.8698    0.8959    0.9207    0.9415
           AL    2.0308    2.1348    2.4197    2.5526
      75   CP    0.8804    0.9074    0.9326    0.9499
           AL    1.6297    1.7280    1.9418    2.0526
      100  CP    0.8875    0.9077    0.9367    0.9543
           AL    1.4077    1.4885    1.6773    1.7755
60%   30   CP    0.8319    0.8634    0.8869    0.9093
           AL    3.0160    3.1968    3.5935    3.7967
      50   CP    0.8705    0.8783    0.9179    0.9222
           AL    2.2770    2.3433    2.7130    2.7909
      75   CP    0.8818    0.8944    0.9300    0.9397
           AL    1.8232    1.8974    2.1723    2.2437
      100  CP    0.8808    0.8995    0.9330    0.9422
           AL    1.5662    1.6452    1.8661    1.9422


Summary:

As the sample size increases, the coverage probabilities (CP) of both methods generally increase.

As the censoring rate increases, the CP of both methods decrease.

When the sample size is small, the CP of the EL is closer to the nominal level than that of the NA (Wald); under very heavy censoring, however, neither method is satisfactory.


Summary:

The average length of the EL interval is slightly longer than that of the NA interval in all cases.

The EL shows slight over-coverage.

The NA shows under-coverage.

Simulation Study – Kendall vs. others

Consider the following AFT model:

We observe $X_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, where $C_i$ is the censoring time.

Model 3: Z ~ N(1, 0.5²). The censoring time C ~ N(µ, 4²),

where µ is chosen to produce censoring rates of 10%, 30%, 50% and 75%.

The error term has the normal distribution N(0, 0.5²). Sample sizes: 50, 100 and 200. Repetitions: 5000.

Simulation Study – Kendall


Coverage probabilities (B-J = Buckley-James):

             Confidence level = 90%                Confidence level = 95%
CR    n      B-J     Logrank  Gehan   Kendall      B-J     Logrank  Gehan   Kendall
10%   50     0.8924  0.8879   0.8832  0.9110       0.9406  0.9399   0.9356  0.9516
      100    0.8888  0.8909   0.8904  0.9212       0.9404  0.9479   0.9446  0.9630
      200    0.8810  0.9059   0.8938  0.9012       0.9458  0.9500   0.9446  0.9506
30%   50     0.8866  0.8869   0.8804  0.9078       0.9374  0.9359   0.9290  0.9522
      100    0.8936  0.8889   0.8870  0.9212       0.9472  0.9410   0.9382  0.9596
      200    0.8922  0.9139   0.8958  0.9108       0.9468  0.9619   0.9440  0.9592
50%   50     0.8838  0.8798   0.8650  0.8978       0.9324  0.9319   0.9226  0.9370
      100    0.8926  0.8939   0.8820  0.9090       0.9414  0.9519   0.9370  0.9538
      200    0.8952  0.8929   0.8968  0.9142       0.9482  0.9469   0.9424  0.9604
75%   50     0.8420  0.8350   0.8030  0.8556       0.9042  0.8910   0.8628  0.8866
      100    0.8818  0.8740   0.8536  0.8856       0.9344  0.9300   0.9118  0.9340
      200    0.8928  0.8860   0.8788  0.9012       0.9438  0.9440   0.9358  0.9490

Simulation Study – Kendall


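When the sample size is small (n=50) and censoring is heavy, Kendall's rank regression estimator gives coverage probabilities closer to the nominal level than all the other estimators, except at 75% censoring and the 95% level, where the Buckley-James and Logrank estimators come closer.

In the other cases, Kendall's rank regression estimator is also competitive.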

Real Application

1. Bone marrow transplants are a standard treatment for acute leukemia.

2. A total of 137 patients were treated.

3. For simplicity, the model contains only one covariate at a time: $\ln T_i = \beta Z_i + \varepsilon_i$, where $T_i$ is the time to death.

4. The response variable Time to Death takes values from 1 day to 2640 days with mean equal to 839.16 days.

Real Application

We consider the following four variables:

1. Disease Group (3 groups)

2. Waiting Time to Transplant in Days (from 24 to 2616 days, mean=275 days)

3. Recipient and Donor Age (from 7 to 52 and from 2 to 56)

4. French-American-British (FAB) classification, based on standard morphological criteria.

Real Application

Estimates and confidence intervals (one covariate per model; TimeToTrx = waiting time to transplant):

                  FAB                 Group               Age                 TimeToTrx
β̂                 -0.8388             -0.4558             -0.4588             -0.2055

Wald
1-α=0.90  CI      (-1.3485, -0.3290)  (-0.8415, -0.0700)  (-0.7770, -0.1406)  (-0.4379, 0.0268)
          Length  1.0195              0.7715              0.6364              0.4647
1-α=0.95  CI      (-1.4461, -0.2314)  (-0.9154, 0.0039)   (-0.8379, -0.0797)  (-0.4823, 0.0713)
          Length  1.2147              0.9193              0.7582              0.5536
1-α=0.99  CI      (-1.6370, -0.0406)  (-1.0598, 0.1483)   (-0.9571, 0.0394)   (-0.5694, 0.1583)
          Length  1.5964              1.2081              0.9965              0.7277

EL
1-α=0.90  CI      (-1.3725, -0.2541)  (-0.8626, -0.0382)  (-0.9318, -0.1718)  (-0.4904, -0.0134)
          Length  1.1184              0.8244              0.7600              0.4770
1-α=0.95  CI      (-1.4933, -0.1240)  (-0.9442, 0.0390)   (-1.0503, -0.1087)  (-0.5728, 0.0211)
          Length  1.3693              0.9832              0.9416              0.5939
1-α=0.99  CI      (-1.6189, 0.1328)   (-1.1249, 0.2233)   (-1.2549, 0.0054)   (-0.7332, 0.0774)
          Length  1.7517              1.3482              1.2603              0.8106

Real Application

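Results:

1. The two methods show similar results.

2. The two exceptions (FAB at 1-α=0.99 and TimeToTrx at 1-α=0.90, where the Wald and EL intervals disagree on whether 0 is covered) may be due to the asymmetry of the EL intervals.

3. The EL intervals are slightly longer than the NA (Wald) intervals, consistent with the simulation study.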

Conclusion & Discussion

The average length of the EL confidence interval is slightly longer than that of the NA interval.

The coverage probabilities of the EL are closer to the nominal levels than those of the NA, especially when the sample size is very small and censoring is heavy.

In terms of coverage probabilities, Kendall's rank regression estimator generally performs better than the Buckley-James, Logrank and Gehan estimators.

Conclusion & Discussion

The combination of the Kendall estimating equation and the EL CI has strong advantages over the other considered approaches in the case of small sample size and heavy censoring rate.

The combination shows some over-coverage.

In future work, a smoothing kernel is suggested to eliminate this problem.

Thank you!
