EMPIRICAL LIKELIHOOD INFERENCE FOR THE ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION By Yinghua Lu June 29th 2009 Georgia State University
Thesis Defense

Jan 23, 2015

Aaron Lu

Page 1: Thesis Defense

EMPIRICAL LIKELIHOOD INFERENCE FOR THE

ACCELERATED FAILURE TIME MODEL USING KENDALL ESTIMATING EQUATION

By Yinghua Lu

June 29th 2009

Georgia State University

Page 2: Thesis Defense

Contents

Introduction

Main Procedure

Simulation Study

Real Application

Conclusion

Page 3: Thesis Defense

Introduction – AFT Model

Accelerated Failure Time (AFT) model: very popular, and similar to the classic linear regression:

$$Y_i = \beta^\top Z_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

where Y=ln(T).

Different estimation methods have been developed:

OLS

Non-monotone estimating equations

Monotone estimating equations with normal approximation

Page 4: Thesis Defense

Introduction – Kendall’s Tau

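Let (X1, Y1), …, (Xn, Yn) be n paired observations of two variables.

Kendall’s tau coefficient is defined as:

$$\tau = \frac{n_c - n_d}{n(n-1)/2},$$

where nc is the number of concordant pairs, i.e. pairs with sign(Xi-Xj) = sign(Yi-Yj), and nd is the number of discordant pairs, i.e. pairs with sign(Xi-Xj) = -sign(Yi-Yj).

Sen (1968) proposed

$$U(b) = \sum_{i<j} \mathrm{sign}(X_i - X_j)\,\mathrm{sign}\big(\varepsilon_i(b) - \varepsilon_j(b)\big),$$

where ε(b)=Y-bX.

U(b) is non-increasing in b.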
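To make the monotonicity of U(b) concrete, here is a minimal sketch that computes Kendall's tau and a Sen-type statistic on a toy data set. The function names and the data are illustrative assumptions, and the form of U(b) used (a sum over pairs of sign(Xi-Xj)·sign(εi(b)-εj(b))) is the usual one rather than a formula confirmed by the slides.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau(x, y):
    """(concordant - discordant) / number of pairs; tied pairs contribute 0."""
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

def sen_u(b, x, y):
    """Sen-type statistic: Kendall score between x and the residuals y - b*x."""
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    n = len(x)
    return sum(sign(x[i] - x[j]) * sign(resid[i] - resid[j])
               for i, j in combinations(range(n), 2))

# Toy data (assumed): roughly y = 2x with small perturbations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

grid = [0, 1, 2, 3, 4]
u_values = [sen_u(b, x, y) for b in grid]
print(kendall_tau(x, y), u_values)
```

On this toy data U(b) falls from 10 at b=0 to -10 at b=4, so a slope estimate can be read off as the point where U(b) crosses zero.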

Page 5: Thesis Defense

Introduction – Empirical Likelihood

A nonparametric method

Based on a data-driven likelihood ratio function

Without specifying a parametric family of distributions for the data.

The shape of the confidence regions is determined by the data.

Combines the reliability of nonparametric methods with the efficiency of likelihood methods.

Page 6: Thesis Defense

Introduction – Empirical Likelihood

For X1,X2,…,Xn, the likelihood function is defined by

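Let X1,X2,…,Xn be n independent samples. The empirical cumulative distribution function (ECDF) at x is

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x).$$

The nonparametric likelihood of a CDF F can be defined as

$$L(F) = \prod_{i=1}^{n} \{F(X_i) - F(X_i-)\}.$$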
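As a small illustration of these two definitions, the sketch below evaluates an ECDF and compares the nonparametric log-likelihood of equal weights 1/n with that of another probability vector on the same points. The function names and sample values are assumptions made for the example.

```python
import math

def ecdf(sample, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def log_likelihood(weights):
    """Log of the product of the probability masses placed on the observations."""
    return sum(math.log(w) for w in weights)

sample = [3, 1, 2, 2]
print(ecdf(sample, 2))  # 0.75

uniform = [0.25, 0.25, 0.25, 0.25]   # the weights 1/n used by the ECDF
skewed = [0.4, 0.2, 0.2, 0.2]        # another probability vector (assumed)
print(log_likelihood(uniform) > log_likelihood(skewed))  # True
```

The equal-weight vector gives the larger likelihood, which is why the ECDF serves as the reference in the likelihood ratio.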

Page 7: Thesis Defense

Introduction – Empirical Likelihood

Likelihood ratio of a CDF F relative to the ECDF $F_n$: $R(F) = L(F)/L(F_n)$

Owen (2001) proved

Page 8: Thesis Defense

Introduction – Brief History

Traced back to Thomas and Grunkemeier (1975)

Summarized and discussed in Owen (1988, 1990, 1991, 2001)

Qin and Jing (2001) and Li and Wang (2003): the limiting distribution of the EL ratio is a weighted chi-square distribution.

Zhou (2005) and Zhou and Li (2008): EL for the Logrank and Gehan estimators, and for the Buckley-James estimator.

Page 9: Thesis Defense

Main Procedure – Preliminaries

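Let T1,…,Tn be a sequence of positive random variables (failure times), and let Z1,…,Zn be the corresponding covariate vectors.

Z and β are p×1 vectors.

We observe $\tilde T_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, where $C_i$ is the censoring time.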

Define

We employ the following estimating equation:

Page 10: Thesis Defense

Main Procedure – Preliminaries

We can rewrite it as a U-statistic with symmetric kernel,

Similar to Fygenson and Ritov (1994),

where R and J are defined as in Fygenson and Ritov (1994).

Page 11: Thesis Defense

Main Procedure – Preliminaries

The asymptotic variance of the generalized estimate of β is

The numerator can be estimated by

The denominator can be estimated by

Then we can construct the confidence interval as
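A generic normal-approximation (Wald) interval has the form estimate ± z × standard error. The sketch below shows only that generic form; how the standard error is built from the estimated numerator and denominator is not reproduced here, and the function name and the numbers in the usage line are illustrative assumptions.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Two-sided normal-approximation interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level=0.95
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Illustrative values (assumed): a slope estimate and its standard error.
lower, upper = wald_interval(-0.8388, 0.3099, level=0.95)
print(round(lower, 4), round(upper, 4))
```

Because the interval is symmetric about the estimate, it cannot adapt to skewness in the sampling distribution, which is one motivation for the empirical likelihood intervals that follow.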

Page 12: Thesis Defense

Main Procedure – Empirical Likelihood

Let and

Applying the idea of Sen (1960), we define

where W’s are independently distributed.

Page 13: Thesis Defense

Main Procedure – Empirical Likelihood

Let $p = (p_1, \ldots, p_n)$ be a probability vector. Then the empirical likelihood function at the value β is given by

For this function, $\prod_{i=1}^{n} p_i$ reaches its maximum when $p_i = 1/n$ for every i.

Thus, the empirical likelihood ratio at β is defined by

Page 14: Thesis Defense

Main Procedure – Empirical Likelihood

Applying the Lagrange multiplier method to the logarithm of the above equation, we write

Setting the partial derivative of G with respect to p to 0, we have

then

Page 15: Thesis Defense

Main Procedure – Empirical Likelihood

Plugging this into the previous equation, we obtain

So, for all the p’s

We have
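For reference, the standard form of this Lagrange-multiplier argument (as in Owen, 2001) can be sketched as follows. It assumes the constraints are $\sum_i p_i = 1$ and $\sum_i p_i W_i(\beta) = 0$; the symbols $G$, $\lambda$, $\gamma$ and $R(\beta)$ follow the usual convention and may differ from the notation on the slides.

```latex
\begin{align*}
G &= \sum_{i=1}^{n} \log(n p_i)
     - n\lambda^\top \sum_{i=1}^{n} p_i W_i(\beta)
     + \gamma\Big(\sum_{i=1}^{n} p_i - 1\Big), \\
\frac{\partial G}{\partial p_i}
  &= \frac{1}{p_i} - n\lambda^\top W_i(\beta) + \gamma = 0
  \quad\Longrightarrow\quad
  \sum_{i=1}^{n} p_i \frac{\partial G}{\partial p_i} = n + \gamma = 0
  \quad\Longrightarrow\quad \gamma = -n, \\
p_i &= \frac{1}{n}\,\frac{1}{1 + \lambda^\top W_i(\beta)},
  \qquad\text{with } \lambda \text{ solving }\quad
  \frac{1}{n}\sum_{i=1}^{n} \frac{W_i(\beta)}{1 + \lambda^\top W_i(\beta)} = 0, \\
-2\log R(\beta) &= 2\sum_{i=1}^{n} \log\big(1 + \lambda^\top W_i(\beta)\big).
\end{align*}
```

The multiplier $\gamma$ is eliminated analytically, so only $\lambda$ (of the same dimension as β) has to be found numerically.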

Page 16: Thesis Defense

Main Procedure – Empirical Likelihood

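Theorem 1 Under the above conditions, $-2\log R(\beta_0)$ converges in distribution to $\chi^2_p$, where $R(\beta)$ is the empirical likelihood ratio at β, $\beta_0$ is the true parameter value, and $\chi^2_p$ is a chi-square random variable with p degrees of freedom.

The confidence region for β is given by

$$\{\beta : -2\log R(\beta) \le \chi^2_p(\alpha)\},$$

where $\chi^2_p(\alpha)$ is the upper α-quantile of $\chi^2_p$.

EL confidence region for a q-dimensional sub-vector of β:

Theorem 2 Under the above conditions, the profile empirical log-likelihood ratio statistic for the sub-vector converges in distribution to $\chi^2_q$, where $\chi^2_q$ is a chi-square random variable with q degrees of freedom.

The confidence region for the sub-vector is given by the analogous set, with $\chi^2_q(\alpha)$ as the cutoff.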
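To show how a region of this kind is evaluated in practice, the sketch below computes an empirical log-likelihood ratio statistic for scalar values w_1,…,w_n under the constraint Σ p_i w_i = 0 and compares it with the 95% chi-square cutoff for one degree of freedom. It illustrates the generic technique for p = 1 only: the w values stand in for the W's evaluated at a candidate β, the function names are hypothetical, and any scaling specific to the thesis is not reproduced.

```python
import math

CHI2_1_095 = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def el_statistic(w, iterations=200):
    """-2 log R for scalar w under sum(p_i * w_i) = 0, via bisection on lambda."""
    w_min, w_max = min(w), max(w)
    if not (w_min < 0 < w_max):
        return math.inf  # zero is outside the convex hull, so R = 0

    def g(lam):
        # Decreasing in lam; its root gives the Lagrange multiplier.
        return sum(wi / (1 + lam * wi) for wi in w)

    # All 1 + lam * w_i stay positive on the open interval (-1/w_max, -1/w_min).
    lo = (-1 / w_max) * (1 - 1e-10)
    hi = (-1 / w_min) * (1 - 1e-10)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    return 2 * sum(math.log(1 + lam * wi) for wi in w)

def in_region(w, cutoff=CHI2_1_095):
    """True if the candidate value giving rise to w lies inside the EL region."""
    return el_statistic(w) <= cutoff

centered = [-2.0, -1.0, 1.0, 2.0]   # mean zero: statistic is 0
shifted = [-1.0, 2.0, 3.0, 4.0]     # mean away from zero: statistic is positive
print(el_statistic(centered), el_statistic(shifted), in_region(centered))
```

Scanning candidate β values and keeping those for which `in_region` is true traces out the interval; unlike the Wald interval it need not be symmetric about the point estimate.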

Page 17: Thesis Defense

Simulation Study – EL vs. NA

Consider the AFT model:

Model 1: (skewed error distribution)

Z ~ Uniform distribution in [-1, 1].

The censoring time C ~ Uniform distribution in [0, c], where c controls the censoring rate.

The error term has the standard extreme value distribution, which is skewed to the right.
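A data-generating step for a design of this kind might look like the sketch below. Several things are assumptions rather than details taken from the slides: the model is taken to be ln T = β0·Z + ε with a hypothetical β0 = 1, the error is drawn as -log(E) with E ~ Exp(1) (the right-skewed Gumbel form, chosen to match the description above), and the function names are illustrative.

```python
import math
import random

def simulate_model1(n, c, beta0=1.0, seed=0):
    """Return a list of (observed time, event indicator, covariate) triples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-1, 1)                     # Z ~ Uniform[-1, 1]
        e = max(rng.expovariate(1.0), 1e-300)      # guard against log(0)
        eps = -math.log(e)                         # right-skewed extreme value error (assumed form)
        t = math.exp(beta0 * z + eps)              # failure time, ln T = beta0*Z + eps
        cens = rng.uniform(0, c)                   # C ~ Uniform[0, c]
        data.append((min(t, cens), int(t <= cens), z))
    return data

def censoring_rate(data):
    """Proportion of observations whose failure time was censored."""
    return 1 - sum(delta for _, delta, _ in data) / len(data)

light = simulate_model1(200, c=20.0, seed=1)
heavy = simulate_model1(200, c=2.0, seed=1)
print(censoring_rate(light), censoring_rate(heavy))
```

Shrinking c pulls the censoring times toward zero, which is how a single constant can be tuned to reach a target censoring rate.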

Page 18: Thesis Defense

Simulation Study – EL vs. NA

Model 2: (symmetric error distribution)

Z ~ Uniform distribution in [0.5, 1.5].

The censoring time C is defined as 2exp(1)+c.

The error term has the standard Normal distribution N(0,1), which is symmetric.

Setting:

Repetitions: 10000

Censoring rates: 15%, 30%, 45%, 60%

Sample sizes: 30, 50, 75, 100

Page 19: Thesis Defense

Simulation Study – EL vs. NA

Results for model 1:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
15%   30    CP    0.8686   0.8986     0.9221   0.9427
            AL    1.4354   1.5751     1.7102   1.9030
      50    CP    0.8856   0.9084     0.9338   0.9516
            AL    1.0857   1.1840     1.2936   1.4138
      75    CP    0.8880   0.9123     0.9405   0.9589
            AL    0.8730   0.9500     1.0402   1.1412
      100   CP    0.8937   0.9152     0.9425   0.9607
            AL    0.7520   0.8065     0.8960   0.9793

CP: coverage probability; AL: average length.

Page 20: Thesis Defense

Simulation Study – EL vs. NA

Results for model 1:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
30%   30    CP    0.8669   0.8946     0.9163   0.9366
            AL    1.6870   1.8175     2.0101   2.1739
      50    CP    0.8768   0.8984     0.9272   0.9452
            AL    1.2694   1.3635     1.5124   1.6253
      75    CP    0.8828   0.9057     0.9372   0.9515
            AL    1.0218   1.1044     1.2174   1.3113
      100   CP    0.8911   0.9108     0.9418   0.9594
            AL    0.8810   0.9479     1.0497   1.1352

Page 21: Thesis Defense

Simulation Study – EL vs. NA

Results for model 1:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
45%   30    CP    0.8494   0.8720     0.9081   0.9188
            AL    2.0324   2.1770     2.4216   2.5976
      50    CP    0.8699   0.8846     0.9233   0.9333
            AL    1.5241   1.5961     1.8160   1.9005
      75    CP    0.8798   0.8999     0.9336   0.9437
            AL    1.2293   1.2953     1.4647   1.5278
      100   CP    0.8879   0.9041     0.9394   0.9470
            AL    1.0555   1.1255     1.2576   1.3275

Page 22: Thesis Defense

Simulation Study – EL vs. NA

Results for model 1:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
60%   30    CP    0.8136   0.8382     0.8760   0.8865
            AL    2.6101   2.7787     3.1099   3.2616
      50    CP    0.8482   0.8492     0.9008   0.9028
            AL    1.9459   1.9870     2.3186   2.3588
      75    CP    0.8700   0.8669     0.9213   0.9162
            AL    1.5701   1.5890     1.8708   1.8744
      100   CP    0.8738   0.8807     0.9284   0.9275
            AL    1.3462   1.3824     1.6040   1.6210

Page 23: Thesis Defense

Simulation Study – EL vs. NA

Results for model 2:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
15%   30    CP    0.8539   0.9082     0.9120   0.9504
            AL    2.3067   2.4421     2.7485   2.8872
      50    CP    0.8753   0.9162     0.9293   0.9612
            AL    1.7432   1.8874     2.0770   2.2510
      75    CP    0.8850   0.9181     0.9374   0.9627
            AL    1.4096   1.5134     1.6795   1.8260
      100   CP    0.8880   0.9122     0.9409   0.9626
            AL    1.2158   1.2851     1.4486   1.5596

Page 24: Thesis Defense

Simulation Study – EL vs. NA

Results for model 2:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
30%   30    CP    0.8520   0.9002     0.9065   0.9458
            AL    2.4348   2.5429     2.9010   2.9912
      50    CP    0.8760   0.9063     0.9238   0.9514
            AL    1.8430   1.9749     2.1960   2.3563
      75    CP    0.8829   0.9125     0.9377   0.9606
            AL    1.4885   1.5895     1.7735   1.9085
      100   CP    0.8832   0.9091     0.9380   0.9587
            AL    1.2805   1.3528     1.5257   1.6343

Page 25: Thesis Defense

Simulation Study – EL vs. NA

Results for model 2:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
45%   30    CP    0.8434   0.8828     0.8985   0.9328
            AL    2.6690   2.8711     3.1801   3.4428
      50    CP    0.8698   0.8959     0.9207   0.9415
            AL    2.0308   2.1348     2.4197   2.5526
      75    CP    0.8804   0.9074     0.9326   0.9499
            AL    1.6297   1.7280     1.9418   2.0526
      100   CP    0.8875   0.9077     0.9367   0.9543
            AL    1.4077   1.4885     1.6773   1.7755

Page 26: Thesis Defense

Simulation Study – EL vs. NA

Results for model 2:

                  1-α=0.90            1-α=0.95
CR    n           Wald     EL         Wald     EL
60%   30    CP    0.8319   0.8634     0.8869   0.9093
            AL    3.0160   3.1968     3.5935   3.7967
      50    CP    0.8705   0.8783     0.9179   0.9222
            AL    2.2770   2.3433     2.7130   2.7909
      75    CP    0.8818   0.8944     0.9300   0.9397
            AL    1.8232   1.8974     2.1723   2.2437
      100   CP    0.8808   0.8995     0.9330   0.9422
            AL    1.5662   1.6452     1.8661   1.9422

Page 27: Thesis Defense

Simulation Study – EL vs. NA

Summary:

As the sample size increases, the coverage probabilities (CP) of both methods increase.

As the censoring rate increases, the coverage probabilities (CP) of both methods decrease.

When the sample size is small, the CP of the EL is closer to the nominal level than that of the NA; under very heavy censoring, however, neither method performs well enough.

Page 28: Thesis Defense

Simulation Study – EL vs. NA

Summary:

The average length for the EL is a little longer than for the NA in all cases.

The EL shows slight over-coverage.

The NA shows under-coverage.

Page 29: Thesis Defense

Simulation Study – Kendall vs. others

Consider the following AFT model:

We observe $\tilde T_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$.

Model 3: Z ~ N(1, 0.5²). The censoring time C ~ N(µ, 4²), where µ is chosen to produce samples with censoring rates of 10%, 30%, 50% and 75%.

The error term is distributed as N(0, 0.5²).

Sample sizes: 50, 100 and 200.

Repetitions: 5000.
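One way to pick µ for a target censoring rate is sketched below. It rests on assumptions that the slides do not confirm: the variance terms are read as 0.5² and 4², the comparison between response and censoring variable is made on the log scale (a normal censoring variable can be negative), the regression coefficient is a hypothetical β0 = 1, and all names are illustrative. Under those assumptions Y - C is normal, so µ has a closed form that a simulation can then check.

```python
import random
from statistics import NormalDist

BETA0 = 1.0                        # hypothetical regression coefficient
MEAN_Z, SD_Z = 1.0, 0.5            # Z ~ N(1, 0.5^2)
SD_EPS, SD_C = 0.5, 4.0            # error ~ N(0, 0.5^2), C ~ N(mu, 4^2)

def sd_of_difference(beta0=BETA0):
    """Standard deviation of Y - C when Y = beta0*Z + eps and all parts are independent."""
    return (beta0 ** 2 * SD_Z ** 2 + SD_EPS ** 2 + SD_C ** 2) ** 0.5

def censoring_prob(mu, beta0=BETA0):
    """P(Y > C), i.e. the probability that an observation is censored."""
    return NormalDist().cdf((beta0 * MEAN_Z - mu) / sd_of_difference(beta0))

def mu_for_rate(rate, beta0=BETA0):
    """Value of mu that gives the requested censoring probability."""
    return beta0 * MEAN_Z - sd_of_difference(beta0) * NormalDist().inv_cdf(rate)

def simulated_rate(mu, n=20000, beta0=BETA0, seed=0):
    """Monte Carlo check of the censoring rate for a given mu."""
    rng = random.Random(seed)
    censored = 0
    for _ in range(n):
        y = beta0 * rng.gauss(MEAN_Z, SD_Z) + rng.gauss(0, SD_EPS)
        c = rng.gauss(mu, SD_C)
        censored += y > c
    return censored / n

for target in (0.10, 0.30, 0.50, 0.75):
    mu = mu_for_rate(target)
    print(target, round(mu, 3), round(simulated_rate(mu), 3))
```

A higher target censoring rate corresponds to a smaller µ, since the censoring variable then tends to fall below the response more often.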

Page 30: Thesis Defense

Simulation Study – Kendall

Results for model 3:

             Confidence Level = 90%                 Confidence Level = 95%
CR    n      B-J      Logrank  Gehan    Kendall     B-J      Logrank  Gehan    Kendall
10%   50     0.8924   0.8879   0.8832   0.9110      0.9406   0.9399   0.9356   0.9516
      100    0.8888   0.8909   0.8904   0.9212      0.9404   0.9479   0.9446   0.9630
      200    0.8810   0.9059   0.8938   0.9012      0.9458   0.9500   0.9446   0.9506
30%   50     0.8866   0.8869   0.8804   0.9078      0.9374   0.9359   0.9290   0.9522
      100    0.8936   0.8889   0.8870   0.9212      0.9472   0.9410   0.9382   0.9596
      200    0.8922   0.9139   0.8958   0.9108      0.9468   0.9619   0.9440   0.9592
50%   50     0.8838   0.8798   0.8650   0.8978      0.9324   0.9319   0.9226   0.9370
      100    0.8926   0.8939   0.8820   0.9090      0.9414   0.9519   0.9370   0.9538
      200    0.8952   0.8929   0.8968   0.9142      0.9482   0.9469   0.9424   0.9604
75%   50     0.8420   0.8350   0.8030   0.8556      0.9042   0.8910   0.8628   0.8866
      100    0.8818   0.8740   0.8536   0.8856      0.9344   0.9300   0.9118   0.9340
      200    0.8928   0.8860   0.8788   0.9012      0.9438   0.9440   0.9358   0.9490

Page 31: Thesis Defense

Simulation Study – Kendall

Results for model 3:

When the sample size is small (n=50) and the censoring rate is heavy, Kendall’s rank regression estimator is better than all the other estimators.

In other cases, Kendall’s rank regression estimator is also competitive.

Page 32: Thesis Defense

Real Application

1. Bone marrow transplants are a standard treatment for acute leukemia.

2. A total of 137 patients were treated.

3. For simplicity, the model contains only one covariate at a time, which is $\ln T_i = \beta Z_i + \varepsilon_i$, where Ti is Time to Death.

4. The response variable Time to Death takes values from 1 day to 2640 days with mean equal to 839.16 days.

Page 33: Thesis Defense

Real Application

We consider the following four variables:

1. Disease Group (3 groups)

2. Waiting Time to Transplant in Days (from 24 to 2616 days, mean=275 days)

3. Recipient and Donor Age (from 7 to 52 and from 2 to 56)

4. French-American-British (FAB): classification based on standard morphological criteria.

Page 34: Thesis Defense

Real Application

                          FAB                  Group                Age                  TimeToTrx
β_hat                     -0.8388              -0.4558              -0.4588              -0.2055

Wald
1-α=0.90   CI             (-1.3485, -0.3290)   (-0.8415, -0.0700)   (-0.7770, -0.1406)   (-0.4379, 0.0268)
           Length         1.0195               0.7715               0.6364               0.4647
1-α=0.95   CI             (-1.4461, -0.2314)   (-0.9154, 0.0039)    (-0.8379, -0.0797)   (-0.4823, 0.0713)
           Length         1.2147               0.9193               0.7582               0.5536
1-α=0.99   CI             (-1.6370, -0.0406)   (-1.0598, 0.1483)    (-0.9571, 0.0394)    (-0.5694, 0.1583)
           Length         1.5964               1.2081               0.9965               0.7277

EL
1-α=0.90   CI             (-1.3725, -0.2541)   (-0.8626, -0.0382)   (-0.9318, -0.1718)   (-0.4904, -0.0134)
           Length         1.1184               0.8244               0.7600               0.477
1-α=0.95   CI             (-1.4933, -0.1240)   (-0.9442, 0.0390)    (-1.0503, -0.1087)   (-0.5728, 0.0211)
           Length         1.3693               0.9832               0.9416               0.5939
1-α=0.99   CI             (-1.6189, 0.1328)    (-1.1249, 0.2233)    (-1.2549, 0.0054)    (-0.7332, 0.0774)
           Length         1.7517               1.3482               1.2603               0.8106

Page 35: Thesis Defense

Real Application

Results:

1. The two methods give similar results.

2. The two exceptions may be due to the asymmetric CIs of the EL.

3. The lengths of the EL intervals are a little longer than those of the NA, consistent with the simulation study.
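The comparison can be checked mechanically from the reported intervals. The sketch below lists the cases where the Wald and EL intervals disagree on whether zero lies inside the interval; treating those cases as the "two exceptions" is an assumption about what the slide means, and the variable and function names are illustrative. The endpoints are copied from the table of confidence intervals.

```python
COVARIATES = ["FAB", "Group", "Age", "TimeToTrx"]

# Confidence interval endpoints by method and confidence level, in COVARIATES order.
INTERVALS = {
    "Wald": {
        "0.90": [(-1.3485, -0.3290), (-0.8415, -0.0700), (-0.7770, -0.1406), (-0.4379, 0.0268)],
        "0.95": [(-1.4461, -0.2314), (-0.9154, 0.0039), (-0.8379, -0.0797), (-0.4823, 0.0713)],
        "0.99": [(-1.6370, -0.0406), (-1.0598, 0.1483), (-0.9571, 0.0394), (-0.5694, 0.1583)],
    },
    "EL": {
        "0.90": [(-1.3725, -0.2541), (-0.8626, -0.0382), (-0.9318, -0.1718), (-0.4904, -0.0134)],
        "0.95": [(-1.4933, -0.1240), (-0.9442, 0.0390), (-1.0503, -0.1087), (-0.5728, 0.0211)],
        "0.99": [(-1.6189, 0.1328), (-1.1249, 0.2233), (-1.2549, 0.0054), (-0.7332, 0.0774)],
    },
}

def excludes_zero(interval):
    """True if the interval lies entirely on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0

def disagreements():
    """(level, covariate) pairs where the two methods reach different conclusions."""
    found = []
    for level in ("0.90", "0.95", "0.99"):
        for k, name in enumerate(COVARIATES):
            wald = excludes_zero(INTERVALS["Wald"][level][k])
            el = excludes_zero(INTERVALS["EL"][level][k])
            if wald != el:
                found.append((level, name))
    return found

print(disagreements())  # [('0.90', 'TimeToTrx'), ('0.99', 'FAB')]
```

In both cases the EL interval is shifted relative to the symmetric Wald interval, which is consistent with the asymmetry mentioned in item 2.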

Page 36: Thesis Defense

Conclusion & Discussion

The average lengths of the CIs from the EL are slightly longer than those from the NA.

The coverage probabilities of the EL are closer to the nominal levels than those of the NA, especially when the sample size is very small and the censoring rate is heavy.

Kendall’s rank regression estimator is better than the Buckley-James, Logrank and Gehan estimators in terms of coverage probabilities.

Page 37: Thesis Defense

Conclusion & Discussion

The combination of the Kendall estimating equation and the EL CI has strong advantages over the other considered approaches in the case of small sample size and heavy censoring rate.

The combination shows a problem of over-coverage.

A smoothing kernel is suggested as future work to eliminate this problem.

Page 38: Thesis Defense

Thank you!