
Interpreting Statistical Evidence with Empirical Likelihood Functions

Zhiwei Zhang

Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, USA

Received 6 October 2008, revised 5 May 2009, accepted 6 May 2009

There has been growing interest in the likelihood paradigm of statistics, where statistical evidence is represented by the likelihood function and its strength is measured by likelihood ratios. The available literature in this area has so far focused on parametric likelihood functions, though in some cases a parametric likelihood can be robustified. This focused discussion on parametric models, while insightful and productive, may have left the impression that the likelihood paradigm is best suited to parametric situations. This article discusses the use of empirical likelihood functions, a well-developed methodology in the frequentist paradigm, to interpret statistical evidence in nonparametric and semiparametric situations. A comparative review of literature shows that, while an empirical likelihood is not a true probability density, it has the essential properties, namely consistency and local asymptotic normality, that unify and justify the various parametric likelihood methods for evidential analysis. Real examples are presented to illustrate and compare the empirical likelihood method and the parametric likelihood methods. These methods are also compared in terms of asymptotic efficiency by combining relevant results from different areas. It is seen that a parametric likelihood based on a correctly specified model is generally more efficient than an empirical likelihood for the same parameter. However, when the working model fails, a parametric likelihood either breaks down or, if a robust version exists, becomes less efficient than the corresponding empirical likelihood.

Key words: Consistency; Evidential analysis; Law of likelihood; Local asymptotic normality; Robust adjusted likelihood.

Supporting Information for this article is available from the author or on the WWW under http://dx.doi.org/10.1002/bimj.200800209.

1 Introduction

A major part of statistics is to interpret observed data as statistical evidence. Yet there is no consensus among statisticians on what constitutes statistical evidence and how to measure its strength. Confidence intervals, p-values and posterior probability distributions are commonly used to interpret and communicate statistical evidence. Hacking (1965) suggests that the likelihood function is the mathematical representation of statistical evidence and that the likelihood ratio measures the strength of statistical evidence for one statistical hypothesis versus another. This point of view has led to a likelihood paradigm for interpreting statistical evidence, which carefully distinguishes evidence about a parameter from uncertainty in a statistic, performance of a procedure and personal belief (Royall, 1997; Blume, 2002). The goal of this article is not to compare the different philosophies but to expand the likelihood paradigm to semiparametric and nonparametric situations.

* Corresponding author: e-mail: [email protected], Phone: +1-301-796-6050, Fax: +1-301-847-8123

© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

710 Biometrical Journal 51 (2009) 4, 710–720 DOI: 10.1002/bimj.200800209

The existing literature about the likelihood paradigm generally requires specification of a parametric model for the observed data. It is in this setting that Royall (2000) and Blume (2002, 2008) study the probabilities of observing misleading evidence and weak evidence. Royall and Tsou (2003) recognize this limitation and propose a robust adjusted likelihood method with some protection against misspecification of the working parametric model. Their method has been extended to generalized linear models by Blume et al. (2007). Despite its robustness properties, the robust likelihood approach does require a parametric model to be specified, and the resulting inferences (e.g. likelihood ratios) generally depend on the chosen model. Furthermore, Royall and Tsou (2003) assume that the object of inference remains the object of interest when the working model fails, which, as we shall see, may not be the case.

In this article, we note that the likelihood paradigm is not limited to parametric situations and that empirical likelihood functions can be used to interpret statistical evidence in nonparametric and semiparametric situations. There is a large and growing literature on the empirical likelihood methodology as a valuable tool in the frequentist paradigm (e.g. Owen, 1988, 1990, 1991, 2001; Qin and Lawless, 1994). However, its utility for interpreting statistical evidence in the likelihood paradigm appears unnoticed. Even Royall and Tsou (2003) and Blume et al. (2007), while looking for robust likelihood functions, make no mention of the empirical likelihood approach. Thus there appears to be a gap between the likelihood paradigm literature and the empirical likelihood literature, and the present article is intended as a bridge, comparing and combining relevant results from the two bodies of literature. For example, it seems worth noting that, despite not being a true probability density, an empirical likelihood behaves much like a parametric likelihood as a representation of statistical evidence. It may also be of interest to compare the different likelihood methods in terms of asymptotic efficiency.

The rest of the article is organized as follows. A brief introduction to the likelihood paradigm is given in Section 2. The empirical likelihood approach is then described in Section 3. Section 4 illustrates and compares the different approaches with real examples. Section 5 gives some remarks on asymptotic efficiency. The article concludes with a discussion in Section 6.

2 The Likelihood Paradigm

Let $X_1, \ldots, X_n$ be independent copies of $X$ with density $f(\cdot\,;\theta)$, where $f$ is a known function and $\theta \in \Theta$ is an unknown, finite-dimensional parameter. Based on these data, the likelihood for $\theta$ is given by

$$L(\theta) = \prod_{i=1}^{n} f(X_i;\theta).$$

According to the law of likelihood (Hacking, 1965), the observed data provide evidence supporting one parameter value $\theta_1$ over another value $\theta_2$ if $L(\theta_1) > L(\theta_2)$, and the strength of that evidence is measured by the ratio $L(\theta_1)/L(\theta_2)$. For this purpose, it is irrelevant whether $\theta_1$ and $\theta_2$ are predetermined or data-driven. Royall (1997) gives an in-depth discussion of the law of likelihood and proposes benchmarks for the strength of statistical evidence. Specifically, a likelihood ratio exceeding $k = 8$ is considered fairly strong evidence, whereas $k = 32$ is used to define strong evidence. Zhang (2009) proposes a generalized law of likelihood for composite hypotheses, which states that a hypothesis $H_1: \theta \in \Theta_1 \subset \Theta$ is supported over another hypothesis $H_2: \theta \in \Theta_2 \subset \Theta$ if $\sup L(\Theta_1) > \sup L(\Theta_2)$, with the generalized likelihood ratio $\sup L(\Theta_1)/\sup L(\Theta_2)$ measuring the strength of that evidence. Regardless of the specific hypotheses of interest, it is always helpful to summarize the likelihood with support sets (intervals in typical one-dimensional cases) comprising the best-supported parameter values. Formally, the $1/k$ support set for $\theta$ is given by $\{\theta : L(\theta) > \sup L(\Theta)/k\}$ for $k > 1$. The larger $k$ is, the larger the $1/k$ support set. Despite a similar appearance, a support set is to be interpreted differently than a likelihood ratio confidence set. The usual interpretation of confidence sets in terms of long-run coverage does not fit well into the likelihood paradigm, where the emphasis is placed on understanding the observed data (as opposed to fictitious repetitions of the same experiment). Further discussions of support sets can be found in Royall (1997, Section 1.12) and Zhang (2009, Section 6).
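As a minimal numeric sketch of these two ideas (with hypothetical data, not from the paper), the following computes a likelihood ratio to be judged against the $k = 8$ and $k = 32$ benchmarks, and reads $1/k$ support intervals off a normalized likelihood curve:

```python
import numpy as np

# Hypothetical sample of size 50; normal working model with known variance 1.
rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, size=50)

def loglik(m):
    """Normal log-likelihood for the mean (variance fixed at 1), up to a constant."""
    return -0.5 * np.sum((x - m) ** 2)

# Law of likelihood: evidence for m = 0.3 versus m = 0.8.
lr = np.exp(loglik(0.3) - loglik(0.8))

# 1/k support set {m : L(m) > sup L / k}, read off a fine grid;
# dividing by the maximum makes the peak value 1.
grid = np.linspace(-1.0, 1.5, 2501)
lik = np.exp(np.array([loglik(m) for m in grid]) - loglik(x.mean()))

def support_interval(k):
    inside = grid[lik > 1.0 / k]
    return float(inside.min()), float(inside.max())

print("LR =", round(float(lr), 2))
print("1/8 support:", support_interval(8))
print("1/32 support:", support_interval(32))
```

Under this normal model the $1/k$ support interval is exactly $\bar{X} \pm (2\log k/n)^{1/2}$, which provides a closed-form check on the grid computation.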

Standard results in the likelihood theory translate into important performance properties in the likelihood paradigm. Suppose $\theta$ is identifiable from $f(\cdot\,;\theta)$ in the sense that $f(\cdot\,;\theta_1) \neq f(\cdot\,;\theta_2)$ whenever $\theta_1 \neq \theta_2$. Let $\theta_0$ denote the true value of $\theta$. Then the law of large numbers implies that, as $n \to \infty$,

$$L(\theta)/L(\theta_0) \overset{p}{\to} 0 \qquad (1)$$

whenever $\theta \neq \theta_0$. This is closely related to the consistency of the maximum likelihood estimator and thus will be referred to as the consistency property. In the present context, it means that misleading evidence cannot occur very often in large samples: $P\{L(\theta)/L(\theta_0) \geq k\} \to 0$ for any $\theta \neq \theta_0$ and any $k > 0$. If $f(\cdot\,;\theta)$ is suitably smooth in $\theta$, then we also have

$$\log\{L(\theta_0 + \{nI(\theta_0)\}^{-1/2}c)/L(\theta_0)\} \overset{d}{\to} N(-\|c\|^2/2,\ \|c\|^2), \qquad (2)$$

where $c$ is a vector of the same dimension as $\theta$, $\|c\|$ is its Euclidean norm, and $I(\theta) = -E\{\partial^2 \log f(X;\theta)/\partial\theta\,\partial\theta^{\mathrm{T}}\}$ is the Fisher information for $\theta$. This property, known as local asymptotic normality (van der Vaart, 1998, Chapter 7), plays a central role in the asymptotic efficiency theory. In the likelihood paradigm, Eq. (2) can be used to approximate probabilities of events concerning the strength of the observed evidence. It implies, for example, that

$$P\{L(\theta_0 + \{nI(\theta_0)\}^{-1/2}c)/L(\theta_0) \geq k\} \to \Phi(-\|c\|/2 - \log k/\|c\|) \qquad (3)$$

with $\Phi$ being the standard normal distribution function. For any fixed $k$, the limit in Eq. (3) as a function of $c$ is bounded above by $\Phi(-(2\log k)^{1/2})$.
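The bound on the limit in Eq. (3) is easy to verify numerically. The short sketch below evaluates the limiting probability of misleading evidence for $k = 8$ and checks that its maximum over $\|c\|$, attained at $\|c\| = (2\log k)^{1/2}$, equals $\Phi(-(2\log k)^{1/2})$:

```python
from math import erf, log, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_misleading(c, k):
    """Limit in Eq. (3): large-sample probability of evidence of strength >= k
    for the local alternative indexed by ||c|| over the true value."""
    return Phi(-c / 2.0 - log(k) / c)

k = 8.0
bound = Phi(-sqrt(2.0 * log(k)))            # universal bound over all c
peak = p_misleading(sqrt(2.0 * log(k)), k)  # value at ||c|| = (2 log k)^{1/2}
print(round(bound, 4), round(peak, 4))
```

For $k = 8$ the bound is about 0.02, so even the worst-case local alternative yields misleading evidence of strength 8 only rarely in large samples.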

When $\theta$ is a vector, some of its components may be of greater interest than others. Suppose $\theta = (\eta, \nu)$, where $\eta$ is of primary interest and $\nu$ is the nuisance parameter (vector). In the presence of a nuisance parameter, it is sometimes possible to represent evidence about $\eta$ with a marginal, conditional or partial likelihood (Royall, 1997). However, such methods are available only under special circumstances. Another approach, based on the profile likelihood $\tilde{L}(\eta) = \sup_{\nu} L(\eta, \nu)$, is more generally applicable. Although $\tilde{L}$ does not correspond to a real probability density function, it easily meets the consistency criterion (1). Further, Royall (2000) shows that $\tilde{L}$ is locally asymptotically normal, that is,

$$\log\{\tilde{L}(\eta_0 + \{n\tilde{I}(\theta_0)\}^{-1/2}c)/\tilde{L}(\eta_0)\} \overset{d}{\to} N(-\|c\|^2/2,\ \|c\|^2),$$

where $\tilde{I}(\theta) = I_{\eta\eta} - I_{\eta\nu}I_{\nu\nu}^{-1}I_{\nu\eta}$, $I_{\eta\eta} = -E\{\partial^2\log f(X;\theta)/\partial\eta\,\partial\eta^{\mathrm{T}}\}$, $I_{\eta\nu} = -E\{\partial^2\log f(X;\theta)/\partial\eta\,\partial\nu^{\mathrm{T}}\}$, etc. Note that $\tilde{I}(\theta_0)$ is the efficient information for $\eta$ at $\eta_0$ with $\nu_0$ unknown (Bickel et al., 1993, Chapter 2). Thus the profile likelihood appears to exhibit reasonable behavior as a general approach to evidential analysis with nuisance parameters.

Since all models are wrong in reality, it is natural to ask what happens if the working model fails, that is, if the true distribution $F$ of $X$ is not described by $f(\cdot\,;\theta)$ for any $\theta$. Royall and Tsou (2003) have proposed a robust likelihood approach that can be sketched as follows. Denote $\theta_F = \arg\max_{\theta} E_F\{\log f(X;\theta)\}$. Clearly, $\theta_F = \theta_0$ if $F$ corresponds to $f(\cdot\,;\theta_0)$ and $\theta_0$ is identifiable from the model. If the model is incorrect, then $\theta_F$ is the probability limit (under $F$) of the maximum likelihood estimator of $\theta$. Royall and Tsou (2003) refer to $\theta_F$ as the object of inference and assume that it remains of interest even if the model is misspecified. Under this assumption, the consistency property (1) holds with $\theta_0$ replaced by $\theta_F$, regardless of model correctness. However, we do not generally have a local asymptotic normality property analogous to Eq. (2) when the model is misspecified. For a scalar $\theta$, Royall and Tsou (2003) propose the following exponential adjustment. Let $a = -E_F\{\partial^2 \log f(X;\theta_F)/\partial\theta^2\}$, $b = E_F[\{\partial \log f(X;\theta_F)/\partial\theta\}^2]$, and let $(\hat{a}, \hat{b})$ be consistent for $(a, b)$. Then the adjusted likelihood is given by $L_A(\theta) = L(\theta)^{\hat{a}/\hat{b}}$, and it has both consistency and local asymptotic normality:

$$\log\{L_A(\theta_F + (na^2/b)^{-1/2}c)/L_A(\theta_F)\} \overset{d}{\to} N(-\|c\|^2/2,\ \|c\|^2),$$

even when the working model fails. A similar adjustment can be applied to the profile likelihood for a scalar parameter of interest when there is a nuisance parameter. Despite its robustness properties, an adjusted likelihood still depends on the original working model and the resulting inferences can be somewhat arbitrary in practice. Furthermore, it is not yet clear how to construct a robust likelihood if $\eta$ is not a scalar or, more commonly, if the parameter of interest is not directly related to the object of inference $\theta_F$.

3 The Empirical Likelihood Approach

Instead of specifying and adjusting a working model, we now take a step back and rethink the essence of the statistical problem. We shall start by defining the parameter of interest as an answer to a given scientific question, without specifying the entire distribution of $X$. Specifically, let $\theta$ be defined by the equation $E\{g(X;\theta)\} = 0$, where $g$ is a known function whose dimension is at least that of $\theta$. For example, we can set $g(x;\theta) = x - \theta$ if we are interested in the mean of $X$. As another example, suppose $X = (Y, Z^{\mathrm{T}})^{\mathrm{T}}$ follows a general linear model: $E(Y\,|\,Z) = \theta^{\mathrm{T}}Z$; then a standard choice for $g$ would be $(y - \theta^{\mathrm{T}}z)z$. The dimension of $g$ can be greater than that of $\theta$ if auxiliary information is incorporated to improve efficiency. For instance, if $\theta = E(X)$ and we know that $E(X^2) = m(\theta)$ for a known function $m$, we might set $g(x;\theta) = (x - \theta,\ x^2 - m(\theta))^{\mathrm{T}}$ (Qin and Lawless, 1994, Example 1).
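The three choices of $g$ just mentioned are straightforward to write down. A small sketch (function names are our own, purely illustrative):

```python
import numpy as np

# Estimating functions g(x; theta) that define theta through E{g(X; theta)} = 0.

def g_mean(x, theta):
    """theta = E(X): g(x; theta) = x - theta."""
    return x - theta

def g_linear_model(y, z, theta):
    """General linear model E(Y | Z) = theta'Z: g = (y - theta'z) z."""
    return (y - z @ theta) * z

def g_mean_with_aux(x, theta, m):
    """theta = E(X) with auxiliary information E(X^2) = m(theta); stacking the
    two moment conditions makes dim(g) = 2 > dim(theta) = 1."""
    return np.array([x - theta, x**2 - m(theta)])
```

For instance, if $X$ were believed exponential with mean $\theta$, then $E(X^2) = 2\theta^2$, so one could pass `m = lambda t: 2 * t**2` to the third function.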

For any given $g$, an empirical likelihood for $\theta$ can be defined as

$$L_E(\theta) = \sup\left\{\prod_{i=1}^{n} w_i : w_i \geq 0\ \forall i,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i g(X_i;\theta) = 0\right\},$$

where the supremum of an empty set is taken to be 0. This is essentially a profile multinomial likelihood based on the observed values of $X$, ignoring possible ties among the $X_i$. The empirical likelihood methodology originated with Owen (1988) and is well established by now for estimation and hypothesis testing; see also Owen (1990, 1991, 2001) and Qin and Lawless (1994). However, there has been little mention of its potential utility as a way to represent statistical evidence in the likelihood paradigm. One possible reason for this might be the concern that the empirical likelihood $L_E$ does not have a direct probability interpretation. We shall see, however, that $L_E$ does possess the consistency and local asymptotic normality properties, which put it on equal footing with the profile likelihood $\tilde{L}$ and the adjusted likelihoods $L_A$ and $\tilde{L}_A$ (the profile version of $L_A$), none of which corresponds to a real probability density in general. It is straightforward to see that

$$L_E(\theta)/L_E(\theta_0) \overset{p}{\to} 0,$$

where $\theta \neq \theta_0$ and $\theta_0$ is again the true value of the parameter. Furthermore, we have

$$\log\{L_E(\theta_0 + \{nI_E(\theta_0)\}^{-1/2}c)/L_E(\theta_0)\} \overset{d}{\to} N(-\|c\|^2/2,\ \|c\|^2),$$

where $I_E(\theta) = E\{\partial g(X;\theta)/\partial\theta\}^{\mathrm{T}}[E\{g(X;\theta)g(X;\theta)^{\mathrm{T}}\}]^{-1}E\{\partial g(X;\theta)/\partial\theta\}$ (Qin and Lawless, 1994, Theorem 4). If $\theta = (\eta, \nu)^{\mathrm{T}}$, $\nu$ being a nuisance parameter, evidence about $\eta$ can be represented with the profile empirical likelihood $\tilde{L}_E(\eta) = \sup_{\nu} L_E(\eta, \nu)$. It can be shown that

$$\tilde{L}_E(\eta)/\tilde{L}_E(\eta_0) \overset{p}{\to} 0$$

for any $\eta \neq \eta_0$, and that

$$\log\{\tilde{L}_E(\eta_0 + \{n\tilde{I}_E(\theta_0)\}^{-1/2}c)/\tilde{L}_E(\eta_0)\} \overset{d}{\to} N(-\|c\|^2/2,\ \|c\|^2),$$

where $\tilde{I}_E(\theta)$ is the inverse of the submatrix of $I_E(\theta)^{-1}$ corresponding to $\eta$.

Although the empirical likelihood $L_E$, a constrained maximum by definition, does not take a closed form, it can be computed with relative ease and efficiency (Owen, 2001, Sections 2.9, 3.14). To avoid trivialities, suppose the convex hull of the $g(X_i;\theta)$, $i = 1, \ldots, n$, contains 0 as an interior point. A Lagrangian argument shows that the maximizer in the definition of $L_E$ is given by

$$w_i = \frac{1}{n\{1 + \lambda^{\mathrm{T}}g(X_i;\theta)\}}, \qquad i = 1, \ldots, n,$$

where $\lambda$ is a Lagrange multiplier that satisfies

$$\sum_{i=1}^{n} \frac{g(X_i;\theta)}{1 + \lambda^{\mathrm{T}}g(X_i;\theta)} = 0. \qquad (4)$$

To ensure that all $w_i$ are positive, we must have

$$1 + \lambda^{\mathrm{T}}g(X_i;\theta) > 0, \qquad i = 1, \ldots, n. \qquad (5)$$

Now the log-likelihood is, up to an additive constant,

$$-\sum_{i=1}^{n} \log\{1 + \lambda^{\mathrm{T}}g(X_i;\theta)\}. \qquad (6)$$

If $g$ is one-dimensional, then the left side of Eq. (4) is monotone in $\lambda$, so $\lambda$ can be found using a bisection method or, more efficiently, a safeguarded zero-finding algorithm (Owen, 1988). With a higher-dimensional $g$, it follows by convex duality that $\lambda$ may be found by minimizing Eq. (6) subject to the $n$ constraints in Eq. (5). Owen (1990) suggests replacing the log function in Eq. (6) with a pseudo-log that coincides with the usual log over $(1/n, \infty)$ and extends smoothly to the entire real line. This allows $\lambda$ to be found by minimizing the modified version of Eq. (6) without constraints using a standard Newton–Raphson algorithm. As in the parametric setting, profiling over nuisance parameters can be difficult and should be considered on a case-by-case basis.
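For a one-dimensional $g$, the bisection recipe takes only a few lines. The sketch below (our own illustration, not code from the paper) computes the log empirical likelihood ratio for the mean, with $g(x;\theta) = x - \theta$, by bisecting the monotone score in Eq. (4) over the interval allowed by constraint (5):

```python
import numpy as np

def log_el_mean(x, theta, tol=1e-10):
    """Log empirical likelihood ratio for the mean, i.e. expression (6) up to the
    n*log(n) constant: -sum_i log{1 + lambda*(x_i - theta)}, with the Lagrange
    multiplier found by bisection of the monotone score in Eq. (4)."""
    g = np.asarray(x) - theta
    if g.min() >= 0 or g.max() <= 0:       # 0 outside the convex hull: L_E(theta) = 0
        return -np.inf
    # Constraint (5) restricts lambda to an open interval (lo, hi).
    lo = (-1.0 + 1e-12) / g.max()
    hi = (-1.0 + 1e-12) / g.min()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + mid * g)) > 0:  # the score is decreasing in lambda
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * g))

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=40)          # hypothetical data
print(log_el_mean(x, x.mean()))            # maximized (about 0) at the sample mean
print(log_el_mean(x, x.mean() + 0.3))
```

At the sample mean the constraint is satisfied with $\lambda = 0$ and equal weights $1/n$, so the log ratio is essentially zero; outside the convex hull of the data the empirical likelihood vanishes, as the definition requires.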

4 Real Examples

4.1 Mephenytoin oxidation

We now illustrate and compare the different likelihood methods with a real example, a clinical study of chloroguanide metabolism and S-mephenytoin oxidation (Skjelbo et al., 1996). The study involved 216 healthy subjects in Tanzania. Each subject took 100 mg racemic mephenytoin and had his or her urine collected for 8 h. One of the main objectives of the study was to characterize the distribution of the S-mephenytoin/R-mephenytoin (S/R) ratio in urine, with high values indicating low metabolism of mephenytoin. Figure 1 gives a histogram of the S/R ratio measurements, which suggests that the S/R ratio distribution is highly skewed to the right.

There are a number of likelihood functions that can be used to represent the observed evidence about the mean S/R ratio, say $\mu$. One possibility is the profile likelihood for the mean under a normal model with unknown variance, which is proportional to $\{\sum_{i=1}^{n}(X_i - \mu)^2\}^{-n/2}$. Although the normal model may not fit the data well, Royall and Tsou (2003) find that the normal profile likelihood for the mean is robust (identical to its adjusted version). Alternatively, one might fit the data with an exponential model. The exponential likelihood for the mean is given by $\mu^{-n}\exp(-\sum_{i=1}^{n} X_i/\mu)$, which is not robust. Let $\bar{X}$ denote the sample mean and $s^2$ the sample variance. Then $\bar{X}^2/s^2$ can be used as the exponent in the robust adjustment of the exponential likelihood. Lastly, the empirical likelihood is available without specifying any parametric model. All four likelihood functions are plotted in Figure 2. In these plots and the subsequent plots, each likelihood function is divided by its maximum value and hence the peak value is invariably 1. In Figure 2, the empirical likelihood appears to agree well with the normal likelihood and the adjusted exponential likelihood, whereas the unadjusted exponential likelihood is apparently flatter than the other three. Also shown in Figure 2 are 1/8 and 1/32 support intervals, obtained by intersecting the horizontal dotted lines with the various likelihood plots. For example, the 1/8 support intervals are approximately (0.285, 0.354) for the normal likelihood, (0.279, 0.369) for the exponential likelihood, (0.288, 0.357) for the adjusted exponential likelihood and (0.287, 0.356) for the empirical likelihood.

[Figure 1: Histogram of the S/R ratio in the example of Section 4.1.]

[Figure 2: Likelihood functions for the mean S/R ratio in the example of Section 4.1, with support intervals indicated by the horizontal dotted lines.]
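The three parametric-style curves in Figure 2 can be sketched in a few lines; the empirical likelihood additionally requires the constrained optimization of Section 3. The code below uses a hypothetical right-skewed sample (the study data are not reproduced here) and normalizes each curve to peak at 1:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(1.2, 0.27, size=216)        # hypothetical right-skewed sample
n = len(x)

def norm_profile_loglik(mu):
    """Normal profile log-likelihood for the mean: -(n/2) log{sum (x_i - mu)^2}."""
    return -0.5 * n * np.log(np.sum((x - mu) ** 2))

def exp_loglik(mu):
    """Exponential log-likelihood for the mean: -n log(mu) - sum x_i / mu."""
    return -n * np.log(mu) - np.sum(x) / mu

adj = x.mean() ** 2 / x.var(ddof=1)       # Royall-Tsou exponent for the exp model

grid = np.linspace(0.2, 0.5, 601)
curves = {
    "normal": np.array([norm_profile_loglik(m) for m in grid]),
    "exponential": np.array([exp_loglik(m) for m in grid]),
    "adjusted exponential": adj * np.array([exp_loglik(m) for m in grid]),
}
for name, ll in curves.items():
    lik = np.exp(ll - ll.max())           # normalize so the peak value is 1
    inside = grid[lik > 1.0 / 8]
    print(name, (round(float(inside.min()), 3), round(float(inside.max()), 3)))
```

For a gamma-distributed sample the estimated exponent $\bar{X}^2/s^2$ exceeds 1, so the adjusted exponential curve is steeper (narrower support intervals) than the unadjusted one, mirroring the flatness of the unadjusted exponential likelihood seen in Figure 2.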

Given the skewed nature of the S/R ratio distribution, it probably makes better sense to work with the median ($Q_2$) and other quartiles ($Q_1$, $Q_3$) as opposed to the mean. Again, this can be done with or without a parametric model. Under the normal model with mean $\mu$ and variance $\sigma^2$, the $p$-quantile is given by $\xi_p = \mu + \sigma z_p$, where $z_p$ is the $p$-quantile of the standard normal distribution. It is not straightforward to derive the profile likelihood for $\xi_p$. Furthermore, under the normal model the Royall–Tsou adjustment does not apply to quantiles because their relationship to the mean and the variance is undetermined for an arbitrary distribution. This is a serious concern because the normal model does not appear to fit the data well. Under the exponential model with mean $\mu$, we have $\xi_p = -\mu\log(1-p)$; hence the likelihood for $\xi_p$ can be obtained by simply rescaling the likelihood for $\mu$ horizontally. However, the Royall–Tsou adjustment is not available, again because $\xi_p$ need not relate to $\mu_F = E_F X$ in any way for an arbitrary distribution. The empirical likelihood for $\xi_p$ is easily obtained by setting $g(x;\xi_p) = I(x \leq \xi_p) - p$, where $I(\cdot)$ is the indicator function. Figure 3 shows the (unadjusted) exponential and the empirical likelihood functions for all three quartiles. The exponential likelihood functions are generally smoother than the empirical likelihood functions, which is not surprising given the nonparametric nature of the latter. Figure 3 also shows 1/8 and 1/32 support intervals for the three quartiles based on the two likelihood functions. Note that the two approaches do not agree well for lower quartiles, with the exponential likelihood supporting smaller values of the S/R ratio.
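For the quantile estimating function $g(x;\xi_p) = I(x \leq \xi_p) - p$, the constrained optimization has a closed form: with $r = \#\{X_i \leq \xi_p\}$, the optimal weights put total mass $p$ on those $r$ observations and $1-p$ on the rest. A sketch (our own illustration, on hypothetical data):

```python
import numpy as np

def log_el_quantile(x, xi, p):
    """Log empirical likelihood ratio for the p-quantile xi with
    g(x; xi) = I(x <= xi) - p.  With r = #{x_i <= xi}, the optimal weights are
    p/r on those observations and (1 - p)/(n - r) on the others."""
    x = np.asarray(x)
    n = len(x)
    r = int(np.sum(x <= xi))
    if r == 0 or r == n:                  # constraint infeasible: L_E(xi) = 0
        return -np.inf
    return r * np.log(n * p / r) + (n - r) * np.log(n * (1.0 - p) / (n - r))

rng = np.random.default_rng(2)
x = rng.gamma(1.2, 0.27, size=216)        # hypothetical right-skewed sample
med = float(np.median(x))
print(log_el_quantile(x, med, 0.5))       # near 0 at the sample median
print(log_el_quantile(x, med + 0.2, 0.5))
```

Because $r$ only changes when $\xi_p$ crosses an observation, this likelihood is a step function in $\xi_p$, which is consistent with the jagged appearance of the empirical likelihood curves in Figure 3.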

[Figure 3: Likelihood functions for quartiles of the S/R ratio distribution in the example of Section 4.1, with support intervals indicated by the horizontal dotted lines.]

4.2 Bioequivalence

Now let us consider a bioequivalence (BE) trial described in Wellek (2003, Chapter 9). BE trials are conducted to show that a generic drug or new formulation (test) is nearly equivalent in bioavailability to an approved brand-name drug or formulation (reference). There are different ways to measure bioavailability, but this example is primarily concerned with the area under the curve (AUC) for the serum concentration of the drug changing over time. The trial involved 25 patients and followed a crossover design where each patient was randomly assigned to a treatment sequence (test followed by reference or the opposite, with equal probabilities). Let $(X_T, X_R)$ denote the AUC measurements for the test and reference periods, respectively, on the same subject. Assuming that there are no sequence or period effects, we shall focus on the comparison of $\mu_T = E(X_T)$ with $\mu_R = E(X_R)$. In this context, it seems natural to define BE as $4/5 \leq \mu_T/\mu_R \leq 5/4$ (Choi, Caffo and Rohde, 2008, Section 2).

A common model for this type of data would be a bivariate normal model for $(\log X_T, \log X_R)$ (Choi et al., 2008). However, the profile likelihood for $\rho = \mu_T/\mu_R$ under such a model is not straightforward to derive and cannot be robustified using the method of Royall and Tsou (2003) because the object of interest ($\rho$) does not relate to the object of inference (mean and variance–covariance of $(\log X_T, \log X_R)$) when the model fails. On the other hand, an empirical likelihood for $\rho$ is readily available by setting $g(x_T, x_R; \rho) = x_T - \rho x_R$ and is shown in Figure 4. This empirical likelihood indicates, under the generalized law of likelihood (Zhang, 2009), that the BE hypothesis is overwhelmingly supported over its complement, with a generalized likelihood ratio greater than $10^6$. The 1/8 and 1/32 support intervals for $\rho$ are found to be (0.936, 1.078) and (0.915, 1.102), respectively.
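Since $g(x_T, x_R; \rho) = x_T - \rho x_R$ is one-dimensional, the bisection method of Section 3 applies directly. A sketch with hypothetical paired AUC data (the trial data are not reproduced here):

```python
import numpy as np

def log_el_ratio(xt, xr, rho, tol=1e-10):
    """Log empirical likelihood ratio for rho = E(X_T)/E(X_R), based on
    g(x_T, x_R; rho) = x_T - rho*x_R; the Lagrange multiplier is found by
    bisection of the monotone score in Eq. (4)."""
    g = np.asarray(xt) - rho * np.asarray(xr)
    if g.min() >= 0 or g.max() <= 0:      # rho outside the convex hull
        return -np.inf
    lo = (-1.0 + 1e-12) / g.max()
    hi = (-1.0 + 1e-12) / g.min()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + mid * g)) > 0:
            lo = mid
        else:
            hi = mid
    return -np.sum(np.log1p(0.5 * (lo + hi) * g))

rng = np.random.default_rng(3)
xr = rng.lognormal(3.0, 0.3, size=25)     # hypothetical reference AUCs
xt = xr * rng.lognormal(0.0, 0.1, size=25)  # hypothetical test AUCs

rho_hat = float(xt.mean() / xr.mean())
print(log_el_ratio(xt, xr, rho_hat))      # maximized near the plug-in ratio
print(log_el_ratio(xt, xr, 0.8), log_el_ratio(xt, xr, 1.25))
```

Scanning this function over a grid of $\rho$ values and exponentiating (after subtracting the maximum) reproduces a curve of the shape shown in Figure 4, from which 1/8 and 1/32 support intervals can be read off as in Section 2.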

[Figure 4: Empirical likelihood for the ratio of mean AUC values (test/reference) in the example of Section 4.2, with the vertical dashed lines separating regions of BE and non-BE and the horizontal dotted lines indicating support intervals.]

5 Asymptotic Efficiency

In the likelihood paradigm, it seems natural to compare different likelihood methods in terms of such performance measures as the probabilities of misleading evidence (i.e. $L(\theta)/L(\theta_0) > k$) and weak evidence ($1/k < L(\theta)/L(\theta_0) < k$). If $\theta$ is a fixed alternative, these probabilities tend to 0 for all consistent likelihood methods, which is not an informative comparison. We shall therefore consider moving alternatives of the form $\theta_n = \theta_0 + n^{-1/2}h$ with an arbitrary $h$. If a likelihood, say $L_*$, satisfies local asymptotic normality with information $I_*$, then expression (2) can be rewritten in terms of the $\theta_n$ as

$$\log\{L_*(\theta_n)/L_*(\theta_0)\} \overset{d}{\to} N(-h^{\mathrm{T}}I_*h/2,\ h^{\mathrm{T}}I_*h).$$

In large samples, the probability of misleading evidence is approximated by

$$\Phi(-2^{-1}(h^{\mathrm{T}}I_*h)^{1/2} - (h^{\mathrm{T}}I_*h)^{-1/2}\log k), \qquad (7)$$

and that of weak evidence is approximately

$$\Phi(-2^{-1}(h^{\mathrm{T}}I_*h)^{1/2} + (h^{\mathrm{T}}I_*h)^{-1/2}\log k) - \Phi(-2^{-1}(h^{\mathrm{T}}I_*h)^{1/2} - (h^{\mathrm{T}}I_*h)^{-1/2}\log k), \qquad (8)$$

for any given threshold $k > 1$. We can also approximate the probability of the desired outcome (i.e. strong evidence for $\theta_0$ over $\theta_n$) as

$$\Phi(2^{-1}(h^{\mathrm{T}}I_*h)^{1/2} - (h^{\mathrm{T}}I_*h)^{-1/2}\log k). \qquad (9)$$

Expression (7), as a function of $(h^{\mathrm{T}}I_*h)^{1/2}$, is known as the bump function (increasing then decreasing) with maximum value $\Phi(-(2\log k)^{1/2})$ (Royall, 1997, 2000). On the other hand, both Eqs. (8) and (9) range from 0 to 1 and are monotone in $(h^{\mathrm{T}}I_*h)^{1/2}$. Thus, if $I_1$ and $I_2$ correspond to two likelihood methods and $I_2 - I_1$ is nonnegative-definite, then, for any $h$, the second method is associated with a smaller probability of weak evidence and a larger probability of strong evidence for $\theta_0$ over $\theta_n$ in large samples. This suggests that we may consider a likelihood method more efficient asymptotically than another likelihood method if the former is associated with a larger information matrix in the sense of nonnegative-definiteness. This notion of efficiency is consistent with that of Royall and Tsou (2003).
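Expressions (7)–(9) sum to one for any given $s = (h^{\mathrm{T}}I_*h)^{1/2}$ and $k$, and their behavior in $s$ is easy to check numerically:

```python
from math import erf, log, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def evidence_probs(s, k):
    """Expressions (7)-(9) with s = (h' I_* h)^{1/2}: large-sample probabilities
    of misleading, weak, and strong evidence for theta_0 over theta_n."""
    misleading = Phi(-s / 2.0 - log(k) / s)
    weak = Phi(-s / 2.0 + log(k) / s) - Phi(-s / 2.0 - log(k) / s)
    strong = Phi(s / 2.0 - log(k) / s)
    return misleading, weak, strong

# Larger information (larger s) gives less weak evidence and more strong evidence.
for s in (1.0, 2.0, 4.0):
    print(s, tuple(round(p, 3) for p in evidence_probs(s, 8.0)))
```

Running this confirms the qualitative claims above: as $s$ grows, the probability of weak evidence falls and that of strong evidence rises, while the probability of misleading evidence stays below the bound $\Phi(-(2\log k)^{1/2})$.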

The information matrix $I_E$ associated with the empirical likelihood based on $g(x;\theta)$ happens to be the efficient information for $\theta$ in a semiparametric model that consists of all probability distributions under which $E\{g(X;\theta)\} = 0$ for some $\theta$ (Qin and Lawless, 1994, Theorem 3). This means, roughly:

(i) If $I$ is the information for $\theta$ in a correctly specified parametric submodel, then $I \geq I_E$.
(ii) There is a correct parametric submodel in which the information for $\theta$ is $I_E$.

Thus an empirical likelihood is generally less efficient than an unadjusted parametric likelihood when the latter model is correctly specified. Of course, an unadjusted parametric likelihood may not be interpretable when the working model is wrong. It will be of greater interest to compare an empirical likelihood with a robust adjusted likelihood for the same parameter with respect to efficiency because they are comparable in robustness. Royall and Tsou (2003) find that:

(iii) A robust adjusted likelihood is as efficient as the unadjusted version if the working model is correct.
(iv) If the working model is incorrect, then the adjusted likelihood is generally less efficient than any correct parametric likelihood.

It follows from (i) and (iii) that a robust adjusted likelihood is more efficient than an empirical likelihood for the same parameter if the working model is correct. On the other hand, (ii) and (iv) together imply that the reverse inequality holds if the working model is incorrect.


6 Discussion

There are several points to consider when choosing among different likelihood methods to representstatistical evidence. Clearly, the parametric approach may have an efficiency advantage if one is veryconfident about the model, and if the parametric likelihood can be robustified, then there is littleloss in robustness. The empirical likelihood approach, on the other hand, is more generally ap-plicable and may be preferable in the following situations:

(i) There is not sufficient information to specify a parametric model.
(ii) The parameter of interest is not directly related to the object of inference (θF in Section 2) when the working model fails.
(iii) There is auxiliary information about the parameter of interest (e.g. Qin and Lawless, 1994).
(iv) The parameter of interest is vector-valued.

The empirical likelihood approach has its own challenges. Unlike its parametric counterparts, an empirical likelihood typically is not available in a closed form. When θ is vector-valued, it can be difficult to profile over nuisance parameters. Further research in this area is warranted.

Acknowledgements The author thanks the associate editor and two anonymous referees for constructive comments that have greatly improved this manuscript. The views expressed in this article do not necessarily represent those of the U.S. Food and Drug Administration.

Conflict of Interests Statement

The author has declared no conflict of interest.

References

Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.

Blume, J. D. (2002). Likelihood methods for measuring statistical evidence. Statistics in Medicine 21, 2563–2599.

Blume, J. D. (2008). How often likelihood ratios are misleading in sequential trials. Communications in Statistics: Theory and Methods 37, 1193–1206.

Blume, J. D., Su, L., Olveda, R. M. and McGarvey, S. T. (2007). Statistical evidence for GLM regression parameters: a robust likelihood approach. Statistics in Medicine 26, 2919–2936.

Choi, L., Caffo, B. and Rohde, C. (2008). A survey of the likelihood approach to bioequivalence trials. Statistics in Medicine 27, 4874–4894.

Hacking, I. (1965). Logic of Statistical Inference. Cambridge University Press, New York.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249.

Owen, A. B. (1990). Empirical likelihood ratio confidence regions. Annals of Statistics 18, 90–120.

Owen, A. B. (1991). Empirical likelihood for linear models. Annals of Statistics 19, 1725–1747.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall, Boca Raton, FL.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Annals of Statistics 22, 300–325.

Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman & Hall, Boca Raton, FL.

Royall, R. (2000). On the probability of observing misleading statistical evidence. Journal of the American Statistical Association 95, 760–768.

Royall, R. and Tsou, T.-S. (2003). Interpreting statistical evidence by using imperfect models: robust adjusted likelihood functions. Journal of the Royal Statistical Society, Series B 65, 391–404.


Skjelbo, E., Mutabingwa, T. K., Bygbjerg, I., Nielsen, K. K., Gram, L. F. and Brosen, K. (1996). Chloroguanide metabolism in relation to the efficacy in malaria prophylaxis and the S-mephenytoin oxidation in Tanzanians. Clinical Pharmacology and Therapeutics 59, 304–311.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, Cambridge.

Wellek, S. (2003). Testing Statistical Hypotheses of Equivalence. CRC Press, Boca Raton, FL.

Zhang, Z. (2009). A law of likelihood for composite hypotheses. Under review. Available online at http://arxiv.org/abs/0901.0463 (last accessed January 10, 2009).
