Data-Driven Confidence Interval Estimation Incorporating Prior Information with an Adjustment for Skewed Data

Albert VEXLER, Li ZOU, and Alan D. HUTSON
Department of Biostatistics, The State University of New York at Buffalo [1]

Bayesian credible interval (CI) estimation is a statistical procedure that has been well addressed in both the theoretical and applied literature. Parametric assumptions regarding baseline data distributions are critical for the implementation of this method. We provide a nonparametric technique for incorporating prior information into the equal-tailed (ET) and highest posterior density (HPD) CI estimators in the Bayesian manner. We propose to use a data-driven likelihood function, replacing the parametric likelihood function, in order to create a distribution-free posterior. Higher-order asymptotic propositions are derived to show the efficiency and consistency of the proposed method. We demonstrate that the proposed approach may correct confidence regions with respect to the skewness of the data distribution. An extensive Monte Carlo (MC) study confirms that the proposed method significantly outperforms the classical CI estimation in a frequentist context. A real data example related to a study of myocardial infarction illustrates the excellent applicability of the proposed technique. Supplementary material, including the R code used to implement the developed method, is available online.

KEY WORDS: Credible intervals; Bayesian estimation; Nonparametric confidence interval estimation; Empirical likelihood; Equal-tailed confidence interval; Highest posterior density confidence interval.

1. INTRODUCTION

The Bayesian display of the upper and lower bounds of a credible set, which contains a large fraction of the posterior mass (typically 95%) related to a functional parameter, is an

[1] Albert Vexler (E-mail: [email protected]), Li Zou (E-mail: [email protected]), and Alan D. Hutson (E-mail: [email protected]), Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY 14214 USA.
The next proposition provides the asymptotic evaluations of the HPD CI estimation with the bounds $Q_L$, $Q_U$.

Proposition 8. Under the assumptions of Proposition 7, the lower and upper HPD CI bounds, $Q_L$ and $Q_U$, asymptotically satisfy the equations

$$\exp\left(-\frac{n(Q_L-\bar X)^2}{2\sigma_{Gn}^2}+\frac{nM_{Gn}^3(Q_L-\bar X)^3}{3}+O_p\left(n(Q_L-\bar X)^4\right)\right)\pi(Q_L)
=\exp\left(-\frac{n(Q_U-\bar X)^2}{2\sigma_{Gn}^2}+\frac{nM_{Gn}^3(Q_U-\bar X)^3}{3}+O_p\left(n(Q_U-\bar X)^4\right)\right)\pi(Q_U),$$
$$\int_{Q_L}^{Q_U}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_{Gn}^2}+\frac{nM_{Gn}^3(\theta-\bar X)^3}{3}\right)\pi(\theta)\,d\theta
=\left(1-\alpha\right)\int_{X_{(1)}}^{X_{(n)}}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_{Gn}^2}+\frac{nM_{Gn}^3(\theta-\bar X)^3}{3}\right)\pi(\theta)\,d\theta+O_p\left(n^{-1+\varepsilon}\right).$$
We note that one can easily derive asymptotic expressions for the CI bounds in the general case by directly using the proof strategies of Propositions 2 and 5 when the prior distribution is in Gaussian form. The following proposition confirms that the proposed nonparametric procedures are consistent.

Proposition 9. Under the assumptions of Proposition 7, we have
$$\Pr\left(q_L<\theta<q_U\right)=1-\alpha+O_p\left(n^{-0.5}\right)\ \ \text{and}\ \ \Pr\left(Q_L<\theta<Q_U\right)=1-\alpha+O_p\left(n^{-0.5}\right).$$
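In these approximations the exponent of the data-driven posterior behaves, near $\theta=\bar X$, like the Gaussian-form quantity $-n(\theta-\bar X)^2/(2\sigma^2)$. For the mean-type constraint $G(x,\theta)=x-\theta$ this is easy to examine numerically; a minimal sketch using the emplik package (the package used in the supplementary R code), where the seed, sample, and helper name lr() are our illustrative choices:

```r
# Sketch: for the mean-type constraint G(x, theta) = x - theta, the log EL
# ratio lr(theta) is close to the Gaussian-form exponent
# -n*(theta - xbar)^2/(2*sn2) when theta is within O(n^(-1/2)) of xbar.
library(emplik)

set.seed(3)
n <- 200
x <- rnorm(n)
xbar <- mean(x)
sn2 <- mean((x - xbar)^2)                        # sigma_n^2

# el.test returns -2 log R(theta); lr(theta) = -0.5 * (-2LLR)
lr <- function(theta) -0.5 * el.test(x, mu = theta)$"-2LLR"

theta <- xbar + 0.5 / sqrt(n)                    # a local alternative point
c(lr(theta), -n * (theta - xbar)^2 / (2 * sn2))  # the two values nearly agree
```

At the sample mean itself the empirical likelihood ratio is maximized, so lr(xbar) is 0.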
Remark 1. We defined the log EL function $l_G(\theta)$ in a manner that has been extensively dealt with in the EL literature (e.g., Owen 1988, 2001; Vexler et al. 2014). In this case, the constraint $\sum_{i=1}^{n}p_iG(X_i,\theta)=0$ empirically reflects the functional meaning of the parameter $\theta$ in the form $E\left(G(X_1,\theta)\right)=0$, where $G$ is known. One can consider scenarios in which the function $G$ is unknown, for example, in the context of quantile estimation (e.g., Chen and Hall 1993). In a subsequent paper, we plan to address this problem. Further studies are needed to evaluate the Bayesian type EL CI approach in this framework.
3. MONTE CARLO STUDY
In this section, we carry out an extensive MC study in order to evaluate the behavior of the proposed CI estimation with the significance level fixed at 5%. Limpert et al. (2001) showed that many measurements of markers related to health and social science have skewed distributions. Toward this end, we focus on simulating data for this numerical study following the distribution function $\Pr(X_1<t)=\int_0^{t+\exp(\sigma_1^2/2)}f(x)\,dx$, where $f(x)$ is a $LogNorm(0,\sigma_1^2)$ density function with $\sigma_1=1,1.5,2$ and $\theta=EX_1=0$. It is known that for normally distributed data EL methods demonstrate good properties, whereas EL type procedures based on lognormal-type distributed data can lead to unstable results (e.g., Vexler et al. 2009). In these Monte Carlo evaluations we also generated data from a $N(0,1)$ distribution. We consider the following prior distributions in our simulation study: $\pi(\theta)=(2\pi\sigma_\pi^2)^{-0.5}\exp\left(-(\theta-d)^2/(2\sigma_\pi^2)\right)$ with $d=0,1$ and $\sigma_\pi=0.25,0.5,1$. These priors reflect different scenarios depicting our "relative confidence" with respect to the prior information pertaining to the unknown parameter $\theta=EX_1$. At each baseline distribution and prior density function, we generated 5000 samples of size $n=7,15,25,50,100$. The 95% classical nonparametric CI is $\bar X\pm1.96\sqrt{\sigma_n^2/n}$, where $\sigma_n^2$ is the sample variance.
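For reference, the frequentist behavior of this classical interval under the skewed design above can be checked directly; a minimal sketch (the replication count, sample size, and seed are our choices, not the settings of the reported study):

```r
# Sketch: Monte Carlo coverage of the classical 95% CI
# xbar +/- 1.96*sqrt(s2/n) for the centered LogNorm(0,1) mean, theta = 0.
set.seed(123)
n <- 25
nsim <- 2000
covered <- logical(nsim)
lengths <- numeric(nsim)
for (b in seq_len(nsim)) {
  x <- rlnorm(n, 0, 1) - exp(0.5)           # EX = 0 after centering
  xbar <- mean(x)
  s2 <- var(x)                              # sample variance
  lo <- xbar - 1.96 * sqrt(s2 / n)
  hi <- xbar + 1.96 * sqrt(s2 / n)
  covered[b] <- (lo < 0) && (0 < hi)
  lengths[b] <- hi - lo
}
c(CP = mean(covered), LG = mean(lengths))   # CP falls well below 0.95
```

The undercoverage visible here is the phenomenon that the proposed skewness adjustment is designed to mitigate.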
The proposed method uses the EL functions instead of the parametric joint density functions in Bayesian CI estimation. In several scenarios, Bayesian credible sets can demonstrate poor frequentist properties (e.g., Szabó et al. 2015). Proposition 3 ensures that the coverage probability of the new data-driven CI estimation is controlled asymptotically in the frequentist context. We evaluate the frequentist coverage based on finite samples. The criteria for comparison are the MC coverage probability (CP) and the MC average length of the CI (LG). Table 1 displays the MC results.
For the cases of lognormally distributed data with priors $\pi(\theta)=(2\pi\sigma_\pi^2)^{-0.5}\exp\left(-(\theta-d)^2/(2\sigma_\pi^2)\right)$ that provide correct information regarding $\theta$ when $d=0$, the CPs of the proposed CI estimations (both the ET and HPD CI estimators) are almost uniformly closer to the expected 95% level than those of the classical CI estimation. The LGs of the proposed method are shorter in most cases. When the skewness of the lognormal distribution increases, the above conclusions are magnified. When using the misspecified prior, $N(1,1)$ $(d=1)$, the proposed method maintains similar CPs with a small increase in LGs compared to the classical method. For the cases of baseline data from normal distributions with the correctly specified priors, the performance of the proposed methods is comparable to that of the classical method. In these cases the classical CI method is a product of the parametric maximum likelihood technique and can therefore be expected to be very efficient.
In the Supplementary Material we compare the proposed nonparametric approach with the following methods: (1) the inverse Edgeworth expansion based method proposed by Hall (1983); (2) the parametric Bayesian CI estimation; (3) a frequentist method for improved CI estimation of the log-normal mean; and (4) the classical EL confidence interval estimation (Owen 2001). In the considered MC scenarios, the data-driven CI estimation outperforms Hall's approach. This is perhaps because obtaining confidence intervals via Hall's method requires estimating several unknown parameters within the corresponding asymptotic approximations. The estimators of these parameters can be very biased when skewed data are used. The proposed CI approach demonstrates better CPs and LGs than those provided by the classical EL method. In the context of the comparisons with (2) and (3), the new distribution-free CI estimation shows results that are very close to the outputs of the parametric methods.
Table 1. The Monte Carlo coverage probabilities (CP) and average lengths (LG) for the CI estimation of the mean. The Z indicates the results of the classical CI estimation.
On-line Supplement to "Data-Driven Confidence Interval Estimation Incorporating Prior Information with an Adjustment for Skewed Data"

Albert VEXLER, Li ZOU, and Alan D. HUTSON
Department of Biostatistics, State University of New York at Buffalo
Abstract: This on-line supplement to "Data-Driven Confidence Interval Estimation Incorporating Prior Information with an Adjustment for Skewed Data" contains an appendix of technical proofs of the Propositions and Lemmas and R code to implement the methods proposed in the article. In this Supplementary Material we also show additional results of the Monte Carlo evaluations of the proposed method.
APPENDIX
Proof of Proposition 1
The proof of Proposition 1 is based on the fact that the function $lr(\theta)$ is highly peaked about its maximum at $\bar X=\sum_{i=1}^{n}X_i/n$. We will use the fact that $lr(\theta)$ can be well approximated by the function $-n(\theta-\bar X)^2/(2\sigma_n^2)$ when values of $\theta$ are close to $\bar X$. Toward this end, we first show that
$$\int_{X_{(1)}}^{X_{(n)}}e^{lr(\theta)}\pi(\theta)\,d\theta=\int_{\bar X-\varphi_nn^{-1/2}}^{\bar X+\varphi_nn^{-1/2}}e^{lr(\theta)}\pi(\theta)\,d\theta+O\left(\exp\left(-c\varphi_n^2\right)\right),$$

where $c>0$ is a constant, a positive sequence $\varphi_n=O(n^{\varepsilon})$, for some $\varepsilon>0$, $\varphi_nn^{-0.5}\to0$ and $\int_{\bar X-\varphi_nn^{-1/2}}^{\bar X+\varphi_nn^{-1/2}}e^{lr(\theta)}\pi(\theta)\,d\theta=O\left(n^{-1/2}\right)$, as $n\to\infty$. This approximation allows us to analyze the numerator and denominator in (3). Denote the log EL function $l(\theta)=\max\left\{\sum_{i=1}^{n}\log p_i:\ \sum_{i=1}^{n}p_i=1,\ \sum_{i=1}^{n}p_iX_i=\theta\right\}$, with $\lambda_1$, $\lambda_2$ that are the Lagrange multipliers. One can show that $p_i=\left(n\left(1+\lambda(X_i-\theta)\right)\right)^{-1}$, where $\lambda$ is a root of the equation $\sum_{i=1}^{n}(X_i-\theta)/\left(n\left(1+\lambda(X_i-\theta)\right)\right)=0$.

[1] Address for correspondence: Department of Biostatistics, University at Buffalo, 706 Kimball Tower, Buffalo, NY 14214-3000, USA. Email: [email protected].
By virtue of Lemma 10.2.1 in Vexler et al. (2014a), when $\theta<\bar X$, $lr(\theta)$ is increasing; when $\theta>\bar X$, $lr(\theta)$ is decreasing. This implies that the log empirical likelihood ratio function $lr(\theta)$ defined in (2) has its maximum at $\theta=\bar X$. Denote $a=\bar X-\varphi_nn^{-0.5}$ and $b=\bar X+\varphi_nn^{-0.5}$, where $\varphi_n=n^{1/6-\beta}$ and $\beta\in(0,1/6)$. Then it turns out that

$$\int_{X_{(1)}}^{X_{(n)}}e^{lr(\theta)}\pi(\theta)\,d\theta=\int_{X_{(1)}}^{a}e^{lr(\theta)}\pi(\theta)\,d\theta+\int_{a}^{b}e^{lr(\theta)}\pi(\theta)\,d\theta+\int_{b}^{X_{(n)}}e^{lr(\theta)}\pi(\theta)\,d\theta,$$

and

$$\int_{X_{(1)}}^{q_L}e^{lr(\theta)}\pi(\theta)\,d\theta=\int_{X_{(1)}}^{a}e^{lr(\theta)}\pi(\theta)\,d\theta+\int_{a}^{q_L}e^{lr(\theta)}\pi(\theta)\,d\theta.$$

By virtue of the above considerations we can bound the remainder term

$$\int_{X_{(1)}}^{a}e^{lr(\theta)}\pi(\theta)\,d\theta\le e^{lr(a)}\int_{X_{(1)}}^{a}\pi(\theta)\,d\theta\le e^{lr(a)}.$$

In order to evaluate $lr(a)$, we define the function

$$L_n(\lambda)=\sum_{i=1}^{n}\frac{X_i-\theta}{1+\lambda(X_i-\theta)}.\quad(1.1)$$
We rewrite (1.1) at $\theta=a$ such that

$$L_n(\lambda)=\sum_{i=1}^{n}\frac{X_i-\bar X+\varphi_nn^{-0.5}}{1+\lambda\left(X_i-\bar X+\varphi_nn^{-0.5}\right)}=\varphi_nn^{1/2}-\lambda\sum_{i=1}^{n}\frac{\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^2}{1+\lambda\left(X_i-\bar X+\varphi_nn^{-0.5}\right)}.\quad(1.2)$$

Defining $\lambda_c=\tau_n^{-1}n^{-1/3}$, where $\tau_n=n^{\gamma}$, $0<\gamma<\beta<1/6$, and plugging it into (1.2), we note that $\max_{1\le i\le n}|X_i-\bar X|=o_p\left(n^{1/3}\right)$ (e.g., Owen 1988), so that $\lambda_c\max_{1\le i\le n}\left|X_i-\bar X+\varphi_nn^{-0.5}\right|=o_p\left(n^{-\gamma}\right)\to0$ and hence

$$L_n(\lambda_c)=\varphi_nn^{1/2}-\lambda_cn\sigma_n^2\left(1+o_p(1)\right)=n^{2/3-\beta}-n^{2/3-\gamma}\sigma_n^2\left(1+o_p(1)\right).$$

Since $\gamma<\beta$, the second term dominates. Thus $L_n(\lambda_c)\to-\infty$, as $n\to\infty$. In a similar manner, $L_n(-\lambda_c)\to\infty$, as $n\to\infty$. Since $L_n(\lambda)$ is decreasing in $\lambda$, the solution, $\lambda_0$, of the equation $L_n(\lambda_0)=0$ belongs to the interval $\left[-\lambda_c,\lambda_c\right]$, i.e. $\lambda_0=O_p\left(\tau_n^{-1}n^{-1/3}\right)$.
Let us now derive the approximate value of $\lambda_0$ as $n\to\infty$. Since $L_n(\lambda_0)=0$,

$$\sum_{i=1}^{n}\frac{X_i-\bar X+\varphi_nn^{-0.5}}{1+\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)}=0.\quad(1.3)$$

Applying a Taylor series expansion to (1.3) considering $\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)$ around zero we obtain

$$\sum_{i=1}^{n}\left(\left(X_i-\bar X+\varphi_nn^{-0.5}\right)-\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^2+\frac{\lambda_0^2\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^3}{\left(1+\omega_i\right)^3}\right)=0,\quad(1.4)$$

where $0<\omega_i<\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)$. Since $\lambda_0=O_p\left(\tau_n^{-1}n^{-1/3}\right)$ and $\sum_{i=1}^{n}(X_i-\bar X)=0$, we can rewrite (1.4) in the form

$$\varphi_nn^{1/2}-\lambda_0\sum_{i=1}^{n}\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^2+O_p\left(n^{1/3}\tau_n^{-2}\right)=0.\quad(1.5)$$

Then solving (1.5) gives the approximate solution

$$\lambda_0=\frac{\varphi_nn^{1/2}}{\sum_{i=1}^{n}\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^2}+O_p\left(n^{-2/3}\tau_n^{-2}\right).\quad(1.6)$$
Applying a Taylor series expansion to $lr(a)$ considering $\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)$ around zero yields

$$lr(a)=-\sum_{i=1}^{n}\log\left(1+\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)\right)=-\lambda_0\varphi_nn^{1/2}+\frac{\lambda_0^2}{2}\sum_{i=1}^{n}\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^2-\frac{\lambda_0^3}{3}\sum_{i=1}^{n}\frac{\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^3}{\left(1+\omega_i^*\right)^3},$$

where $0<\omega_i^*<\lambda_0\left(X_i-\bar X+\varphi_nn^{-0.5}\right)$. By virtue of (1.6) and the fact that $\lambda_0=O_p\left(\tau_n^{-1}n^{-1/3}\right)$ we have

$$lr(a)=-\frac{1}{2}\,\frac{n\varphi_n^2}{\sum_{i=1}^{n}\left(X_i-\bar X+\varphi_nn^{-0.5}\right)^2}+O_p\left(n^{-\gamma}\right)\to-\infty,\ \ \text{as }n\to\infty,$$

where $\varphi_n^2=n^{1/3-2\beta}\to\infty$ and $0<\gamma<\beta<1/6$. Thus we conclude that

$$\int_{X_{(1)}}^{a}e^{lr(\theta)}\pi(\theta)\,d\theta\le\exp\left(lr(a)\right)=O\left(\exp\left(-wn^{1/3-2\beta}\right)\right)\to0,\ \ \text{as }n\to\infty,$$

where $w$ is a positive constant.
Now define $b=\bar X+\varphi_nn^{-1/2}$ and in a similar manner to the proof scheme above we have

$$\int_{b}^{X_{(n)}}e^{lr(\theta)}\pi(\theta)\,d\theta\le\exp\left(lr(b)\right)=O\left(\exp\left(-w_1\varphi_n^2\right)\right)\to0,$$

where $w_1$ is a positive constant and $n\to\infty$. Thus we show that

$$\int_{X_{(1)}}^{X_{(n)}}e^{lr(\theta)}\pi(\theta)\,d\theta\cong\int_{\bar X-\varphi_nn^{-1/2}}^{\bar X+\varphi_nn^{-1/2}}e^{lr(\theta)}\pi(\theta)\,d\theta.$$

Similarly we have that

$$\int_{X_{(1)}}^{q_L}e^{lr(\theta)}\pi(\theta)\,d\theta\cong\int_{\bar X-\varphi_nn^{-1/2}}^{q_L}e^{lr(\theta)}\pi(\theta)\,d\theta.$$
Now we consider the main term $\int_{a}^{b}e^{lr(\theta)}\pi(\theta)\,d\theta$ of the marginal distribution defined in (2). This integral involves the log empirical likelihood ratio function $lr(\theta)$, and we expand $lr(\theta)$ at $\theta=\bar X$ using Taylor's theorem,

$$lr(\theta)=lr(\bar X)+\left.\frac{d\,lr(u)}{du}\right|_{u=\bar X}(\theta-\bar X)+\frac{1}{2}\left.\frac{d^2lr(u)}{du^2}\right|_{u=\bar X}(\theta-\bar X)^2+\frac{1}{6}\left.\frac{d^3lr(u)}{du^3}\right|_{u=\bar X}(\theta-\bar X)^3+\frac{1}{24}\left.\frac{d^4lr(u)}{du^4}\right|_{u=\bar X+\varpi(\theta-\bar X)}(\theta-\bar X)^4,\quad\varpi\in[0,1].\quad(1.7)$$

By virtue of Proposition 10.2.1 in Vexler et al. (2014a), we have

$$\left.\frac{d^2lr(u)}{du^2}\right|_{u=\bar X}=-\frac{n^2}{\sum_{i=1}^{n}(X_i-\bar X)^2}=-\frac{n}{\sigma_n^2},\qquad\left.\frac{d^3lr(u)}{du^3}\right|_{u=\bar X}=\frac{2n^3\sum_{i=1}^{n}(X_i-\bar X)^3}{\left(\sum_{i=1}^{n}(X_i-\bar X)^2\right)^3}=2nM_n^3,$$

as well as $d^3\lambda(\theta)/d\theta^3=O_p(n)$, for $\theta\in[a,b]$. The argument $\bar X$ maximizes the function $l(\theta)=\sum_{i=1}^{n}\log p_i$, $l(\bar X)=n\log(1/n)$, and then $lr(\bar X)=0$ as well as $\lambda(\bar X)=0$, so that the first-order term in (1.7) vanishes.
Using the results above, (1.7), and a Taylor expansion for $\exp\left(nM_n^3(\theta-\bar X)^3/3\right)$ and $\exp\left(O_p\left(n(\theta-\bar X)^4\right)\right)$ around zero, we have

$$\int_{a}^{b}e^{lr(\theta)}\pi(\theta)\,d\theta=\int_{a}^{b}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}+\frac{nM_n^3(\theta-\bar X)^3}{3}+O_p\left(n(\theta-\bar X)^4\right)\right)\pi(\theta)\,d\theta$$
$$=\int_{a}^{b}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta+\int_{a}^{b}\frac{nM_n^3(\theta-\bar X)^3}{3}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta+\int_{a}^{b}O_p\left(n(\theta-\bar X)^4\right)\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta.\quad(1.8)$$
Now by virtue of the definition (3), the formula $\alpha/2=\int_{X_{(1)}}^{q_L}h\left(\theta\,|\,X_1,\ldots,X_n\right)d\theta$ can be rewritten as

$$\frac{\alpha}{2}=\frac{\int_{X_{(1)}}^{q_L}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}{\int_{X_{(1)}}^{X_{(n)}}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}+R_n,$$

where

$$R_n=\frac{\int_{X_{(1)}}^{q_L}e^{lr(\theta)}\pi(\theta)\,d\theta}{\int_{a}^{b}e^{lr(\theta)}\pi(\theta)\,d\theta}-\frac{\int_{X_{(1)}}^{q_L}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}{\int_{X_{(1)}}^{X_{(n)}}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}.$$
It is clear that one can use (1.8) and the facts:

(1) $\pi(\theta)=\pi(\bar X)+\pi'(\bar X)(\theta-\bar X)+\frac{1}{2}\pi''\left(\bar X+q(\theta-\bar X)\right)(\theta-\bar X)^2$, $q\in[0,1]$;

(2) $b-a=2n^{1/6-\beta}/n^{1/2}$; (3) $\int_{a}^{b}(\theta-\bar X)\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)d\theta=0$

to represent the remainder term $R_n$ in the form

$$R_n=-M_n^3C_n+O\left(n^{-1+\varepsilon}\right),$$

where $C_n=n^{-0.5}\sigma_n^3\left(z_{1-\alpha/2}^2+2\right)\phi\left(z_{1-\alpha/2}\right)/3$, $\phi(\cdot)$ denotes the standard normal density, and $\varepsilon>0$.
Combining the above asymptotic approximations, we have

$$\frac{\alpha}{2}=\frac{\int_{X_{(1)}}^{q_L}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}{\int_{X_{(1)}}^{X_{(n)}}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}-M_n^3C_n+O_p\left(n^{-1+\varepsilon}\right).\quad(1.9)$$

Similarly one can show that

$$1-\frac{\alpha}{2}=\frac{\int_{X_{(1)}}^{q_U}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}{\int_{X_{(1)}}^{X_{(n)}}\exp\left(-\frac{n(\theta-\bar X)^2}{2\sigma_n^2}\right)\pi(\theta)\,d\theta}-M_n^3C_n+O_p\left(n^{-1+\varepsilon}\right).$$
The proof of Proposition 1 is complete.
Proof of Lemma 1.

The proof of Lemma 1 is a straightforward rearrangement of the equations in Proposition 1. It shows a structure similar to that of the classic Bayesian Normal/Normal model. For details see Carlin and Louis (2009).
Proof of Proposition 2.

We assume that $\pi(\theta)$ is a normal density function with mean $\mu$ and variance $\sigma^2$. We first derive a lemma that is useful for this proof.

Lemma 2. $u_L-z_{\alpha/2}=O_p\left(n^{-0.5+\varepsilon}\right)$, where $u_L$ is defined in Proposition 2.

Proof: First note the order of a component of the remainder term $R_n$: $M_n^3C_n=O_p\left(n^{-0.5+\varepsilon}\right)$. Equation (1.9) with the above fact implies

$$\frac{\alpha}{2}=\Phi\left(u_L\right)+O_p\left(n^{-0.5+\varepsilon}\right),\quad(1.10)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal random variable. Now we rearrange equation (1.10) as

$$\Phi\left(u_L\right)-\Phi\left(z_{\alpha/2}\right)=O_p\left(n^{-0.5+\varepsilon}\right),\quad(1.11)$$

where $z_{1-\alpha/2}$ is defined by $\Phi\left(z_{1-\alpha/2}\right)=1-\alpha/2$ and $z_{\alpha/2}=-z_{1-\alpha/2}$. We also have

$$\Phi\left(u_L\right)-\Phi\left(z_{\alpha/2}\right)=\int_{z_{\alpha/2}}^{u_L}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)dz.$$

Combining equation (1.11) and the above result, we have, as $n\to\infty$,

$$u_L-z_{\alpha/2}=O_p\left(n^{-0.5+\varepsilon}\right).$$

This completes the proof of Lemma 2.
Based on Lemma 1, Equation (1.9) becomes

$$\frac{\alpha}{2}=\Phi\left(u_L\right)-M_n^3C_n+O_p\left(n^{-1+\varepsilon}\right).\quad(1.12)$$

Now we expand the function $\Phi\left(u_L\right)$ using the Taylor theorem with respect to $u_L$ around $u_L=z_{\alpha/2}$; we have

$$\Phi\left(u_L\right)=\Phi\left(z_{\alpha/2}\right)+\Phi'\left(z_{\alpha/2}\right)\left(u_L-z_{\alpha/2}\right)+O_p\left(n^{-1+2\varepsilon}\right).\quad(1.13)$$

Combining Equations (1.13) and (1.12), we obtain

$$\frac{\alpha}{2}=\Phi\left(z_{\alpha/2}\right)+\Phi'\left(z_{\alpha/2}\right)\left(u_L-z_{\alpha/2}\right)-M_n^3C_n+O_p\left(n^{-1+\varepsilon}\right).\quad(1.14)$$

Then we have the expression for $q_L$ as

$$q_L=\frac{n\sigma^2\bar X+\sigma_n^2\mu}{n\sigma^2+\sigma_n^2}-z_{1-\alpha/2}\left(\frac{\sigma_n^2\sigma^2}{n\sigma^2+\sigma_n^2}\right)^{1/2}+\frac{\sigma_n^4M_n^3\left(z_{1-\alpha/2}^2+2\right)}{3n}+o_p\left(n^{-1}\right).$$

In a similar manner one can show that the expression for $q_U$ is

$$q_U=\frac{n\sigma^2\bar X+\sigma_n^2\mu}{n\sigma^2+\sigma_n^2}+z_{1-\alpha/2}\left(\frac{\sigma_n^2\sigma^2}{n\sigma^2+\sigma_n^2}\right)^{1/2}+\frac{\sigma_n^4M_n^3\left(z_{1-\alpha/2}^2+2\right)}{3n}+o_p\left(n^{-1}\right).$$

The proof of Proposition 2 is complete.
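The leading Normal/Normal part of these expressions (ignoring the higher-order skewness correction and the $o_p(n^{-1})$ term) can be evaluated directly; a sketch with illustrative data and prior parameters of our own choosing:

```r
# Sketch: leading terms of the ET bounds q_L, q_U under a N(mu, sigma2) prior,
# i.e. the posterior mean -/+ z_{1-alpha/2} times the posterior sd in the
# classic Normal/Normal model; the skewness correction term is omitted here.
set.seed(11)
n <- 50
alpha <- 0.05
x <- rlnorm(n, 0, 1) - exp(0.5)             # centered lognormal sample
mu <- 0; sigma2 <- 1                        # prior mean and variance
xbar <- mean(x)
sn2 <- mean((x - xbar)^2)                   # sigma_n^2 (1/n version)

center <- (n * sigma2 * xbar + sn2 * mu) / (n * sigma2 + sn2)  # posterior mean
sd_post <- sqrt(sn2 * sigma2 / (n * sigma2 + sn2))             # posterior sd
z <- qnorm(1 - alpha / 2)

qL <- center - z * sd_post
qU <- center + z * sd_post
c(qL, qU)
```

Note that the posterior standard deviation is always smaller than the frequentist standard error $\sqrt{\sigma_n^2/n}$, reflecting the shrinkage induced by the prior.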
Proof of Propositions 3-9.
The proof of Proposition 3 is omitted since it directly follows from the application of
Slutsky’s theorem and an Edgeworth expansion technique.
One can use the proof schemes of Propositions 1 and 2 to show Propositions 4-9 in a
similar manner.
R-Code
################################################################# ########## R code to calculate the proposed confidence interval (CI) estimation ########### for the Monte Carlo simulations #################################################################
#The sample size ,significant level alpha and library in R
library("emplik") n<- 50 ; alpha<- 0.05
# Assume that the baseline data distribution is log-normal with mean zero and variance 1 # and prior distribution is normal distribution with mean zero and variance 1.# # Generate a sample of n centered random variables from the baseline data distribution#
x<- rlnorm(n,0,1)-exp(0.5)
#Create function integ which equals to numerator ( ) ( )lre θ p θ in Equation (2)
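The body of integ and the subsequent steps are not reproduced above. A hedged sketch of one way to assemble the numerator exp(lr(θ))π(θ) of Equation (2) and read off ET bounds from the normalized posterior follows; the grid construction and all object names here are our assumptions, not the authors' exact code:

```r
# Sketch: data-driven posterior proportional to exp(lr(theta)) * pi(theta),
# normalized on a grid, with equal-tailed bounds read from cumulative sums.
library(emplik)
set.seed(7)
n <- 50; alpha <- 0.05
x <- rlnorm(n, 0, 1) - exp(0.5)

prior <- function(theta) dnorm(theta, mean = 0, sd = 1)        # N(0, 1) prior
lr <- function(theta) -0.5 * el.test(x, mu = theta)$"-2LLR"    # log EL ratio
integ <- function(theta) exp(lr(theta)) * prior(theta)         # Eq. (2) numerator

# Grid strictly inside the sample range, where the EL is well defined.
xs <- sort(x)
grid <- seq(xs[2], xs[n - 1], length.out = 400)
post <- sapply(grid, integ)
post[!is.finite(post)] <- 0        # guard against numerical failure in tails
post <- post / sum(post)           # normalize to a discrete posterior

cdf <- cumsum(post)
qL <- grid[which(cdf >= alpha / 2)[1]]        # lower ET bound
qU <- grid[which(cdf >= 1 - alpha / 2)[1]]    # upper ET bound
c(qL, qU)
```

The HPD variant would instead retain the shortest set of grid points accumulating probability 1 − α; we omit it here for brevity.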
In this section, using the framework described in Section 3, we demonstrate numerical comparisons between the proposed nonparametric approach and the following methods:

(1) The inverse Edgeworth expansion based method proposed by Hall (1983).

(2) The parametric Bayesian CI estimation.

(3) A frequentist method for improved CI estimation of the log-normal mean. This CI estimation, based on log-normally distributed data, is described in Zhou and Gao (1997). In this aspect Zhou and Gao (1997) suggested using Cox's method to improve the CI estimation. In our study we apply Cox's approach (see Zhou and Gao 1997 for details).

(4) The classical EL CI estimation (Owen 2001).

We present the results of this limited Monte Carlo study in Tables 1 and 2.
Table 1. The Monte Carlo coverage probabilities (CP) and average lengths (LG) for the CI estimation of the mean. The notations ET and HPD represent the proposed equal-tailed and highest posterior density CI estimation; ELR represents the classical empirical likelihood ratio CI estimation; H represents Hall (1983)'s method; C represents the well-known Cox's method CI estimation, which is based on the maximum likelihood technique.

          ET            HPD           ELR           H             C
  n    CP      LG    CP      LG    CP      LG    CP      LG    CP      LG
  7    74.50%  1.66  75.20%  1.65  73.37%  2.08  78.40%  2.39  88.83%  4.99
 15    83.83%  1.61  85.03%  1.59  83.90%  1.81  83.47%  1.86  91.90%  2.49
 25    88.13%  1.45  88.63%  1.43  87.37%  1.50  85.77%  1.48  93.67%  1.74
 50    89.97%  1.17  91.47%  1.14  90.37%  1.12  88.63%  1.08  94.37%  1.18
100    92.12%  0.90  93.90%  0.88  93.07%  0.84  90.68%  0.80  95.03%  0.81
Table 2. The Monte Carlo coverage probabilities (CP) and average lengths (LG) for the CI estimation of the mean. The notations ET and HPD represent the proposed equal-tailed and highest posterior density CI estimation; PBM represents the parametric Bayesian CI estimation given the known data distribution.

X_1, ..., X_n ~ Norm(0, 1); Prior: N(0, 1)

        ET            HPD           PBM
  n   CP      LG    CP      LG    CP      LG
  7   86.81%  1.18  87.44%  1.18  95.36%  1.34
 15   92.73%  0.96  92.97%  0.95  95.37%  0.98
 25   93.71%  0.77  93.87%  0.77  95.23%  0.77