Confidence Distribution (CD) & Bayesian/Frequentist/Fiducial (BFF) inferences Min-ge Xie Department of Statistics & Biostatistics, Rutgers University Celebrating 30 Years of UIUC Statistics November 20, 2015 Research supported in part by grants from NSF
Parameter estimation: point estimator, confidence interval & confidence distribution
In statistics, we routinely estimate unknown parameters using
Point estimator
“Interval estimator”: confidence Interval (CI)
A very simple question is
Can we also use a distribution function (“distribution estimator”) to estimate an unknown parameter of interest in frequentist inference, in the style of a Bayesian posterior?
The answer is YES, and one such estimator is the confidence distribution (CD).
Example: $X_1, \ldots, X_n$ i.i.d. $N(\mu, 1)$
Point estimate: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$
Interval estimate: $(\bar{x}_n - 1.96/\sqrt{n},\ \bar{x}_n + 1.96/\sqrt{n})$
Distribution estimate: $N(\bar{x}_n, \frac{1}{n})$
The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.
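The three estimates in the normal example can be computed side by side. The following is a minimal sketch using only the Python standard library; the data are simulated, and the true mean 0.3, the sample size 100 and the seed are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical data: n draws from N(mu, 1); mu = 0.3 is an assumed value.
rng = random.Random(2015)
true_mu, n = 0.3, 100
x = [rng.gauss(true_mu, 1.0) for _ in range(n)]

xbar = fmean(x)                                    # point estimate
half_width = 1.96 / math.sqrt(n)
ci95 = (xbar - half_width, xbar + half_width)      # 95% interval estimate
cd = NormalDist(mu=xbar, sigma=1 / math.sqrt(n))   # distribution estimate N(xbar, 1/n)

# The 2.5% and 97.5% quantiles of the CD reproduce the 95% interval.
cd_interval = (cd.inv_cdf(0.025), cd.inv_cdf(0.975))
print(xbar, ci95, cd_interval)
```

Reading other quantiles of `cd` gives intervals at other levels, which is the sense in which the distribution estimate carries more information than a single interval.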
Example: Many ways to obtain the “distribution estimate” $N(\bar{x}_n, \frac{1}{n})$
Varying $\mu_0 \in \Theta$ $\Longrightarrow$ cumulative distribution function of $N(\bar{x}_n, \frac{1}{n})$!
Method 4: Normalizing likelihood function
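Likelihood function
$$L(\mu \mid \text{data}) = \prod f(x_i \mid \mu) = C e^{-\frac{1}{2}\sum (x_i-\mu)^2} = C e^{-\frac{n}{2}(\bar{x}_n-\mu)^2 - \frac{1}{2}\sum (x_i-\bar{x}_n)^2}$$
Normalized with respect to $\mu$
$$\frac{L(\mu \mid \text{data})}{\int L(\mu \mid \text{data})\,d\mu} = \ldots = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\mu-\bar{x}_n)^2}$$
It is the density of $N(\bar{x}_n, \frac{1}{n})$!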
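The normalization step can be checked numerically. The sketch below normalizes the likelihood on a grid with a Riemann sum and compares it with the N(x̄n, 1/n) density; the simulated data, the grid width and the grid size are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(1)
n = 20
x = [rng.gauss(0.5, 1.0) for _ in range(n)]    # hypothetical data; mu = 0.5 is assumed
xbar = fmean(x)

def log_lik(mu):
    # Log-likelihood of the N(mu, 1) model, up to an additive constant.
    return -0.5 * sum((xi - mu) ** 2 for xi in x)

# Riemann-sum normalization over a grid wide enough to hold almost all the mass.
lo, hi, m = xbar - 3.0, xbar + 3.0, 6001
step = (hi - lo) / (m - 1)
grid = [lo + i * step for i in range(m)]
ref = log_lik(xbar)                             # subtract the maximum for numerical stability
lik = [math.exp(log_lik(mu) - ref) for mu in grid]
total = sum(lik) * step
normalized = [v / total for v in lik]

# Compare with the density of N(xbar, 1/n) on the same grid.
target = NormalDist(xbar, 1 / math.sqrt(n))
max_gap = max(abs(d - target.pdf(mu)) for d, mu in zip(normalized, grid))
print(max_gap)
```

The constant C and the term in Σ(xi − x̄n)² cancel in the ratio, which is why only the factor in (µ − x̄n)² survives.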
Method 5: . . . . . .
Inference using CD: Point estimators, Intervals, p-values & more
Long history
Long history: uncertainty measure on parameter space without any prior
Fraser (2011) suggested the seed idea (alias, the fiducial idea) can be traced back to Bayes (1763) and Fisher (1922).
The term “confidence distribution” appeared as early as 1937 in a letter from E.S. Pearson to W.S. Gosset (Student)
The first use of the term in a formal publication is Cox (1958)
No precise yet general definition for a CD in the classical literature
The most commonly used approach is via inverting the upper limits of a whole set of lower-side confidence intervals, often using some special examples (e.g., Efron, 1993, Cox, 2006)
Renewed interest in confidence distributions
Recent developments
Entirely within the frequentist school: no attempt to derive a new “paradox free” fiducial theory (not possible!)
Focus on providing frequentist inference tools for problems whose solutions were previously unavailable or unknown.
Discussion article on CD (Xie & Singh 2013, Int. Stat. Rev.)
Cox (2013, Int. Stat. Rev.): The CD approach is aimed “to provide simple and interpretable summaries of what can reasonably be learned from data (and an assumed model).”
Efron (2013, Int. Stat. Rev.): The CD development is “a grounding process” to help solve “perhaps the most important unresolved problem in statistical inference” on “the use of Bayes theorem in the absence of prior information.”
Renewed interest in distributional inference
Recent developments of confidence distributions & related topics
A confidence distribution (CD) is a sample-dependent distribution function that can represent confidence intervals (regions) of all levels for a parameter of interest.
Formally,
Definition
A sample-dependent function on the parameter space (i.e., a function on $\Theta \times \mathcal{X}$) is called a confidence distribution (CD) for parameter θ, if:
R1) For each given sample, it is a distribution function on the parameter space;
R2) The function can provide confidence intervals (regions) of all levels for θ.
(Notation: $\Theta$ = parameter space; $\mathcal{X}$ = sample space)
Example: $N(\bar{x}_n, \frac{1}{n})$ on Θ = (−∞,∞).
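Requirement R2 can be checked by simulation for the normal example: intervals read off N(x̄n, 1/n) at several levels should cover the true mean at about the nominal rates. This is a minimal sketch; the true mean, sample size, number of replications and the helper names `cd_interval` and `coverage` are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

def cd_interval(xbar, n, level):
    # Central interval read off the CD N(xbar, 1/n).
    cd = NormalDist(xbar, 1 / math.sqrt(n))
    alpha = 1 - level
    return cd.inv_cdf(alpha / 2), cd.inv_cdf(1 - alpha / 2)

def coverage(level, true_mu=1.0, n=25, reps=4000, seed=0):
    # Fraction of simulated samples whose CD interval contains the true mean.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(true_mu, 1.0) for _ in range(n)])
        lo, hi = cd_interval(xbar, n, level)
        hits += lo <= true_mu <= hi
    return hits / reps

rates = {level: coverage(level) for level in (0.50, 0.80, 0.95)}
print(rates)
```

The empirical rates should sit near 0.50, 0.80 and 0.95 up to Monte Carlo error, which is what "confidence intervals of all levels" asks for.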
Analogy examples in point estimation
Point estimation [Point (sample statistic) + Performance]
Consistent estimator/estimate θ̂
R1) It is a point (sample statistic) on the parameter space.
R2) It tends to the true θ0, as n→∞.
Unbiased estimator/estimate θ̂
R1) It is a point (sample statistic) on the parameter space.
R2) It is unbiased: E θ̂ = θ0.
Distribution estimation [(Sample-dependent) dist. function + Performance]
Confidence distribution
R1) It is a (sample-dependent) distribution function on the parameter space.
R2) It ensures the coverage rates of confidence intervals of all levels.
Other “distribution estimators” — future concepts?
CD — a unifying concept for distributional inference
Wide range of examples: bootstrap distribution, (normalized) likelihood function, p-value functions, fiducial distributions, some informative priors and Bayesian posteriors, among others
Our understanding/interpretation: Any approach, regardless of being frequentist, fiducial or Bayesian, can potentially be unified under the concept of confidence distributions, as long as it can be used to build confidence intervals of all levels, exactly or asymptotically.
o May help bridge some gaps between different statistical paradigms...
Three forms of CD presentations
[Figure: three panels, each plotted against mu: CD density, CD (cumulative distribution function), and CV (confidence curve).]
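Confidence density: in the form of a density function $h_n(\theta)$
e.g., $N(\bar{x}_n, \frac{1}{n})$ as $h_n(\theta) = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\theta-\bar{x}_n)^2}$.
Confidence distribution: in the form of a cumulative distribution function $H_n(\theta)$
e.g., $N(\bar{x}_n, \frac{1}{n})$ as $H_n(\theta) = \Phi\left(\sqrt{n}(\theta - \bar{x}_n)\right)$
Confidence curve: $CV_n(\theta) = 2\min\{H_n(\theta),\ 1 - H_n(\theta)\}$
e.g., $N(\bar{x}_n, \frac{1}{n})$ as $CV_n(\theta) = 2\min\{\Phi(\sqrt{n}(\theta - \bar{x}_n)),\ 1 - \Phi(\sqrt{n}(\theta - \bar{x}_n))\}$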
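The three presentations for the normal example translate directly into functions of θ. The sketch below is one way to write them; the function names and the values x̄ = 0.3, n = 100 are assumptions for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conf_density(theta, xbar, n):
    # h_n(theta): density of N(xbar, 1/n)
    return math.sqrt(n / (2 * math.pi)) * math.exp(-0.5 * n * (theta - xbar) ** 2)

def conf_distribution(theta, xbar, n):
    # H_n(theta) = Phi(sqrt(n) * (theta - xbar))
    return STD_NORMAL.cdf(math.sqrt(n) * (theta - xbar))

def conf_curve(theta, xbar, n):
    # CV_n(theta) = 2 * min{H_n(theta), 1 - H_n(theta)}
    h = conf_distribution(theta, xbar, n)
    return 2 * min(h, 1 - h)

xbar, n = 0.3, 100   # hypothetical values
for theta in (0.1, 0.3, 0.5):
    print(theta, conf_density(theta, xbar, n),
          conf_distribution(theta, xbar, n), conf_curve(theta, xbar, n))
```

The set of θ with `conf_curve(theta, xbar, n) >= alpha` is the central level 1 − α interval, so a horizontal cut through the confidence curve reads off an interval at any level.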
CD viewed from a different angle – CD-random variable
For each given sample xn, H(·) is a distribution function on Θ
=⇒ We can simulate a random variable ξ such that ξ|Xn = xn ∼ H(·)!
We call this ξ a CD-random variable.
– The CD-random variable ξ is viewed as a “random estimator” of θ0 (a median unbiased estimator)
Convenient format for connecting with bootstrap, fiducial and Bayesian methods
CD random variable and bootstrap estimator
Normal example earlier: the mean parameter µ is estimated by N(x̄, 1/n).
If we simulate a CD-random variable ξ | data ∼ N(x̄, 1/n), then
$$\left.\frac{\xi - \bar{x}}{1/\sqrt{n}}\,\right|\,\bar{x} \;\sim\; \left.\frac{\bar{x} - \mu}{1/\sqrt{n}}\,\right|\,\mu \qquad (\text{both} \sim N(0,1))$$
The above statement is exactly the same as the key justification for the bootstrap, replacing ξ by a bootstrap sample mean $\bar{x}^*$:
$$\left.\frac{\bar{x}^* - \bar{x}}{1/\sqrt{n}}\,\right|\,\bar{x} \;\sim\; \left.\frac{\bar{x} - \mu}{1/\sqrt{n}}\,\right|\,\mu \qquad (\text{both} \sim N(0,1))$$
The ξ is in essence the same as a bootstrap estimator!
CD is an extension of the bootstrap distribution, albeit the CD concept is much broader!
[cf. Xie & Singh 2013]
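The parallel between the CD-random variable and the bootstrap sample mean can be seen by simulating both and standardizing them the same way. This is a rough sketch with simulated data; the true mean 2, the sample size, the number of draws and the use of a nonparametric bootstrap are assumptions, and for the bootstrap the N(0, 1) behaviour holds only approximately.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(7)
n, draws = 200, 2000
x = [rng.gauss(2.0, 1.0) for _ in range(n)]     # hypothetical data; mu = 2 is assumed
xbar = fmean(x)

# CD-random variable: xi | data ~ N(xbar, 1/n)
xi = [rng.gauss(xbar, 1 / math.sqrt(n)) for _ in range(draws)]

# Nonparametric bootstrap: means of resamples drawn with replacement
boot = [fmean(rng.choices(x, k=n)) for _ in range(draws)]

# Standardize both by (value - xbar) / (1 / sqrt(n)); each should look roughly N(0, 1).
z_cd = [(v - xbar) * math.sqrt(n) for v in xi]
z_boot = [(v - xbar) * math.sqrt(n) for v in boot]
print(fmean(z_cd), pstdev(z_cd), fmean(z_boot), pstdev(z_boot))
```

Both standardized samples should have mean near 0 and standard deviation near 1; the bootstrap version tracks the sample standard deviation of the data rather than the known value 1.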
Fisher’s fiducial distribution & our interpretation
Model/Structure equation: Normal sample X ∼ N(θ, 1) (for simplicity, let the number of observations be n = 1)
X = θ + U where U ∼ N(0, 1) (1)
Fiducial argument
Equivalent equation (“Inversion”):
θ = X − U
Thus, when we observe X = x ,
θ = x − U (2)
Since U ∼ N(0, 1), θ ∼ N(x, 1) ⇒ The fiducial distribution of θ is N(x, 1)!
“Hidden subjectivity” (Dempster, 1963; Martin & Liu, 2013): “Continue to regard U as a random sample” from N(0, 1), even “after X = x is observed.”
In particular, U | X = x ≁ U, by equation (1).
(X and U are completely dependent: given one, the other is also completely given.)
Fisher’s fiducial distribution & our interpretation
A new perspective (my understanding/interpretation):
In fact, equation (2) for the normal sample mean X̄ ∼ N(θ, 1/n) is:
θ = x̄ − u (2a)
When X̄ = x̄ is realized (and observed), a corresponding error U = u is also realized (but unobserved)
Goal: Make inference for θ
What we know: (1) x̄ is observed;
(2) unknown u is a realization from U ∼ N(0, 1/n).
An intuitive (appealing) solution:
– Simulate a u∗ ∼ N(0, 1/n) and use u∗ to estimate u.
– Plugging it into (2a), we get an estimate of θ (“random estimate”):
θ∗ = x̄ − u∗ (2b)
– Repeating many times, θ∗ forms a fiducial/CD function N(x̄, 1/n)!
(θ∗ is called a fiducial sample and is also a CD-random variable)
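The simulate-and-plug-in recipe can be written in a few lines. This is a minimal sketch under assumed values (observed x̄ = 1.2, n = 50, 20000 repetitions); `fiducial_draw` is a hypothetical helper name.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(11)
n = 50
xbar = 1.2                       # hypothetical observed sample mean

def fiducial_draw():
    # u* ~ N(0, 1/n) stands in for the realized but unobserved error u.
    u_star = rng.gauss(0.0, 1 / math.sqrt(n))
    return xbar - u_star         # theta* = xbar - u*   (equation 2b)

theta_star = [fiducial_draw() for _ in range(20000)]
print(fmean(theta_star), pstdev(theta_star), 1 / math.sqrt(n))
```

The draws should centre on x̄ with standard deviation close to 1/√n, matching N(x̄, 1/n).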
Generalized version of fiducial & our interpretation
General model/structure equation:
X = G(θ,U)
where an unknown random “error term” U ∼ Dist.
Realization: x = G(θ, u)
with observed x, unobserved realization u (∼ Dist.) and unknown θ.
Our take: a fiducial procedure is essentially to solve a structure equation for a “random estimate” θ∗ of θ!
x = G(θ∗,u∗)
for an independent random variable u∗ ∼ Dist.
(Fiducial “inversion”; incorporated knowledge of Dist.)
How about a Bayesian method?
General model/structure equation:
X = G(θ,U)
where an unknown random “error term” U ∼ Dist.
Realization: x = G(θ, u)
with observed x, unobserved realization u (∼ Dist.) and unknown θ.
Goal: Make inference for θ
What we know: (1) x is observed;
(2) unknown u is a realization from U ∼ Dist.;
(3) unknown θ is a realization from a given prior θ ∼ π(θ).
How about a Bayesian method?
A Bayesian solution - Approximate Bayesian Computation (ABC) method
[Step A] Simulate a θ∗ ∼ π(θ), a u∗ ∼ Dist and compute
x∗ = G(θ∗,u∗)
[Step B] If x∗ “matches” the observed x, i.e., x∗ ≈ x, keep the simulated θ∗; otherwise, repeat Step A.
Effectively, the kept θ∗ solves equation
x ≈ G(θ∗,u∗)
(A Bayesian way of “inversion”; incorporated both knowledge of π(θ) & Dist.)
Repeat the above steps many times to get many θ∗; these θ∗ form a “distribution estimator” fa(θ|x).
Theorem: fa(θ|x) is the posterior or an approximation of the posterior!
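Steps A and B can be sketched as a rejection sampler for the normal location model X = θ + U with a single observation. The prior N(0, 2²), the observed value 1.5 and the matching tolerance 0.05 are assumptions chosen so that the kept draws can be compared with the known conjugate posterior.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(3)
x_obs = 1.5            # hypothetical single observation
eps = 0.05             # matching tolerance (assumed)
prior_sd = 2.0         # prior theta ~ N(0, prior_sd^2) (assumed)

def G(theta, u):
    # Structure equation for the normal location model: X = theta + U
    return theta + u

kept = []
while len(kept) < 2000:
    theta_star = rng.gauss(0.0, prior_sd)        # Step A: theta* ~ prior
    u_star = rng.gauss(0.0, 1.0)                 #         u* ~ Dist = N(0, 1)
    x_star = G(theta_star, u_star)
    if abs(x_star - x_obs) <= eps:               # Step B: keep theta* on a match
        kept.append(theta_star)

# Conjugate posterior for comparison: N(w * x_obs, w), w = prior_var / (prior_var + 1)
w = prior_sd ** 2 / (prior_sd ** 2 + 1.0)
print(fmean(kept), w * x_obs, pstdev(kept), w ** 0.5)
```

The mean and standard deviation of the kept θ∗ should be close to those of the conjugate posterior, with a small discrepancy from the nonzero tolerance and Monte Carlo error.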
More remarks on ABC
It is impossible (or very difficult) to have perfect matches, so real ABC methods:
1. Allow a small matching error ε (ε → 0 in theory)
2. Match a “summary” statistic t(x) instead of the original x (related/corresponding to a pivotal quantity!)
When t(x) is a sufficient statistic:
Theorem: The approximate posterior fa(θ|x) converges to the posterior, as ε → 0.
What happens when t(x) is not a sufficient statistic?
The approximate posterior fa(θ|x) does NOT converge to the posterior, even as ε → 0.
Theorem: Under mild conditions, the approximate posterior fa(θ|x) converges to a confidence distribution, as ε → 0.
More remarks on ABC
Cauchy example: A sample of size n = 50 from Cauchy(10,1); flat prior
True Cauchy posterior (black curve)
ABC “posterior”, when t(x) ≡ sample median (red curves)
ABC “posterior”, when t(x) ≡ sample mean (blue curves)
[Figure: Cauchy posterior with flat prior (n = 50)]
Both red and blue curves are CDs and they can provide us with correct (but not efficient) statistical inference!
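The sample-median version of this example can be imitated with a small rejection sampler. This is only a sketch: the data are simulated rather than the ones behind the slide, the flat prior is truncated to (8, 12) so that it can be sampled, and the tolerance 0.1 and the number of kept draws are assumptions.

```python
import math
import random
from statistics import fmean, median, pstdev

rng = random.Random(50)
n, eps = 50, 0.1                       # eps is an assumed matching tolerance

def cauchy_sample(loc, size):
    # Inverse-CDF draws from Cauchy(loc, 1)
    return [loc + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(size)]

x_obs = cauchy_sample(10.0, n)         # stand-in for the observed data
t_obs = median(x_obs)                  # summary statistic t(x) = sample median

kept = []
while len(kept) < 500:
    theta_star = rng.uniform(8.0, 12.0)            # flat prior, truncated to (8, 12) here
    t_star = median(cauchy_sample(theta_star, n))  # simulate x*, then summarize it
    if abs(t_star - t_obs) <= eps:                 # match on the summary statistic
        kept.append(theta_star)

print(t_obs, fmean(kept), pstdev(kept))
```

The kept draws concentrate around the observed sample median. Swapping `median` for `fmean` as the summary gives the sample-mean version, which is far more spread out because the mean of Cauchy data does not concentrate.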
Summary: Why confidence distribution (& related inference)?
CD is a powerful all-purpose inference tool ... informative, flexible, effective, versatile, etc.
A “new” concept and potential platform for unifying (connecting/comparing) inferences from BFF paradigms
Supports new methodology developments beyond conventional approaches
o New prediction approaches (Shen et al., 2016+)
o New testing methods (Singh et al. 2007, Xie & Singh 2013, Liu et al. (in preparation))
o New simulation schemes (Claggett et al. 2014)
o Combining information from diverse sources (fusion learning/meta analysis, split & conquer, etc.) (Xie et al. 2011, Xie et al. 2013, Liu et al. 2014, 2015, Yang et al. 2014)
CD as a catalyst to bridge gaps between different statistical paradigms:
Statistical Dream: Bayesian, Fiducial and Frequentist = BFF = Best Friends Forever
Emanuel Parzen (2013):
• “Confidence distributions deserve to be widely taught and practiced as a powerful tool applicable by all researchers concerned with statistical inference and data discovery.”
• “Today the question should not be about credit for methods, but a framework for tools which are simple and powerful for applications.”