
Confidence Distribution (CD) & Bayesian/Frequentist/Fiducial (BFF) inferences

Min-ge Xie

Department of Statistics & Biostatistics, Rutgers University

Celebrating 30 Years of UIUC Statistics, November 20, 2015

Research supported in part by grants from NSF

Parameter estimation: point estimator, confidence interval & confidence distribution

In statistics, we routinely estimate unknown parameters using

Point estimator

“Interval estimator”: confidence interval (CI)

A very simple question is:

Can we also use a distribution function (“distribution estimator”) to estimate an unknown parameter of interest in frequentist inference, in the style of a Bayesian posterior?

The answer is YES, and one such estimator is the confidence distribution (CD).

Example: $X_1, \ldots, X_n$ i.i.d. from $N(\mu, 1)$

Point estimate: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$

Interval estimate: $(\bar{x}_n - 1.96/\sqrt{n},\ \bar{x}_n + 1.96/\sqrt{n})$

Distribution estimate: $N(\bar{x}_n, \frac{1}{n})$

The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.
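The three estimates in the normal example above can be computed side by side. The following is a minimal sketch using only the Python standard library; the function name `normal_mean_estimates` and the data values are hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def normal_mean_estimates(xs, level=0.95):
    """Point, interval and distribution estimates of mu for N(mu, 1) data."""
    n = len(xs)
    xbar = fmean(xs)
    # Standard normal quantile; about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z / sqrt(n)
    # Distribution estimate N(xbar, 1/n); NormalDist takes a standard deviation.
    cd = NormalDist(mu=xbar, sigma=1 / sqrt(n))
    return xbar, (xbar - half, xbar + half), cd


# Hypothetical data, for illustration only.
sample = [0.3, -0.5, 1.2, 0.8, 0.1, 0.6, -0.2, 0.9]
point, interval, cd = normal_mean_estimates(sample)
```

The returned `cd` object carries the other two estimates: its median is the point estimate, and its 2.5% and 97.5% quantiles are the endpoints of the 95% interval.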

Example: Many ways to obtain the "distribution estimate" $N(\bar{x}_n, \frac{1}{n})$

Method 1 & 2: Bayes (omitted here) & fiducial (later ...)

Method 3: P-value method

One-sided test: $K_0: \mu = \mu_0$ vs $K_a: \mu > \mu_0$

$p(\mu_0) = P(\bar{X} > \bar{x}_n) = 1 - \Phi(\sqrt{n}\{\bar{x}_n - \mu_0\}) = \Phi(\sqrt{n}\{\mu_0 - \bar{x}_n\})$.

Varying $\mu_0 \in \Theta$ $\Longrightarrow$ the cumulative distribution function of $N(\bar{x}_n, \frac{1}{n})$!
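As a numerical check of the p-value method, the sketch below evaluates the one-sided p-value over a grid of null values and compares it with the cumulative distribution function of N(x̄, 1/n). The summary values `xbar` and `n` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_value_function(mu0, xbar, n):
    """One-sided p-value for K0: mu = mu0 vs Ka: mu > mu0 with N(mu, 1) data."""
    # Under K0, sqrt(n) * (Xbar - mu0) follows N(0, 1).
    return 1.0 - NormalDist().cdf(sqrt(n) * (xbar - mu0))


xbar, n = 0.4, 8  # hypothetical observed mean and sample size
cd = NormalDist(mu=xbar, sigma=1 / sqrt(n))

# Vary the null value mu0 and compare with the CDF of N(xbar, 1/n).
grid = [xbar + k / 10 for k in range(-15, 16)]
max_gap = max(abs(p_value_function(m, xbar, n) - cd.cdf(m)) for m in grid)
```

`max_gap` is at the level of floating-point rounding, which is the numerical counterpart of the identity stated above.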

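Method 4: Normalizing likelihood function

Likelihood function

$L(\mu \mid \text{data}) = \prod f(x_i \mid \mu) = C e^{-\frac{1}{2}\sum (x_i - \mu)^2} = C e^{-\frac{n}{2}(\bar{x}_n - \mu)^2 - \frac{1}{2}\sum (x_i - \bar{x}_n)^2}$

Normalized with respect to $\mu$

$\frac{L(\mu \mid \text{data})}{\int L(\mu \mid \text{data})\,d\mu} = \ldots = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\mu - \bar{x}_n)^2}$

It is the density of $N(\bar{x}_n, \frac{1}{n})$!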
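The normalization step can also be verified numerically. This sketch normalizes the likelihood over a fine grid with a Riemann sum (a numerical stand-in for the integral, chosen here for simplicity) and compares the result with the N(x̄, 1/n) density. The data values and function names are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import fmean


def normalized_likelihood(xs, grid):
    """Normalize L(mu | data) over an equally spaced grid of mu values."""
    step = grid[1] - grid[0]
    # Likelihood up to the constant C, which cancels after normalization.
    lik = [exp(-0.5 * sum((x - mu) ** 2 for x in xs)) for mu in grid]
    total = sum(lik) * step  # Riemann-sum approximation of the integral
    return [v / total for v in lik]


def cd_density(mu, xbar, n):
    """Density of N(xbar, 1/n) evaluated at mu."""
    return exp(-0.5 * n * (mu - xbar) ** 2) / sqrt(2 * pi / n)


sample = [0.3, -0.5, 1.2, 0.8, 0.1, 0.6, -0.2, 0.9]  # hypothetical data
xbar, n = fmean(sample), len(sample)
grid = [xbar - 3 + 0.001 * k for k in range(6001)]
dens = normalized_likelihood(sample, grid)
max_gap = max(abs(d - cd_density(m, xbar, n)) for d, m in zip(dens, grid))
```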

Method 5: . . . . . .

Inference using CD: Point estimators, Intervals, p-values & more


Long history

Long history: uncertainty measure on parameter space without any prior

Fraser (2011) suggested the seed idea (a.k.a. the fiducial idea) can be traced back to Bayes (1763) and Fisher (1922).

The term “confidence distribution” appeared as early as 1937 in a letter from E.S. Pearson to W.S. Gosset (Student)

The first use of the term in a formal publication is Cox (1958)

No precise yet general definition for a CD in the classical literature

The most commonly used approach is via inverting the upper limits of a whole set of lower-side confidence intervals, often using some special examples (e.g., Efron, 1993, Cox, 2006)

Renewed interest on confidence distribution

Recent developments

Entirely within the frequentist school: no attempt to derive a new “paradox free” fiducial theory (not possible!)

Focus on providing frequentist inference tools for problems whose solutions were previously unavailable or unknown.

Renewed interest on confidence distribution

Efron (1998, Statist. Sci.): Bootstrap distributions are “distribution estimators” and “confidence distributions.”

Discussion article on CD (Xie & Singh 2013, Int. Stat. Rev.)

Cox (2013, Int. Stat. Rev.): The CD approach is aimed “to provide simple and interpretable summaries of what can reasonably be learned from data (and an assumed model).”

Efron (2013, Int. Stat. Rev.): The CD development is “a grounding process” to help solve “perhaps the most important unresolved problem in statistical inference” on “the use of Bayes theorem in the absence of prior information.”

Renewed interest in distributional inference

Recent developments of confidence distributions & related topics

Confidence distributions (Schweder & Hjort 2002, 2015; Singh, Xie & Strawderman 2001, 2005; Xie & Singh 2013)

Bayes/objective Bayes (Berger 2006; Berger, Bernardo and Sun (book, forthcoming), etc.)

Dempster-Shafer calculus (Dempster 2008); Inferential Models (Liu & Martin 2009, 2012, 2014)

Generalized inference/generalized fiducial inference (Weerahandi 1989, 1993; Hannig 2009; Hannig, Lee & Lai, 2015)

Individualized inference (Liu and Meng, 2015, etc.)

(* The above cites only some ‘review’ papers and books.)

Current research focus -

o Pay attention to new developments of theoretical frameworks

o Emphasize applications

Can these all lead to a brighter new future?

Now, with the emerging distributional inference from Bayesian, Frequentist and Fiducial (BFF) fields,

Can we really unify the foundation of statistical inferences?

“Can our desire to find a unification of Bayesian, Frequentist and Fiducial (BFF) perspectives do the same trick, allowing all of us to thrive under one roof as BFFs (Best Friends Forever)?” - Meng (2014, IMS Bulletin)

Definition: Confidence Distribution

A one-sentence version -

A confidence distribution (CD) is a sample-dependent distribution function that can represent confidence intervals (regions) of all levels for a parameter of interest.

Formally,

Definition

A sample-dependent function on the parameter space (i.e., a function on $\Theta \times \mathcal{X}$) is called a confidence distribution (CD) for parameter $\theta$, if:

R1) For each given sample, it is a distribution function on the parameter space;

R2) The function can provide confidence intervals (regions) of all levels for $\theta$.

(Notation: $\Theta$ = parameter space; $\mathcal{X}$ = sample space)

Example: $N(\bar{x}_n, \frac{1}{n})$ on $\Theta = (-\infty, \infty)$.
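Requirement R2 can be illustrated by simulation for the normal example: intervals read off the quantiles of the CD N(x̄, 1/n) should cover the true mean at the nominal rate for every level. The sketch below checks three levels. The true mean, sample size, replication count and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def cd_interval(xs, level):
    """Equal-tailed interval taken from the quantiles of the CD N(xbar, 1/n)."""
    cd = NormalDist(mu=fmean(xs), sigma=1 / sqrt(len(xs)))
    alpha = 1 - level
    return cd.inv_cdf(alpha / 2), cd.inv_cdf(1 - alpha / 2)


def coverage(level, mu_true=2.0, n=20, reps=4000, seed=1):
    """Fraction of simulated samples whose CD interval contains mu_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu_true, 1.0) for _ in range(n)]
        lo, hi = cd_interval(xs, level)
        hits += lo <= mu_true <= hi
    return hits / reps


rates = {level: coverage(level) for level in (0.5, 0.8, 0.95)}
```

Each entry of `rates` should be close to its nominal level, up to Monte Carlo error.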

Analogy examples in point estimation

Point estimation [Point (sample statistic) + Performance]

Consistent estimator/estimate θ̂

R1) It is a point (sample statistic) on the parameter space.
R2) It tends to the true θ0, as n→∞.

Unbiased estimator/estimate θ̂

R1) It is a point (sample statistic) on the parameter space.
R2) It is unbiased: E θ̂ = θ0.

Distribution estimation [(Sample-dependent) dist. function + Performance]

Confidence distribution

R1) It is a (sample-dependent) distribution function on the parameter space.

R2) It ensures the coverage rates of confidence intervals of all levels.

Other “distribution estimators” — future concepts?

CD — a unifying concept for distributional inference

Wide range of examples: bootstrap distribution, (normalized) likelihood function, p-value functions, fiducial distributions, some informative priors and Bayesian posteriors, among others

Our understanding/interpretation: Any approach, regardless of being frequentist, fiducial or Bayesian, can potentially be unified under the concept of confidence distributions, as long as it can be used to build confidence intervals of all levels, exactly or asymptotically.

o May help bridge some gap between different statistical paradigms...

Three forms of CD presentations

[Figure: three panels plotted against mu, showing the CD density, the CD (cumulative distribution function) and the confidence curve (CV).]

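Confidence density: in the form of a density function $h_n(\theta)$

e.g., $N(\bar{x}_n, \frac{1}{n})$ as $h_n(\theta) = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\theta - \bar{x}_n)^2}$.

Confidence distribution: in the form of a cumulative distribution function $H_n(\theta)$

e.g., $N(\bar{x}_n, \frac{1}{n})$ as $H_n(\theta) = \Phi(\sqrt{n}(\theta - \bar{x}_n))$

Confidence curve: $CV_n(\theta) = 2\min\{H_n(\theta),\ 1 - H_n(\theta)\}$

e.g., $N(\bar{x}_n, \frac{1}{n})$ as $CV_n(\theta) = 2\min\{\Phi(\sqrt{n}(\theta - \bar{x}_n)),\ 1 - \Phi(\sqrt{n}(\theta - \bar{x}_n))\}$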
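The three presentations can be evaluated together for the normal example. This is a small sketch; the function name `cd_forms` and the summary values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cd_forms(theta, xbar, n):
    """Return (h_n, H_n, CV_n) at theta for the CD N(xbar, 1/n)."""
    cd = NormalDist(mu=xbar, sigma=1 / sqrt(n))
    h = cd.pdf(theta)        # confidence density
    H = cd.cdf(theta)        # confidence distribution
    cv = 2 * min(H, 1 - H)   # confidence curve
    return h, H, cv


xbar, n = 0.4, 8  # hypothetical observed mean and sample size
h0, H0, cv0 = cd_forms(xbar, xbar, n)
```

The confidence curve peaks at 1 at the median of the CD, and the set of θ with CV_n(θ) ≥ α is the equal-tailed level-(1 − α) interval, which is why a single curve displays intervals of all levels at once.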

CD viewed from a different angle – CD-random variable

For each given sample xn, H(·) is a distribution function on Θ

=⇒ We can simulate a random variable ξ such that ξ|Xn = xn ∼ H(·)!

We call this ξ a CD-random variable.

The CD-random variable ξ is viewed as a “random estimator” of θ0 (a median unbiased estimator)

Convenient format for connecting with bootstrap, fiducial and Bayesian methods

CD random variable and bootstrap estimator

Normal example earlier: Mean parameter $\mu$ is estimated by $N(\bar{x}, 1/n)$.

If we simulate a CD-random variable $\xi \mid \text{data} \sim N(\bar{x}, 1/n)$,

$\left.\frac{\xi - \bar{x}}{1/\sqrt{n}}\,\right|\,\bar{x} \;\sim\; \left.\frac{\bar{x} - \mu}{1/\sqrt{n}}\,\right|\,\mu$ (both $\sim N(0, 1)$)

The above statement is exactly the same as the key justification for the bootstrap, replacing $\xi$ by a bootstrap sample mean $\bar{x}^*$:

$\left.\frac{\bar{x}^* - \bar{x}}{1/\sqrt{n}}\,\right|\,\bar{x} \;\sim\; \left.\frac{\bar{x} - \mu}{1/\sqrt{n}}\,\right|\,\mu$ (both $\sim N(0, 1)$)

The $\xi$ is in essence the same as a bootstrap estimator!

CD-random variable $\xi \overset{\text{essentially}}{\equiv}$ bootstrap estimator $\bar{x}^*$

CD is an extension of the bootstrap distribution, although the CD concept is much broader!

[cf., Xie & Singh 2013]
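The parallel between a CD-random variable and a bootstrap sample mean can be seen in a small simulation. The sketch below assumes a nonparametric bootstrap (resampling the data with replacement), for which the N(0, 1) statement holds only approximately in finite samples. The sample size, seed and replication count are arbitrary.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)
n, mu_true = 200, 1.0
xs = [rng.gauss(mu_true, 1.0) for _ in range(n)]  # simulated N(mu, 1) data
xbar = fmean(xs)


def cd_draw():
    """One CD-random variable: xi | data ~ N(xbar, 1/n)."""
    return rng.gauss(xbar, 1 / sqrt(n))


def bootstrap_mean():
    """One nonparametric bootstrap sample mean xbar*."""
    return fmean(rng.choices(xs, k=n))


B = 2000
# Standardize both by 1/sqrt(n), as in the displayed statements.
z_cd = [sqrt(n) * (cd_draw() - xbar) for _ in range(B)]
z_boot = [sqrt(n) * (bootstrap_mean() - xbar) for _ in range(B)]
```

Both `z_cd` and `z_boot` should have mean near 0 and standard deviation near 1; the bootstrap version inherits a small extra error from using the sample in place of the known unit variance.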

Fisher’s fiducial distribution & our interpretation

Model/Structure equation: Normal sample X ∼ N(θ, 1) (for simplicity, let the number of observations be n = 1)

X = θ + U where U ∼ N(0, 1) (1)

Fiducial argument

Equivalent equation (“Inversion”):

θ = X − U

Thus, when we observe X = x,

θ = x − U (2)

Since U ∼ N(0, 1), θ ∼ N(x, 1) ⇒ The fiducial distribution of θ is N(x, 1)!

“Hidden subjectivity” (Dempster, 1963; Martin & Liu, 2013): “Continue to regard U as a random sample” from N(0, 1), even “after X = x is observed.”

In particular, U|X = x ≁ U, by equation (1).

(X and U are completely dependent: given one, the other is also completely determined.)

Fisher’s fiducial distribution & our interpretation

A new perspective (my understanding/interpretation):

In fact, equation (2) for the normal sample mean X̄ ∼ N(θ, 1/n) is:

θ = x̄ − u (2a)

When X̄ = x̄ is realized (and observed), a corresponding error U = u is also realized (but unobserved)

Goal: Make inference for θ

What we know:
(1) x̄ is observed;
(2) unknown u is a realization from U ∼ N(0, 1/n).

An intuitive (appealing) solution:

– Simulate a u∗ ∼ N(0, 1/n) and use u∗ to estimate u.
– Plugging it into (2a), we get an estimate of θ (“random estimate”):

θ∗ = x̄ − u∗ (2b)

– Repeating many times, θ∗ forms a fiducial/CD function N(x̄, 1/n)!
(θ∗ is called a fiducial sample and is also a CD-random variable)
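The simulation recipe in (2b) translates directly into code. This is a minimal sketch with hypothetical values for the observed mean and the sample size.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def fiducial_sample(xbar, n, size, seed=0):
    """Draw theta* = xbar - u* with u* ~ N(0, 1/n), as in (2b)."""
    rng = random.Random(seed)
    # random.gauss takes a standard deviation, hence 1 / sqrt(n).
    return [xbar - rng.gauss(0.0, 1 / sqrt(n)) for _ in range(size)]


xbar, n = 0.4, 25  # hypothetical observed mean and sample size
thetas = fiducial_sample(xbar, n, size=20000)
```

The empirical mean and standard deviation of `thetas` should be close to x̄ and 1/√n, matching the fiducial/CD function N(x̄, 1/n).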

Generalized version of fiducial & our interpretation

General model/structure equation:

X = G(θ,U)

where an unknown random “error term” U ∼ Dist.

Realization

x = G(θ, u)

with observed x, unobserved realization u (∼ Dist.) and unknown θ.

Our take: a fiducial procedure is essentially to solve a structure equation for a “random estimate” θ∗ of θ!

x = G(θ∗, u∗)

for an independent random variable u∗ ∼ Dist.
(Fiducial “inversion”; incorporated knowledge of Dist.)
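As an illustration of solving a structure equation with a non-additive G, the sketch below uses an assumed scale model that is not taken from the slides: X_i = θ U_i with U_i ∼ Exp(1), reduced to the sum so that Σx_i = θ∗ Σu∗_i can be solved exactly for θ∗. The data values and the function name are hypothetical.

```python
import random
from statistics import fmean


def fiducial_scale_sample(xs, size, seed=0):
    """Solve sum(x) = theta* * sum(u*) for theta*, with u*_i i.i.d. Exp(1).

    Assumed model (for illustration only): X_i = theta * U_i, U_i ~ Exp(1).
    """
    rng = random.Random(seed)
    n, s = len(xs), sum(xs)
    # The sum of n independent Exp(1) variables is Gamma(shape=n, scale=1).
    return [s / rng.gammavariate(n, 1.0) for _ in range(size)]


# Hypothetical data with sum 20 and n = 10.
xs = [1.1, 0.4, 3.2, 2.5, 0.7, 1.9, 4.1, 0.3, 2.8, 3.0]
thetas = fiducial_scale_sample(xs, size=20000)
```

Because Σx_i/θ is a pivot under this assumed model, quantiles of `thetas` give interval endpoints for θ, in the same way the normal example uses quantiles of N(x̄, 1/n).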

How about a Bayesian method?

General model/structure equation:

X = G(θ,U)

where an unknown random “error term” U ∼ Dist.

Realization

x = G(θ, u)

with observed x, unobserved realization u (∼ Dist.) and unknown θ.

Goal: Make inference for θ

What we know:
(1) x is observed;
(2) unknown u is a realization from U ∼ Dist.;
(3) unknown θ is a realization from a given prior θ ∼ π(θ).

How about a Bayesian method?

A Bayesian solution - Approximate Bayesian Computation (ABC) method

[Step A] Simulate a θ∗ ∼ π(θ) and a u∗ ∼ Dist., and compute

x∗ = G(θ∗, u∗)

[Step B] If x∗ “matches” the observed x, i.e., x∗ ≈ x, keep the simulated θ∗; otherwise, repeat Step A.

Effectively, the kept θ∗ solves the equation

x ≈ G(θ∗, u∗)

(A Bayesian way of “inversion”; incorporated both knowledge of π(θ) & Dist.)

Repeat the above steps many times to get many θ∗; these θ∗ form a “distribution estimator” fa(θ|x).

Theorem: fa(θ|x) is the posterior or an approximation of the posterior!
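Steps A and B can be sketched as a rejection sampler. The example below assumes the normal location model X_i = θ + U_i with U_i ∼ N(0, 1) and a N(0, prior_sd²) prior, both chosen here only because the exact posterior is then available for comparison. Matching is done on the sample mean, which is simulated directly as N(θ∗, 1/n) rather than by generating a full sample. All names and values are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def abc_normal_mean(xbar_obs, n, prior_sd, eps, n_keep, seed=0):
    """Rejection ABC for theta in X_i = theta + U_i, U_i ~ N(0, 1).

    The prior theta ~ N(0, prior_sd^2) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_keep:
        theta = rng.gauss(0.0, prior_sd)                # Step A: theta* ~ prior
        xbar_sim = theta + rng.gauss(0.0, 1 / sqrt(n))  # Step A: mean of x* = G(theta*, u*)
        if abs(xbar_sim - xbar_obs) <= eps:             # Step B: keep theta* on a match
            kept.append(theta)
    return kept


xbar_obs, n, prior_sd = 0.4, 25, 2.0
kept = abc_normal_mean(xbar_obs, n, prior_sd, eps=0.02, n_keep=3000)

# Exact conjugate posterior for this assumed model, for comparison.
post_var = 1 / (1 / prior_sd**2 + n)
post_mean = post_var * n * xbar_obs
```

With a small matching tolerance `eps`, the mean and standard deviation of `kept` should be close to `post_mean` and `sqrt(post_var)`. Shrinking `eps` tightens the approximation but lowers the acceptance rate.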

More remarks on ABC

Perfect matches are impossible (or very difficult) to obtain, so real ABC methods:

1. Allow a small matching error ε (ε→ 0 in theory)

2. Match a “summary” statistic t(x) instead of the original x (related/corresponding to a pivotal quantity!)

When t(x) is a sufficient statistic:

Theorem: The approximate posterior fa(θ|x) converges to the posterior, as ε→ 0.

What happens when t(x) is not a sufficient statistic?

The approximate posterior fa(θ|x) does NOT converge to the posterior, even as ε→ 0.

Theorem: Under mild conditions, the approximate posterior fa(θ|x) converges to a confidence distribution, as ε→ 0.

More remarks on ABC

Cauchy example: A sample of size n = 50 from Cauchy(10,1); flat prior

True Cauchy posterior (black curve)

ABC “posterior”, when t(x) ≡ sample median (red curves)

ABC “posterior”, when t(x) ≡ sample mean (blue curves)

Cauchy posterior with flat prior (n = 50)

Both red and blue curves are CDs and they can provide us correct (but not efficient) statistical inference!
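A rough sketch of the median-matching case is given below. It is not the code behind the figure: the flat prior is truncated to [5, 15] so that it can be sampled, and the tolerance, seeds and number of proposals are arbitrary choices.

```python
import random
from math import pi, tan
from statistics import fmean, median


def cauchy_sample(theta, n, rng):
    """n i.i.d. draws from Cauchy(theta, 1) by inverse-CDF sampling."""
    return [theta + tan(pi * (rng.random() - 0.5)) for _ in range(n)]


def abc_cauchy_median(x_obs, lo, hi, eps, n_prop, seed=0):
    """Rejection ABC matching the sample median; flat prior on [lo, hi] (assumed)."""
    rng = random.Random(seed)
    n, m_obs = len(x_obs), median(x_obs)
    kept = []
    for _ in range(n_prop):
        theta = rng.uniform(lo, hi)                  # theta* ~ truncated flat prior
        x_sim = cauchy_sample(theta, n, rng)         # x* = theta* + u*
        if abs(median(x_sim) - m_obs) <= eps:        # match on t(x) = sample median
            kept.append(theta)
    return kept


data_rng = random.Random(2015)
x_obs = cauchy_sample(10.0, 50, data_rng)  # simulated stand-in for the observed data
kept = abc_cauchy_median(x_obs, lo=5.0, hi=15.0, eps=0.05, n_prop=20000)
```

The kept values concentrate around the observed sample median. Since the median is not sufficient for the Cauchy location, their distribution is a "distribution estimator" based on the median rather than an approximation of the true posterior.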

Summary: Why confidence distribution (& related inference)?

CD is a powerful all-purpose inference tool ... informative, flexible, effective, versatile, etc.

A “new” concept and potential platform for unifying (connecting/comparing) inferences from BFF paradigms

Supports new methodology developments beyond conventional approaches

o New prediction approaches (Shen et al., 2016+)
o New testing methods (Singh et al. 2007, Xie & Singh 2013, Liu et al. (in preparation))
o New simulation schemes (Claggett et al. 2014)
o Combining information from diverse sources (fusion learning/meta analysis, split & conquer, etc.) (Xie et al. 2011, Xie et al. 2013, Liu et al. 2014, 2015, Yang et al. 2014)

CD as a catalyst to bridge gaps between different statistical paradigms:

Statistical Dream: Bayesian, Fiducial and Frequentist = BFF = Best Friends Forever

Emanuel Parzen (2013):

o “Confidence distributions deserve to be widely taught and practiced as a powerful tool applicable by all researchers concerned with statistical inference and data discovery.”

o “Today the question should not be about credit for methods, but a framework for tools which are simple and powerful for applications.”
