Confidence Distribution (CD) & Bayesian/Frequentist/Fiducial (BFF) inferences Min-ge Xie Department of Statistics & Biostatistics, Rutgers University Celebrating 30 Years of UIUC Statistics November 20, 2015 Research supported in part by grants from NSF

Confidence Distribution (CD) & Bayesian ... stat.rutgers.edu/home/mxie/RCPapers/CD-UIUC15.pdf


Confidence Distribution (CD) & Bayesian/Frequentist/Fiducial (BFF) inferences

Min-ge Xie

Department of Statistics & Biostatistics, Rutgers University

Celebrating 30 Years of UIUC Statistics

November 20, 2015

Research supported in part by grants from NSF


Parameter estimation: point estimator, confidence interval & confidence distribution

In statistics, we routinely estimate unknown parameters using

Point estimator

“Interval estimator”: confidence interval (CI)

A very simple question:

Can we also use a distribution function (a “distribution estimator”) to estimate an unknown parameter of interest in frequentist inference, in the style of a Bayesian posterior?

The answer is YES, and one such estimator is the

Confidence distribution (CD)


Example: $X_1, \dots, X_n$ i.i.d. $\sim N(\mu, 1)$

Point estimate: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$

Interval estimate: $(\bar{x}_n - 1.96/\sqrt{n},\ \bar{x}_n + 1.96/\sqrt{n})$

Distribution estimate: $N(\bar{x}_n, \frac{1}{n})$

The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.
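A minimal Python sketch of the three kinds of estimates for the normal example, assuming the variance is known to be 1. The true mean 2.0, the seed, the sample size and the helper name `three_estimates` are hypothetical. It uses only the standard library (`statistics.NormalDist`, Python 3.8+) and shows that the 2.5% and 97.5% quantiles of the distribution estimate recover the interval estimate.

```python
import random
from statistics import NormalDist, fmean


def three_estimates(xs):
    """Point, 95% interval and distribution estimates of mu for
    X_1, ..., X_n i.i.d. N(mu, 1) (unit variance assumed known)."""
    n = len(xs)
    xbar = fmean(xs)                              # point estimate
    half_width = 1.96 / n ** 0.5                  # 95% half-width with sigma = 1
    interval = (xbar - half_width, xbar + half_width)
    cd = NormalDist(mu=xbar, sigma=1 / n ** 0.5)  # distribution estimate N(xbar, 1/n)
    return xbar, interval, cd


random.seed(1)
sample = [random.gauss(2.0, 1.0) for _ in range(100)]  # hypothetical data
xbar, ci, cd = three_estimates(sample)
print(f"point: {xbar:.3f}  interval: ({ci[0]:.3f}, {ci[1]:.3f})")
# Quantiles of the distribution estimate reproduce the interval (1.96 vs 1.95996...).
print(f"CD 2.5%/97.5% quantiles: ({cd.inv_cdf(0.025):.3f}, {cd.inv_cdf(0.975):.3f})")
```

The distribution estimate carries the other two: its center is the point estimate and its quantiles give intervals at any level, not just 95%.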


Example: many ways to obtain the “distribution estimate” $N(\bar{x}_n, \frac{1}{n})$

Methods 1 & 2: Bayes (omitted here) & fiducial (later ...)

Method 3: p-value method

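One-sided test: $K_0: \mu = \mu_0$ vs $K_a: \mu > \mu_0$

$$p(\mu_0) = P(\bar{X} > \bar{x}_n) = 1 - \Phi(\sqrt{n}\{\bar{x}_n - \mu_0\}) = \Phi(\sqrt{n}\{\mu_0 - \bar{x}_n\}).$$

Varying $\mu_0 \in \Theta$! $\Longrightarrow$ Cumulative distribution function of $N(\bar{x}_n, \frac{1}{n})$!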
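The p-value method can be checked numerically. This is a sketch under the slide's assumptions (known unit variance, one-sided alternative $\mu > \mu_0$); the data, seed and the helper name `p_value` are hypothetical. It evaluates $p(\mu_0)$ at several $\mu_0$ and compares it with the CDF of $N(\bar{x}_n, 1/n)$ at the same points.

```python
import random
from statistics import NormalDist, fmean

PHI = NormalDist()  # standard normal; PHI.cdf is Phi


def p_value(mu0, xbar, n):
    """One-sided p-value for K0: mu = mu0 vs Ka: mu > mu0,
    i.e. P(Xbar > xbar) when Xbar ~ N(mu0, 1/n)."""
    return 1 - PHI.cdf(n ** 0.5 * (xbar - mu0))


random.seed(2)
n = 25
xbar = fmean([random.gauss(0.3, 1.0) for _ in range(n)])  # hypothetical data
cd = NormalDist(xbar, 1 / n ** 0.5)  # the CD N(xbar, 1/n)

# Varying mu0: the p-value function coincides with the CDF of N(xbar, 1/n).
for mu0 in (xbar - 0.4, xbar, xbar + 0.4):
    print(f"mu0={mu0:.3f}  p(mu0)={p_value(mu0, xbar, n):.4f}  H(mu0)={cd.cdf(mu0):.4f}")
```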

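Method 4: Normalizing the likelihood function

Likelihood function

$$L(\mu \mid \text{data}) = \prod_{i=1}^{n} f(x_i \mid \mu) = C e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^2} = C e^{-\frac{n}{2}(\bar{x}_n-\mu)^2 - \frac{1}{2}\sum_{i=1}^{n}(x_i-\bar{x}_n)^2}$$

Normalized with respect to $\mu$:

$$\frac{L(\mu \mid \text{data})}{\int L(\mu \mid \text{data})\, d\mu} = \cdots = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\mu-\bar{x}_n)^2}$$

It is the density of $N(\bar{x}_n, \frac{1}{n})$!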
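The elided algebra ("= ... =") can be confirmed numerically. The sketch below is an illustration, not the author's code: it evaluates the $N(\mu, 1)$ likelihood on a hypothetical equally spaced grid of $\mu$ values, normalizes it with the trapezoid rule, and compares it pointwise with the $N(\bar{x}_n, 1/n)$ density. The grid width (8 standard errors), seed, data and helper name `normalized_likelihood` are assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def normalized_likelihood(xs, grid):
    """Normal(mu, 1) likelihood on an equally spaced mu-grid,
    normalized numerically (trapezoid rule) to integrate to 1."""
    loglik = [-0.5 * sum((x - mu) ** 2 for x in xs) for mu in grid]
    top = max(loglik)
    lik = [math.exp(v - top) for v in loglik]  # constant C cancels; shift avoids underflow
    step = grid[1] - grid[0]
    area = step * (sum(lik) - 0.5 * (lik[0] + lik[-1]))
    return [v / area for v in lik]


random.seed(3)
xs = [random.gauss(1.0, 1.0) for _ in range(40)]  # hypothetical data
n, xbar = len(xs), fmean(xs)
sd = 1 / math.sqrt(n)
grid = [xbar - 8 * sd + i * (16 * sd / 2000) for i in range(2001)]
dens = normalized_likelihood(xs, grid)
target = NormalDist(xbar, sd)  # N(xbar, 1/n)
max_err = max(abs(d - target.pdf(mu)) for d, mu in zip(dens, grid))
print(f"max |normalized likelihood - N(xbar, 1/n) density| = {max_err:.2e}")
```

The term $\frac{1}{2}\sum(x_i-\bar{x}_n)^2$ does not depend on $\mu$, so it is absorbed into the normalizing constant together with $C$.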

Method 5: . . . . . .


Inference using CD: Point estimators, Intervals, p-values & more

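A sketch of how the usual summaries can be read off a CD, assuming the CD is the normal-mean CD $N(0.25, 1/100)$ (hypothetical values). The point estimate is taken as the CD median, the interval as the central quantiles, and the one-sided p-value for $K_0: \theta \le b$ vs $K_a: \theta > b$ as $H(b)$, the CD mass on $(-\infty, b]$. The threshold $b = 0$ and the helper name `cd_inference` are illustrative assumptions. For the normal example, $H(b) = \Phi(\sqrt{n}(b - \bar{x}))$, which is the usual p-value computed at the boundary $\theta = b$.

```python
from statistics import NormalDist


def cd_inference(cd, alpha=0.05, b=0.0):
    """Read frequentist summaries off a CD supplied as a NormalDist:
    the CD median (point estimate), the central (1 - alpha) interval,
    and the p-value H(b) for K0: theta <= b vs Ka: theta > b."""
    median = cd.inv_cdf(0.5)
    interval = (cd.inv_cdf(alpha / 2), cd.inv_cdf(1 - alpha / 2))
    p_value = cd.cdf(b)  # CD mass on (-inf, b], i.e. support for K0
    return median, interval, p_value


# Hypothetical normal-mean CD: xbar = 0.25, n = 100, i.e. N(0.25, 1/100).
cd = NormalDist(0.25, 0.1)
median, (lo, hi), p = cd_inference(cd, alpha=0.05, b=0.0)
print(f"median={median:.3f}  95% interval=({lo:.3f}, {hi:.3f})  p-value(b=0)={p:.4f}")
```

Other point estimators (CD mean, CD mode) can be read off the same object; for this symmetric CD they coincide with the median.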


Long history

Long history: uncertainty measure on parameter space without any prior

Fraser (2011) suggested the seed idea (a.k.a. the fiducial idea) can be traced back to Bayes (1763) and Fisher (1922).

The term “confidence distribution” appeared as early as 1937 in a letter from E.S. Pearson to W.S. Gosset (Student)

The first use of the term in a formal publication is Cox (1958)

No precise yet general definition of a CD in the classical literature

Most commonly used approach is via inverting the upper limits of a whole set of lower-side confidence intervals, often using some special examples (e.g., Efron, 1993; Cox, 2006)


Renewed interest in confidence distributions

Recent developments

Entirely within the frequentist school: no attempt to derive a new “paradox-free” fiducial theory (not possible!)

Focus on providing frequentist inference tools for problems whose solutions were previously unavailable or unknown.


Renewed interest in confidence distributions

Efron (1998, Statist. Sci.): Bootstrap distributions are “distribution estimators” and “confidence distributions.”

Discussion article on CD (Xie & Singh 2013, Int. Stat. Rev.)

Cox (2013, Int. Stat. Rev.): The CD approach is aimed “to provide simple and interpretable summaries of what can reasonably be learned from data (and an assumed model).”

Efron (2013, Int. Stat. Rev.): The CD development is “a grounding process” to help solve “perhaps the most important unresolved problem in statistical inference” on “the use of Bayes theorem in the absence of prior information.”


Renewed interest in distributional inference

Recent developments of confidence distributions & related topics

Confidence distributions (Schweder & Hjort 2002, 2015; Singh, Xie & Strawderman 2001, 2005; Xie & Singh 2013)

Bayes/objective Bayes (Berger 2006; Berger, Bernardo and Sun (book, forthcoming), etc.)

Dempster-Shafer calculus (Dempster 2008); Inferential Models (Liu & Martin 2009, 2012, 2014)

Generalized inference/generalized fiducial inference (Weerahandi 1989, 1993; Hannig 2009; Hannig, Lee & Lai, 2015);

Individualized inference (Liu and Meng, 2015, etc.)

(* only some ‘review’ papers and books are cited above)

Current research focus -

o Attention to new developments of theoretical frameworks

o Emphasis on applications


Can these all lead to a brighter new future?

Now, with the emerging distributional inference from Bayesian, Frequentist and Fiducial (BFF) fields,

Can we really unify the foundation of statistical inferences?

“Can our desire to find a unification of Bayesian, Frequentist and Fiducial (BFF) perspectives do the same trick, allowing all of us to thrive under one roof as BFFs (Best Friends Forever)?” - Meng (2014, IMS Bulletin)


Definition: Confidence Distribution

A one-sentence version -

A confidence distribution (CD) is a sample-dependent distribution function that can represent confidence intervals (regions) of all levels for a parameter of interest.

Formally,

Definition

A sample-dependent function on the parameter space (i.e., a function on $\Theta \times \mathcal{X}$) is called a confidence distribution (CD) for parameter $\theta$ if:

R1) For each given sample, it is a distribution function on the parameter space;

R2) The function can provide confidence intervals (regions) of all levels for $\theta$.

(Notation: $\Theta$ = parameter space; $\mathcal{X}$ = sample space)

Example: $N(\bar{x}_n, \frac{1}{n})$ on $\Theta = (-\infty, \infty)$.
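Requirement R2 can be checked by simulation for the example. One common formalization of R2 (Xie & Singh 2013) is that, at the true value $\theta_0$, $H_n(\theta_0)$ viewed as a function of the random sample is $U(0, 1)$ distributed; central CD intervals then have exact coverage. The sketch below is a Monte Carlo check under hypothetical settings ($\mu_0 = 1$, $n = 20$, 90% level, 4000 replications, fixed seed); `check_r2` is an illustrative name.

```python
import random
from statistics import NormalDist, fmean


def check_r2(mu0=1.0, n=20, alpha=0.10, reps=4000, seed=0):
    """Monte Carlo check of R2 for the CD H_n = N(xbar, 1/n).

    Returns the coverage of the central (1 - alpha) CD interval and the
    values H_n(mu0) across repeated samples (these should look U(0, 1))."""
    rng = random.Random(seed)
    covered = 0
    h_at_truth = []
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu0, 1.0) for _ in range(n)])
        cd = NormalDist(xbar, 1 / n ** 0.5)
        lo, hi = cd.inv_cdf(alpha / 2), cd.inv_cdf(1 - alpha / 2)
        covered += lo <= mu0 <= hi
        h_at_truth.append(cd.cdf(mu0))
    return covered / reps, h_at_truth


coverage, h_vals = check_r2()
print(f"coverage of 90% CD intervals: {coverage:.3f}")
for u in (0.1, 0.25, 0.5, 0.75, 0.9):
    frac = sum(h <= u for h in h_vals) / len(h_vals)
    print(f"P(H_n(mu0) <= {u}) ~ {frac:.3f}")
```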


Analogy examples in point estimation

Point estimation [Point (sample statistic) + Performance]

Consistent estimator/estimate θ̂

R1) It is a point (sample statistic) on the parameter space.

R2) It tends to the true $\theta_0$ as $n \to \infty$.

Unbiased estimator/estimate θ̂

R1) It is a point (sample statistic) on the parameter space.

R2) It is unbiased: $E\hat{\theta} = \theta_0$.

Distribution estimation [(Sample-dependent) dist. function + Performance]

Confidence distribution

R1) It is a (sample-dependent) distribution function on the parameter space.

R2) It ensures the coverage rates of confidence intervals of all levels.

Other “distribution estimators” — future concepts?


CD — a unifying concept for distributional inference

Wide range of examples: bootstrap distributions, (normalized) likelihood functions, p-value functions, fiducial distributions, some informative priors and Bayesian posteriors, among others

Our understanding/interpretation: Any approach, regardless of being frequentist, fiducial or Bayesian, can potentially be unified under the concept of confidence distributions, as long as it can be used to build confidence intervals of all levels, exactly or asymptotically.

o May help bridge some gaps between different statistical paradigms...


Three forms of CD presentations

[Figure: three panels plotted against mu (0.0 to 0.6): CD density, CD, and confidence curve (CV)]

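Confidence density: in the form of a density function $h_n(\theta)$

e.g., $N(\bar{x}_n, \frac{1}{n})$ as $h_n(\theta) = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\theta-\bar{x}_n)^2}$.

Confidence distribution: in the form of a cumulative distribution function $H_n(\theta)$

e.g., $N(\bar{x}_n, \frac{1}{n})$ as $H_n(\theta) = \Phi\big(\sqrt{n}(\theta - \bar{x}_n)\big)$.

Confidence curve: $CV_n(\theta) = 2\min\{H_n(\theta),\ 1 - H_n(\theta)\}$

e.g., $N(\bar{x}_n, \frac{1}{n})$ as $CV_n(\theta) = 2\min\{\Phi(\sqrt{n}(\theta - \bar{x}_n)),\ 1 - \Phi(\sqrt{n}(\theta - \bar{x}_n))\}$.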
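The three presentations can be built from one object. A minimal sketch, assuming the normal-mean CD with hypothetical values $\bar{x}_n = 0.3$ and $n = 50$; `three_forms` is an illustrative name. It also uses a property that follows directly from the definition of $CV_n$: $CV_n(\theta) \ge \alpha$ exactly when $\alpha/2 \le H_n(\theta) \le 1 - \alpha/2$, so the set $\{\theta : CV_n(\theta) \ge \alpha\}$ is the central level-$(1-\alpha)$ interval.

```python
from statistics import NormalDist


def three_forms(xbar, n):
    """Confidence density h_n, CD H_n and confidence curve CV_n for N(xbar, 1/n)."""
    cd = NormalDist(xbar, 1 / n ** 0.5)

    def cv(theta):
        # Confidence curve: CV_n(theta) = 2 * min{H_n(theta), 1 - H_n(theta)}
        h = cd.cdf(theta)
        return 2 * min(h, 1 - h)

    return cd.pdf, cd.cdf, cv


# Hypothetical values: xbar = 0.3, n = 50.
h_n, H_n, cv_n = three_forms(xbar=0.3, n=50)
alpha = 0.05
grid = [i / 1000 for i in range(601)]            # theta from 0.0 to 0.6
inside = [t for t in grid if cv_n(t) >= alpha]   # {theta : CV_n(theta) >= alpha}
print(f"peak CV_n = {cv_n(0.3):.3f}; 95% interval from CV_n ~ ({inside[0]:.3f}, {inside[-1]:.3f})")
```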


CD viewed from a different angle – CD-random variable

For each given sample $x_n$, $H(\cdot)$ is a distribution function on $\Theta$

$\Longrightarrow$ We can simulate a random variable $\xi$ such that $\xi \mid X_n = x_n \sim H(\cdot)$!

We call this $\xi$ a CD-random variable.

– The CD-random variable $\xi$ is viewed as a “random estimator” of $\theta_0$ (a median-unbiased estimator)

Convenient format for connecting with bootstrap, fiducial and Bayesian methods
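A sketch of drawing a CD-random variable by inverse-transform sampling, $\xi = H^{-1}(V)$ with $V \sim U(0,1)$, for the normal-mean CD. It then checks median unbiasedness by simulation: over repeated samples, $P(\xi \le \theta_0) = E[H_n(\theta_0)] = 1/2$ when $H_n(\theta_0)$ is $U(0,1)$. The settings ($\mu_0 = 0$, $n = 10$, 5000 replications, seed) and function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def draw_cd_rv(cd, rng):
    """Draw a CD-random variable xi | data ~ H by inverse-transform sampling."""
    v = rng.random()
    while v == 0.0:  # NormalDist.inv_cdf needs 0 < p < 1
        v = rng.random()
    return cd.inv_cdf(v)


def prob_xi_below_truth(mu0=0.0, n=10, reps=5000, seed=4):
    """Estimate P(xi <= mu0) over repeated samples; 0.5 means median unbiased."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu0, 1.0) for _ in range(n)])
        xi = draw_cd_rv(NormalDist(xbar, 1 / n ** 0.5), rng)
        hits += xi <= mu0
    return hits / reps


print(f"P(xi <= mu0) ~ {prob_xi_below_truth():.3f}")
```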


CD random variable and bootstrap estimator

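Normal example earlier: the mean parameter $\mu$ is estimated by $N(\bar{x}, 1/n)$. If we simulate a CD-random variable $\xi \mid \text{data} \sim N(\bar{x}, 1/n)$, then

$$\frac{\xi - \bar{x}}{1/\sqrt{n}} \,\Big|\, \bar{x} \;\sim\; \frac{\bar{x} - \mu}{1/\sqrt{n}} \,\Big|\, \mu \qquad (\text{both} \sim N(0, 1))$$

The above statement is exactly the same as the key justification for the bootstrap, replacing $\xi$ by a bootstrap sample mean $\bar{x}^*$:

$$\frac{\bar{x}^* - \bar{x}}{1/\sqrt{n}} \,\Big|\, \bar{x} \;\sim\; \frac{\bar{x} - \mu}{1/\sqrt{n}} \,\Big|\, \mu \qquad (\text{both} \sim N(0, 1))$$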

The $\xi$ is in essence the same as a bootstrap estimator!

CD-random variable $\xi \overset{\text{essentially}}{\equiv}$ bootstrap estimator $\bar{x}^*$

CD is an extension of the bootstrap distribution, although the CD concept is much broader!

[cf., Xie & Singh 2013]
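A small simulation comparing the nonparametric bootstrap distribution of $\bar{x}^*$ with the CD $N(\bar{x}, 1/n)$. The data ($n = 100$ draws from a hypothetical $N(3, 1)$), seeds, number of resamples and the helper name `bootstrap_means` are assumptions. Note that the bootstrap spread uses the sample's own standard deviation, whereas the CD uses the known $\sigma = 1$, so the two agree only approximately.

```python
import random
from statistics import fmean, stdev


def bootstrap_means(xs, b=2000, seed=5):
    """Nonparametric bootstrap: b sample means, each from a resample of xs
    drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    return [fmean(rng.choices(xs, k=n)) for _ in range(b)]


rng = random.Random(6)
n = 100
xs = [rng.gauss(3.0, 1.0) for _ in range(n)]  # hypothetical N(3, 1) data
xbar = fmean(xs)
boot = bootstrap_means(xs)
# Compare the bootstrap distribution of xbar* with the CD N(xbar, 1/n).
print(f"mean of xbar*: {fmean(boot):.3f}  vs xbar: {xbar:.3f}")
print(f"sd of xbar*:   {stdev(boot):.4f}  vs 1/sqrt(n): {1 / n ** 0.5:.4f}")
```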


Fisher’s fiducial distribution & our interpretation

Model/structure equation: normal sample $X \sim N(\theta, 1)$ (for simplicity, let the number of observations be $n = 1$)

$$X = \theta + U, \quad \text{where } U \sim N(0, 1) \qquad (1)$$

Fiducial argument

Equivalent equation (“inversion”):

$$\theta = X - U$$

Thus, when we observe $X = x$,

$$\theta = x - U \qquad (2)$$

Since $U \sim N(0, 1)$, $\theta \sim N(x, 1)$ $\Longrightarrow$ the fiducial distribution of $\theta$ is $N(x, 1)$!

“Hidden subjectivity” (Dempster, 1963; Martin & Liu, 2013): “Continue to regard U as a random sample” from N(0, 1), even “after X = x is observed.”

In particular, $U \mid X = x \not\sim U$, by equation (1).

($X$ and $U$ are completely dependent: given one, the other is also completely determined.)


Fisher’s fiducial distribution & our interpretation

A new perspective (my understanding/interpretation):

In fact, equation (2) for the normal sample $\bar{X} \sim N(\theta, 1/n)$ is:

$$\theta = \bar{x} - u \qquad \text{(2a)}$$

Once $\bar{X} = \bar{x}$ is realized (and observed), a corresponding error $U = u$ is also realized (but unobserved).

Goal: Make inference for $\theta$

What we know: (1) $\bar{x}$ is observed; (2) the unknown $u$ is a realization from $U \sim N(0, 1/n)$.

An intuitive (appealing) solution:

– Simulate a $u^* \sim N(0, 1/n)$ and use $u^*$ to estimate $u$.

– Plugging it into (2a), we get an estimate of $\theta$ (a “random estimate”):

$$\theta^* = \bar{x} - u^* \qquad \text{(2b)}$$

– Repeating this many times, the $\theta^*$ values form a fiducial/CD function $N(\bar{x}, 1/n)$! ($\theta^*$ is called a fiducial sample and is also a CD-random variable.)
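Equation (2b) translates directly into a sampler. A minimal sketch, assuming hypothetical values $\bar{x} = 1.2$ and $n = 25$; `fiducial_samples` is an illustrative name. Each $u^*$ is drawn independently of the data, and the empirical distribution of the resulting $\theta^*$ values approximates $N(\bar{x}, 1/n)$.

```python
import random
from statistics import fmean, stdev


def fiducial_samples(xbar, n, m=5000, seed=7):
    """Fiducial 'random estimates' theta* = xbar - u*, with u* ~ N(0, 1/n)
    drawn independently of the data, as in equation (2b)."""
    rng = random.Random(seed)
    sd = 1 / n ** 0.5
    return [xbar - rng.gauss(0.0, sd) for _ in range(m)]


theta_star = fiducial_samples(xbar=1.2, n=25)
# Their empirical distribution approximates the fiducial/CD distribution N(xbar, 1/n).
print(f"mean {fmean(theta_star):.3f}, sd {stdev(theta_star):.3f}  (target 1.200, 0.200)")
```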


Generalized version of fiducial & our interpretation

General model/structure equation:

X = G(θ,U)

where an unknown random “error term” U ∼ Dist.

Realization: x = G(θ, u)

with observed x, unobserved realization u (∼ Dist.) and unknown θ.

Our take: a fiducial procedure is essentially to solve a structure equation for a “random estimate” θ∗ of θ!

x = G(θ∗, u∗)

for an independent random variable u∗ ∼ Dist.

(Fiducial “inversion”; incorporates knowledge of Dist.)
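A sketch of the "solve the structure equation" view for a scalar parameter, assuming (for illustration only) that $\theta \mapsto G(\theta, u)$ is continuous and increasing so that bisection can solve $x = G(\theta^*, u^*)$. The models `location` ($X = \theta + U$) and `cubic` ($X = \theta^3 + U$, hypothetical), the brackets and all function names are assumptions; this is not a general generalized-fiducial algorithm (multivariate or non-invertible $G$ needs more care).

```python
import random


def solve_for_theta(x, u, G, lo, hi, tol=1e-10):
    """Solve x = G(theta, u) for theta by bisection on [lo, hi].

    Assumption (for this sketch): theta -> G(theta, u) is continuous and
    increasing, and the root lies inside [lo, hi]."""
    if not G(lo, u) <= x <= G(hi, u):
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid, u) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fiducial_draws(x, G, draw_u, lo, hi, m, rng):
    """m fiducial 'random estimates' theta* solving x = G(theta*, u*),
    each with a fresh u* drawn from the known error distribution Dist."""
    return [solve_for_theta(x, draw_u(rng), G, lo, hi) for _ in range(m)]


def location(theta, u):
    """Location model X = theta + U."""
    return theta + u


def cubic(theta, u):
    """Hypothetical nonlinear model X = theta**3 + U (increasing in theta)."""
    return theta ** 3 + u


rng = random.Random(10)
draws = fiducial_draws(2.0, location, lambda r: r.gauss(0.0, 1.0), -100.0, 100.0, 2000, rng)
print(f"location model: mean {sum(draws) / len(draws):.3f} (fiducial N(2, 1) has mean 2)")
print(f"cubic model, u* = 1: theta* = {solve_for_theta(9.0, 1.0, cubic, -10.0, 10.0):.6f}")
```

For the location model the solver reproduces $\theta^* = x - u^*$, i.e. the fiducial distribution $N(x, 1)$ from equation (2).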


How about a Bayesian method?

General model/structure equation:

X = G(θ,U)

where an unknown random “error term” U ∼ Dist.

Realization: x = G(θ, u)

with observed x, unobserved realization u (∼ Dist.) and unknown θ.

Goal: Make inference for θ

What we know: (1) x is observed; (2) the unknown u is a realization from U ∼ Dist.; (3) the unknown θ is a realization from a given prior

θ ∼ π(θ).


How about a Bayesian method?

A Bayesian solution - Approximate Bayesian Computation (ABC) method

[Step A] Simulate a θ∗ ∼ π(θ) and a u∗ ∼ Dist., and compute

x∗ = G(θ∗, u∗)

[Step B] If x∗ “matches” the observed x, i.e., x∗ ≈ x, keep the simulated θ∗; otherwise, repeat Step A.

Effectively, the kept θ∗ solves the equation

x ≈ G(θ∗, u∗)

(A Bayesian way of “inversion”; incorporates knowledge of both π(θ) & Dist.)

Repeat the above steps many times to get many θ∗; these θ∗ form a “distribution estimator” fa(θ|x).

Theorem: fa(θ|x) is the posterior or an approximation of the posterior!
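A minimal rejection-ABC sketch of Steps A and B, assuming a hypothetical conjugate setting so that the result can be checked: $X_i = \theta + U_i$ with $U_i \sim N(0,1)$, $n = 10$, prior $\theta \sim N(0, 2^2)$, and "matching" meaning the sample means differ by at most $\varepsilon = 0.05$. Because the sample mean is sufficient here, the kept $\theta^*$ should approximate the exact posterior $N\big(n\bar{x}/(n + 1/\tau^2),\ 1/(n + 1/\tau^2)\big)$. Names, seed and tuning values are illustrative.

```python
import random
from statistics import fmean, stdev


def abc_rejection(x_obs, prior_draw, simulate, summary, eps, n_keep, rng):
    """Rejection ABC: keep theta* whenever summary(x*) is within eps of summary(x_obs)."""
    t_obs = summary(x_obs)
    kept = []
    while len(kept) < n_keep:
        theta = prior_draw(rng)                 # Step A: theta* ~ pi(theta)
        x_sim = simulate(theta, rng)            # Step A: x* = G(theta*, u*)
        if abs(summary(x_sim) - t_obs) <= eps:  # Step B: x* "matches" x
            kept.append(theta)
    return kept


TAU = 2.0   # hypothetical prior sd: theta ~ N(0, TAU**2)
N_OBS = 10  # hypothetical sample size

rng = random.Random(8)
x_obs = [rng.gauss(1.0, 1.0) for _ in range(N_OBS)]  # hypothetical data, X_i = theta + U_i
kept = abc_rejection(
    x_obs,
    prior_draw=lambda r: r.gauss(0.0, TAU),
    simulate=lambda th, r: [th + r.gauss(0.0, 1.0) for _ in range(N_OBS)],
    summary=fmean,  # the sample mean is sufficient in this model
    eps=0.05,
    n_keep=1000,
    rng=rng,
)
# Exact conjugate posterior: N(n*xbar/(n + 1/TAU^2), 1/(n + 1/TAU^2)).
prec = N_OBS + 1 / TAU ** 2
post_mean, post_sd = N_OBS * fmean(x_obs) / prec, prec ** -0.5
print(f"ABC: {fmean(kept):.3f} +/- {stdev(kept):.3f}   exact: {post_mean:.3f} +/- {post_sd:.3f}")
```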


More remarks on ABC

It is impossible (or very difficult) to have perfect matches, so real ABC methods:

1. Allow a small matching error ε (ε → 0 in theory)

2. Match a “summary” statistic t(x) instead of the original x (related/corresponding to a pivotal quantity!)

When t(x) is a sufficient statistic:

Theorem: The approximate posterior fa(θ|x) converges to the posterior as ε → 0.

What happens when t(x) is not a sufficient statistic?

The approximate posterior fa(θ|x) does NOT converge to the posterior, even as ε → 0.

Theorem: Under mild conditions, the approximate posterior fa(θ|x) converges to a confidence distribution as ε → 0.


More remarks on ABC

Cauchy example: A sample of size n = 50 from Cauchy(10,1); flat prior

True Cauchy posterior (black curve)

ABC “posterior”, when t(x) ≡ sample median (red curves)

ABC “posterior”, when t(x) ≡ sample mean (blue curves)

[Figure: Cauchy posterior with flat prior (n = 50)]

Both the red and blue curves are CDs, and they provide correct (but not efficient) statistical inference!
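A sketch of the red-curve setting: rejection ABC for a Cauchy(θ, 1) location with the sample median as the (non-sufficient) summary statistic $t(x)$. Assumptions: the flat prior is approximated by Uniform(5, 15), the tolerance is $\varepsilon = 0.1$, 300 draws are kept, and seeds and function names are hypothetical. Cauchy variates are generated by the inverse CDF $\theta + \tan(\pi(V - 1/2))$. This does not reproduce the slide's figure; it only illustrates the procedure.

```python
import math
import random
from statistics import fmean, median, stdev


def cauchy_sample(theta, n, rng):
    """n draws from Cauchy(theta, 1) via the inverse CDF theta + tan(pi * (V - 1/2))."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def abc_cauchy(x_obs, summary, eps, n_keep, prior=(5.0, 15.0), seed=11):
    """Rejection ABC for the Cauchy location theta with summary statistic t(x).
    A flat prior is approximated (assumption) by Uniform(prior[0], prior[1])."""
    rng = random.Random(seed)
    t_obs = summary(x_obs)
    kept = []
    while len(kept) < n_keep:
        theta = rng.uniform(*prior)
        if abs(summary(cauchy_sample(theta, len(x_obs), rng)) - t_obs) <= eps:
            kept.append(theta)
    return kept


x_obs = cauchy_sample(10.0, 50, random.Random(12))  # n = 50 from Cauchy(10, 1)
kept = abc_cauchy(x_obs, summary=median, eps=0.1, n_keep=300)
print(f"sample median {median(x_obs):.3f}; ABC 'posterior' (t = median): "
      f"mean {fmean(kept):.3f}, sd {stdev(kept):.3f}")
```

Passing `summary=fmean` gives the sample-mean (blue-curve) variant; expect it to be far more spread out, since the mean of a Cauchy(θ, 1) sample is itself Cauchy(θ, 1).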


Summary: Why confidence distribution (& related inference)?

CD is a powerful all-purpose inference tool ... informative, flexible, effective, versatile, etc.

A “new” concept and potential platform for unifying (connecting/comparing) inferences from BFF paradigms

Supports new methodology developments beyond conventional approaches

o New prediction approaches (Shen et al., 2016+)

o New testing methods (Singh et al. 2007; Xie & Singh 2013; Liu et al. (in preparation))

o New simulation schemes (Claggett et al. 2014)

o Combining information from diverse sources (fusion learning/meta-analysis, split & conquer, etc.) (Xie et al. 2011; Xie et al. 2013; Liu et al. 2014, 2015; Yang et al. 2014)


CD as a catalyst to bridge gaps between different statistical paradigms

Statistical Dream: Bayesian, Fiducial and Frequentist = BFF = Best Friends Forever

Emanuel Parzen (2013):

• “Confidence distributions deserve to be widely taught and practiced as a powerful tool applicable by all researchers concerned with statistical inference and data discovery.”

• “Today the question should not be about credit for methods, but a framework for tools which are simple and powerful for applications.”