Bayesian and Frequentist Issues in Modern Inference Bradley Efron Stanford University

Sep 12, 2021

Page 1: Bayesian and Frequentist Issues in Modern Inference

Bayesian and Frequentist Issues in Modern Inference

Bradley Efron

Stanford University

Page 2: Bayesian and Frequentist Issues in Modern Inference

Small Data

Classical statistics: Direct tests and estimates of individual parameters within well-defined models (MLE, Neyman–Pearson)

Not much:

• data-based model selection

• Bayesian combination of related problems

Today: Methodology (not Philosophy)

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 2 / 32

Page 3: Bayesian and Frequentist Issues in Modern Inference

Bayesian Inference

Parameter: µ ∈ Ω

Observed data: x

Prior: g(µ)

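Probability distributions: $f_\mu(x)$, $\mu \in \Omega$

Parameter of interest: $\theta = t(\mu)$

\[
E\{\theta \mid x\} = \frac{\int_\Omega t(\mu)\, f_\mu(x)\, g(\mu)\, d\mu}{\int_\Omega f_\mu(x)\, g(\mu)\, d\mu}
\]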


Page 4: Bayesian and Frequentist Issues in Modern Inference

Jeffreysonian Bayes Inference: “Uninformative Priors”

What if we don’t know prior g?

Jeffreys: $g(\mu) = |I(\mu)|^{1/2}$, where $I(\mu) = \operatorname{cov}\{\nabla_\mu \log f_\mu(x)\}$

(the Fisher information matrix)

Can still use Bayes theorem, but how accurate are the estimates?

Frequentist variability of $E\{t(\mu) \mid x\}$


Page 5: Bayesian and Frequentist Issues in Modern Inference

General Accuracy Formula

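$\mu$ and $x \in \mathbb{R}^p$

$V_\mu = \operatorname{cov}_\mu(x)$

\[
\alpha_x(\mu) = \nabla_x \log f_\mu(x) = \left(\ldots, \frac{\partial \log f_\mu(x)}{\partial x_i}, \ldots\right)^T
\]

Lemma: $\hat E = E\{t(\mu) \mid x\}$ has gradient $\nabla_x \hat E = \operatorname{cov}\{t(\mu), \alpha_x(\mu) \mid x\}$.

Theorem: The delta-method standard deviation of $\hat E$ is

\[
\operatorname{sd}(\hat E) = \left[\operatorname{cov}\{t(\mu), \alpha_x(\mu) \mid x\}^T \, V_x \, \operatorname{cov}\{t(\mu), \alpha_x(\mu) \mid x\}\right]^{1/2}.
\]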


Page 6: Bayesian and Frequentist Issues in Modern Inference

Implementation

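Posterior sample from $\mu \mid x$: $\mu_1, \mu_2, \ldots, \mu_B$ (MCMC?)

Each $\mu_i$ gives $t_i = t(\mu_i)$ and $\alpha_i = \alpha_x(\mu_i)$

\[
\hat E = \sum_{i=1}^B t_i / B \approx E\{t(\mu) \mid x\}
\]

\[
\widehat{\operatorname{cov}} = \sum_{i=1}^B (\alpha_i - \bar\alpha)(t_i - \bar t) / B
\]

\[
\widehat{\operatorname{sd}} = \left[\widehat{\operatorname{cov}}^T \, V_x \, \widehat{\operatorname{cov}}\right]^{1/2}
\]

No additional sampling for $\widehat{\operatorname{cov}}$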
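A minimal Python sketch of this recipe, under stated assumptions: the function name `frequentist_sd` and the toy model are illustrative and not from the talk. The toy takes x ~ N_p(µ, I) with a flat prior, so the posterior is N_p(x, I), α_x(µ) = µ − x, and for t(µ) = cᵀµ the formula should return roughly ‖c‖.

```python
import numpy as np

def frequentist_sd(t, alpha, V):
    """Delta-method sd of the posterior mean of t, from posterior draws.

    t     : (B,) values t(mu_i)
    alpha : (B, p) values alpha_x(mu_i)
    V     : (p, p) covariance matrix of x (V_x on the slide)
    """
    t = np.asarray(t, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    B = t.shape[0]
    E_hat = t.mean()                                    # sum t_i / B
    # Posterior covariance between t and each coordinate of alpha.
    cov_hat = (alpha - alpha.mean(axis=0)).T @ (t - E_hat) / B
    sd_hat = float(np.sqrt(cov_hat @ V @ cov_hat))      # [cov^T V cov]^(1/2)
    return E_hat, cov_hat, sd_hat

# Toy check (assumed model, not the diabetes data).
rng = np.random.default_rng(0)
p, B = 3, 20_000
x = np.array([1.0, -0.5, 2.0])
c = np.array([1.0, 2.0, -1.0])
mu = x + rng.standard_normal((B, p))   # exact posterior draws stand in for MCMC
t = mu @ c                             # t(mu) = c^T mu
alpha = mu - x                         # gradient of log f_mu(x) with respect to x
E_hat, cov_hat, sd_hat = frequentist_sd(t, alpha, np.eye(p))
```

The same posterior draws supply both the estimate and its frequentist sd, which is the point of "no additional sampling".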

[Figure: plot of posterior draws, labeled (mu[i], alpha[i], t[i])]


Page 7: Bayesian and Frequentist Issues in Modern Inference

Diabetes Data: Efron et al. (2004), “LARS”

n = 442 subjects

p = 10 predictors: age, sex, bmi, glu, ...

Response: y = disease progression at one year

Model: $y_{n \times 1} = X_{n \times p}\, \beta_{p \times 1} + e_{n \times 1}$, $[e \sim N_n(0, I)]$


Page 8: Bayesian and Frequentist Issues in Modern Inference

Bayesian Lasso: Park and Casella (2008)

Model: $y \sim N_n(X\beta, I)$

Prior: $g(\beta) = e^{-\gamma L_1(\beta)}$ $[\gamma = 0.37]$

Then posterior mode at Lasso $\hat\beta_\gamma$

Subject 125: $\theta_{125} = x_{125}^T \beta$

How accurate are Bayes posterior inferences for θ125?


Page 9: Bayesian and Frequentist Issues in Modern Inference

Bayesian Analysis

MCMC: posterior sample $\{\beta_i \text{ for } i = 1, 2, \ldots, 10{,}000\}$

Gives $\theta_{125,i} = x_{125}^T \beta_i$, $i = 1, 2, \ldots, 10{,}000$

$\theta_{125,i} \sim 0.248 \pm 0.072$

General accuracy formula: frequentist sd 0.071 for $\hat E = 0.248$ $[\alpha_x(\mu) = X^T X \beta]$


Page 10: Bayesian and Frequentist Issues in Modern Inference

Posterior CDF for Subject 125

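$\operatorname{cdf}_y(c) = \Pr\{\theta_{125} \le c \mid y\}$

\[
s_i = \begin{cases} 1 & \text{if } t_i \le c \\ 0 & \text{if } t_i > c \end{cases}
\qquad
\widehat{\operatorname{cdf}}_y(c) = \sum_{i=1}^B s_i / B
\]

For c = 0.3: $\widehat{\operatorname{cdf}}_y(c) = 0.762 \pm 0.304$ (Bayes estimate ± frequentist sd)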
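The indicator trick can be sketched in Python as follows. This is an illustration under assumptions, not the diabetes computation: the function name `posterior_cdf_with_sd` is hypothetical, and the toy is a scalar x ~ N(µ, 1) with a flat prior, where the posterior cdf is Φ(c − x) and its delta-method sd is φ(c − x).

```python
import math
import numpy as np

def posterior_cdf_with_sd(t, alpha, V, c):
    """Posterior cdf at c and its delta-method frequentist sd.

    Applies the general accuracy formula to the indicator s_i = 1{t_i <= c}.
    """
    t = np.asarray(t, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    s = (t <= c).astype(float)                 # s_i on the slide
    cdf_hat = s.mean()                         # sum s_i / B
    cov_hat = (alpha - alpha.mean(axis=0)).T @ (s - cdf_hat) / len(s)
    sd_hat = float(np.sqrt(cov_hat @ V @ cov_hat))
    return float(cdf_hat), sd_hat

# Toy check (assumed model): posterior is N(x_obs, 1), alpha_x(mu) = mu - x_obs.
rng = np.random.default_rng(1)
B, x_obs, c = 50_000, 0.0, 0.5
mu = x_obs + rng.standard_normal(B)
alpha = (mu - x_obs)[:, None]                  # shape (B, 1)
cdf_hat, sd_hat = posterior_cdf_with_sd(mu, alpha, np.eye(1), c)

# Closed-form values for this toy model, for comparison.
cdf_exact = 0.5 * (1.0 + math.erf((c - x_obs) / math.sqrt(2.0)))
sd_exact = math.exp(-0.5 * (c - x_obs) ** 2) / math.sqrt(2.0 * math.pi)
```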


Page 11: Bayesian and Frequentist Issues in Modern Inference

[Figure: Posterior cdf for mu125, diabetes data, 10,000 MCMC draws, prior exp{−0.37·L1(beta)}; vertical bars are ± one frequentist standard deviation. Upper 95% credible limit is 0.342 ± 0.069. x-axis: c value; y-axis: Prob{mu125 < c | data}.]


Page 12: Bayesian and Frequentist Issues in Modern Inference

Exponential Families

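$f_\alpha(\hat\beta) = e^{\alpha^T \hat\beta - \psi(\alpha)} f_0(\hat\beta)$, with $\alpha, \hat\beta$ in $\mathbb{R}^p$

Natural parameter $\alpha$, sufficient statistic $\hat\beta$, expectation $\beta = E_\alpha\{\hat\beta\}$

[Poisson: $f_\mu(x) = e^{-\mu}\mu^x / x!$: $x = \hat\beta$, $\mu = \beta$, $\alpha = \log(\mu)$]

General accuracy formula: For $\hat E = E\{t(\beta) \mid \hat\beta\}$,

\[
\operatorname{sd}(\hat E) = \left[\operatorname{cov}(t, \alpha \mid \hat\beta)^T \, V_{\hat\alpha} \, \operatorname{cov}(t, \alpha \mid \hat\beta)\right]^{1/2}
\]

with $V_{\hat\alpha} = \operatorname{cov}_{\alpha = \hat\alpha}(\hat\beta)$.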


Page 13: Bayesian and Frequentist Issues in Modern Inference

Better Frequentist Inferences

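For $\hat E = E\{t(\beta) \mid \hat\beta\}$ in exfam $f_\alpha(\hat\beta) = e^{\alpha^T \hat\beta - \psi(\alpha)} f_0(\hat\beta)$

Parametric bootstrap:

\[
f_{\hat\alpha}(\cdot) \to [\hat\beta^*_1, \hat\beta^*_2, \ldots, \hat\beta^*_j, \ldots, \hat\beta^*_J] \to [\cdots \hat E^*_j = E\{t(\beta) \mid \hat\beta^*_j\} \cdots] \to \text{bootstrap conf int for } E
\]

Trouble: Need new MCMC sample for each $\hat\beta^*_j$

Shortcut: Reweight original MCMC sample (importance sampling)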


Page 14: Bayesian and Frequentist Issues in Modern Inference

Digression: Posterior Exponential Family

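$f_\alpha(\hat\beta) = e^{\alpha^T \hat\beta - \psi(\alpha)} f_0(\hat\beta)$: natural param $\alpha$, suff stat $\hat\beta$, “carrier” $f_0(\hat\beta)$

Posterior exponential family:

\[
g(\alpha \mid \text{suff stat } b) = e^{(b - \hat\beta)^T \alpha - \phi(b)}\, g(\alpha \mid \hat\beta)
\]

natural param $b$, suff stat $\alpha$, carrier $g(\alpha \mid \hat\beta)$

Importance sampling: Reweight the original MCMC realizations:

\[
\hat E\{t \mid b\} = \sum_{i=1}^B t_i W_i(b) \Big/ \sum_{i=1}^B W_i(b), \qquad W_i(b) = e^{(b - \hat\beta)^T \alpha_i}
\]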
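The reweighting step might look like the following Python sketch. The function name `reweighted_expectation` and the toy model are assumptions for illustration: the sufficient statistic is N_2(α, I) with a flat prior on α, so the posterior given b is N_2(b, I) and the reweighted mean of α₁ should land near b₁. Subtracting the maximum log-weight is a numerical safeguard added here, not something stated in the talk; it cancels in the ratio.

```python
import numpy as np

def reweighted_expectation(t, alpha, b, beta_hat):
    """Estimate E{t | b} from draws alpha_i taken from g(alpha | beta_hat).

    Uses weights W_i(b) = exp((b - beta_hat)^T alpha_i), self-normalized.
    """
    t = np.asarray(t, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    shift = np.asarray(b, dtype=float) - np.asarray(beta_hat, dtype=float)
    log_w = alpha @ shift
    w = np.exp(log_w - log_w.max())        # stabilized; constant cancels in the ratio
    return float(np.sum(w * t) / np.sum(w))

# Toy check (assumed model): draws from the posterior at beta_hat.
rng = np.random.default_rng(2)
B = 50_000
beta_hat = np.array([1.0, -1.0])
alpha = beta_hat + rng.standard_normal((B, 2))
t = alpha[:, 0]                            # t(alpha) = first coordinate
b = beta_hat + np.array([0.3, -0.2])       # a nearby "bootstrap" value of the statistic
E_at_b = reweighted_expectation(t, alpha, b, beta_hat)
E_at_beta_hat = reweighted_expectation(t, alpha, beta_hat, beta_hat)
```

The weights degrade as b moves away from the observed statistic, so this shortcut is best suited to values of b near it.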


Page 15: Bayesian and Frequentist Issues in Modern Inference

Bootstrap Intervals Without Bootstrapping: DiCiccio and Efron (1992)

“abc”: Investigate $\hat E\{t \mid b\}$ for $b$ near $\hat\beta$

Requires p + 2 numerical 2nd derivatives of E function

Next: Applied to posterior cdf for mu125


Page 16: Bayesian and Frequentist Issues in Modern Inference

[Figure: Heavy curve is posterior cdf for mu125, diabetes data. Vertical bars are frequentist central 68% abc confidence intervals. Light lines show ± one frequentist standard error. x-axis: c value; y-axis: Prob{mu125 < c | data}.]


Page 17: Bayesian and Frequentist Issues in Modern Inference

Estimation After Model Selection

Usually:

(a) look at data

(b) choose model (linear, quad, cubic . . . ?)

(c) fit estimates using chosen model

(d) analyze as if pre-chosen

Today: Include model selection process in the analysis

Question: Effects on standard errors, confidence intervals, etc.?


Page 18: Bayesian and Frequentist Issues in Modern Inference

Cholesterol Data

n = 164 men took Cholestyramine for ∼ 7 years

x = compliance measure (adjusted: x ∼ N(0,1))

y = cholesterol decrease

Wish to estimate regression values

$\mu_j = E\{y \mid x = x_j\}$ for $j = 1, 2, \ldots, 164$

$\mu = (\mu_1, \mu_2, \ldots, \mu_{164})^T$


Page 19: Bayesian and Frequentist Issues in Modern Inference

[Figure: Cholesterol data, n = 164 subjects: cholesterol decrease plotted versus adjusted compliance. Green curve is OLS cubic regression; red points indicate 5 featured subjects (numbered 1 to 5). x-axis: compliance; y-axis: cholesterol decrease.]


Page 20: Bayesian and Frequentist Issues in Modern Inference

Cp Selection Criterion

Regression model: $y_{n \times 1} = X_{n \times m}\, \beta_{m \times 1} + e_{n \times 1}$, $[e_i \sim (0, \sigma^2)]$

$C_p$ criterion:

\[
\|y - X\hat\beta\|^2 + 2m\sigma^2
\]

$\hat\beta$ = OLS estimate, m = “degrees of freedom”

Model selection: From possible models $X_1, X_2, X_3, \ldots$ choose the one minimizing $C_p$.

Then use OLS estimate from chosen model.
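A short Python sketch of Cp selection over polynomial models, under assumptions: the helper names (`poly_design`, `ols_fit`, `cp_select`) and the synthetic data are illustrative, and σ² is estimated from the largest candidate model, which is one common choice rather than a prescription.

```python
import numpy as np

def poly_design(x, degree):
    """Design matrix with columns 1, x, ..., x^degree (m = degree + 1)."""
    return np.vander(x, degree + 1, increasing=True)

def ols_fit(X, y):
    """Fitted values X @ beta_hat from ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def cp_select(x, y, max_degree=6):
    """Return (best degree, its fitted values, dict of Cp values)."""
    fits = {d: ols_fit(poly_design(x, d), y) for d in range(1, max_degree + 1)}
    n, m_full = len(y), max_degree + 1
    # Assumption: sigma^2 taken from the largest ("full") model's residuals.
    sigma2 = np.sum((y - fits[max_degree]) ** 2) / (n - m_full)
    cp = {d: np.sum((y - fits[d]) ** 2) + 2 * (d + 1) * sigma2 for d in fits}
    best = min(cp, key=cp.get)
    return best, fits[best], cp

# Synthetic example with a strong cubic signal (not the cholesterol data).
rng = np.random.default_rng(0)
x = rng.standard_normal(164)
y = 10.0 * x**3 + rng.standard_normal(164)
best_degree, mu_hat, cp_values = cp_select(x, y)
```

With a signal this strong the linear and quadratic fits carry a large residual sum of squares, so Cp should settle on degree 3 or higher; which of those it picks depends on the noise.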


Page 21: Bayesian and Frequentist Issues in Modern Inference

Cp for Cholesterol Data

Model          df   Cp − 80000   (Boot %)
M1 (linear)     2       1132      (19%)
M2 (quad)       3       1412      (12%)
M3 (cubic)      4        667      (34%)
M4 (quartic)    5       1591       (8%)
M5 (quintic)    6       1811      (21%)
M6 (sextic)     7       2758       (6%)

(σ = 22 from “full model” M6)


Page 22: Bayesian and Frequentist Issues in Modern Inference

Nonparametric Bootstrap Analysis

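data = $\{(x_i, y_i),\ i = 1, 2, \ldots, n = 164\}$ gave original estimate $\hat\mu = X_3 \hat\beta_3$

Bootstrap data set: data$^*$ = $\{(x_j, y_j)^*,\ j = 1, 2, \ldots, n\}$, where $(x_j, y_j)^*$ are drawn randomly and with replacement from data:

\[
\text{data}^* \xrightarrow{\;C_p\;} m^* \xrightarrow{\;\text{OLS}\;} \hat\beta^*_{m^*} \longrightarrow \hat\mu^* = X_{m^*} \hat\beta^*_{m^*}
\]

I did this all B = 4000 times.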
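The pipeline above could be sketched in Python roughly as follows. Everything named here (`design`, `cp_fit`, `bootstrap_cp`, the synthetic data) is hypothetical. Two assumptions are worth flagging: σ² is held fixed at the full-model estimate from the original data rather than re-estimated in each replication, and B is set to 500 only to keep the demo fast.

```python
import numpy as np

def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x^degree."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

def cp_fit(x, y, sigma2, max_degree=6):
    """Pick the polynomial degree minimizing Cp; return (degree, beta_hat)."""
    best = None
    for d in range(1, max_degree + 1):
        X = design(x, d)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        cp = np.sum((y - X @ beta) ** 2) + 2 * (d + 1) * sigma2
        if best is None or cp < best[0]:
            best = (cp, d, beta)
    return best[1], best[2]

def bootstrap_cp(x, y, subject, B=500, max_degree=6, seed=0):
    """Resample (x, y) pairs, reselect the model by Cp, refit, predict one subject."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X_full = design(x, max_degree)
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    # Assumption: sigma^2 fixed at the original full-model estimate.
    sigma2 = np.sum((y - X_full @ beta_full) ** 2) / (n - max_degree - 1)
    degrees = np.empty(B, dtype=int)
    t_star = np.empty(B)
    for i in range(B):
        idx = rng.integers(0, n, size=n)               # draw pairs with replacement
        d, beta = cp_fit(x[idx], y[idx], sigma2, max_degree)
        degrees[i] = d
        t_star[i] = (design(x[subject], d) @ beta)[0]  # estimate at the subject's x
    props = np.bincount(degrees, minlength=max_degree + 1)[1:] / B
    return t_star, degrees, props

# Synthetic stand-in for the cholesterol data (values are made up).
rng = np.random.default_rng(3)
x = rng.standard_normal(164)
y = 10.0 + 20.0 * x + 5.0 * x**3 + 22.0 * rng.standard_normal(164)
t_star, degrees, props = bootstrap_cp(x, y, subject=0, B=500)
```

`props` holds the fraction of replications choosing each degree, and `t_star` is the bootstrap distribution of the model-selected estimate for the chosen subject.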


Page 23: Bayesian and Frequentist Issues in Modern Inference

[Figure: Histogram of B = 4000 nonparametric bootstrap replications for the model-selected regression estimate of Subject 1; boot (m, stdev) = (−2.63, 8.02); 76% of the replications are less than the original estimate 2.71. Red triangles are the 2.5th and 97.5th boot percentiles. x-axis: bootstrap estimates for subject 1; y-axis: Frequency.]


Page 24: Bayesian and Frequentist Issues in Modern Inference

[Figure: Smooth Estimation Model: 'y' is observed data; ellipses indicate bootstrap distribution for 'y*'; red curves are level surfaces of equal estimation for thetahat = t(y).]


Page 25: Bayesian and Frequentist Issues in Modern Inference


Page 26: Bayesian and Frequentist Issues in Modern Inference

[Figure: Boxplot of Cp boot estimates for Subject 1; B = 4000 bootreps. Red bars indicate selection proportions for Models 1−6; only 1/3 of the bootstrap replications chose Model 3. x-axis: selected model; y-axis: subject 1 estimates.]


Page 27: Bayesian and Frequentist Issues in Modern Inference

Bootstrap Smoothing

Idea: Replace original estimator $t(y)$ with bootstrap average

\[
s(y) = \sum_{i=1}^B t(y^*_i) / B
\]

Model averaging

Same as bagging (“bootstrap aggregation,” Breiman)

Removes discontinuities, reduces variance

Approximate confidence interval: s(y) ± 1.96 · sd


Page 28: Bayesian and Frequentist Issues in Modern Inference

Accuracy Theorem

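Notation: $s_0 = s(y)$, $t^*_i = t(y^*_i)$, $i = 1, 2, \ldots, B$

$Y^*_{ij}$ = # of times jth data point appears in ith boot sample

\[
\operatorname{cov}_j = \sum_{i=1}^B Y^*_{ij} \cdot (t^*_i - s_0) / B \qquad [\text{covariance of } Y^*_{ij} \text{ with } t^*_i]
\]

Theorem: The delta method standard deviation estimate for $s_0$ is

\[
\operatorname{sd} = \left[\sum_{j=1}^n \operatorname{cov}_j^2\right]^{1/2},
\]

always $\le \left[\sum_{i=1}^B (t^*_i - s_0)^2 / B\right]^{1/2}$, the boot stdev for $t(y)$.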
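A Python sketch of the smoothed estimate and its delta-method sd, under assumptions: `bootstrap_smooth` and `pretest` are hypothetical names, and the data are simulated. With finite B the computed sd carries a small upward Monte Carlo bias, so the inequality in the theorem can be violated slightly in practice; the sketch does not attempt a bias correction.

```python
import numpy as np

def bootstrap_smooth(y, estimator, B=4000, seed=0):
    """Return (s0, delta-method sd of s0, ordinary bootstrap stdev of t)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.integers(0, n, size=(B, n))                 # B resamples of size n
    # Y[i, j] = number of times data point j appears in bootstrap sample i.
    Y = np.stack([np.bincount(row, minlength=n) for row in idx])
    t_star = np.array([estimator(y[row]) for row in idx])
    s0 = t_star.mean()                                    # smoothed (bagged) estimate
    cov = Y.T @ (t_star - s0) / B                         # cov_j, j = 1..n
    sd_smooth = float(np.sqrt(np.sum(cov ** 2)))
    sd_boot = float(np.sqrt(np.mean((t_star - s0) ** 2)))
    return float(s0), sd_smooth, sd_boot

def pretest(v):
    """A deliberately discontinuous estimator: keep the mean only if it is large."""
    m = v.mean()
    return m if abs(m) > 0.3 else 0.0

rng = np.random.default_rng(4)
y = 0.3 + rng.standard_normal(30)                         # simulated data
s_mean, sd_s_mean, sd_b_mean = bootstrap_smooth(y, np.mean)
s_pre, sd_s_pre, sd_b_pre = bootstrap_smooth(y, pretest)
```

For a linear statistic such as the mean, the two standard deviations should nearly coincide. For the discontinuous `pretest` rule, the smoothed sd is expected to come out smaller, which is the variance reduction the slide describes.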


Page 29: Bayesian and Frequentist Issues in Modern Inference

Projection Interpretation



Page 30: Bayesian and Frequentist Issues in Modern Inference

[Figure: Standard deviation of smoothed estimate relative to original (red) for five subjects; green line is stdev of the naive cubic model. Bottom numbers show original standard deviations for subjects 1 to 5: 7.9, 3.9, 4.1, 4.7, 6.8. x-axis: Subject number; y-axis: Relative stdev.]


Page 31: Bayesian and Frequentist Issues in Modern Inference

Model Probability Estimates

34% of the 4000 bootreps chose the cubic model

Poor man’s Bayes posterior prob for “cubic”

How accurate is that 34%?

Apply accuracy theorem to indicator function for choosing “cubic”


Page 32: Bayesian and Frequentist Issues in Modern Inference

Model          Boot %   ± Standard Error
M1 (linear)     19%      ±24
M2 (quad)       12%      ±18
M3 (cubic)      34%      ±24
M4 (quartic)     8%      ±14
M5 (quintic)    21%      ±27
M6 (sextic)      6%      ±6
