Bootstrap Confidence Intervals
ST552 Lecture 13
Charlotte Wickham
2019-02-11
Motivation
The inferences we’ve covered so far relied on our assumption of
Normal errors:
$\epsilon \sim N(0, \sigma^2 I_{n \times n})$
For example, we’ve seen under this assumption, the least squares
estimates are also Normally distributed:
$\hat{\beta} \sim N\left(\beta, \sigma^2 (X^T X)^{-1}\right)$
If the errors aren’t truly Normally distributed, what
distribution do the estimates have?
Warm-up: Your Turn
Imagine the errors are in fact $t_3$ distributed.
With your neighbour: design a simulation to understand the
distribution of the least squares estimates.
Goal: the sampling distribution of $\hat{\beta}$.
1. Assume some things about the model, $y = X\beta + \epsilon$: decide on $n$, $X$ and $\beta$ (make something up).
2. Compute the $\hat{\beta}$'s 100,000 times:
   a) Simulate $y$: simulate $\epsilon$, then find $y = X\beta + \epsilon$.
   b) Using $y$, fit the model: $\hat{\beta} = (X^T X)^{-1} X^T y$.
3. Examine the distribution of the simulated estimates.
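The simulation design above could be sketched as follows. This is a minimal sketch rather than the code used in the lecture: the values $n = 10$, $x \sim \text{Uniform}(0, 20)$ and $\beta = (1, 0.5)^T$ are taken from the example slides, while the seed, the number of simulations (5000 here, to keep it quick) and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(552)  # arbitrary seed, for reproducibility only

# Step 1: assume a model y = X beta + eps and make up n, X and beta.
n = 10
x = rng.uniform(0, 20, size=n)          # drawn once, then held fixed
X = np.column_stack([np.ones(n), x])    # design matrix with an intercept column
beta = np.array([1.0, 0.5])             # "true" intercept and slope

# Step 2: simulate errors, construct y, and refit, many times.
n_sim = 5000
estimates = np.empty((n_sim, 2))
for k in range(n_sim):
    eps = rng.standard_t(df=3, size=n)  # t_3 errors rather than Normal errors
    y = X @ beta + eps
    # Least squares estimate: solve (X^T X) b = X^T y
    estimates[k] = np.linalg.solve(X.T @ X, X.T @ y)

# Step 3: examine the distribution of the estimates.
print("mean of estimates:", estimates.mean(axis=0))
print("sd of estimates:  ", estimates.std(axis=0, ddof=1))
```

A histogram or density plot of each column of `estimates` gives the kind of picture shown in the example slides.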
Example: 1. Fix n, fix X
[Figure: the fixed x values; x-axis: x, y-axis: Response.]
$n = 10$, $x_i \sim \text{Uniform}(0, 20)$
Example: 1. Fix $\beta$, find y
[Figure: the true regression line, $E(y) = X\beta$; x-axis: x, y-axis: Response.]
$\beta = (1, 0.5)^T$
Example: 2. Simulate errors, find y
[Figure: simulated responses; x-axis: x, y-axis: Response.]
Example: 3. Find least squares line
[Figure: simulated responses with the fitted regression line; x-axis: x, y-axis: Response.]
Example: 4. Repeat #2. and #3. many times
[Figure: many fitted lines, each one fit coming from one simulation; x-axis: x, y-axis: Response.]
Example: Examine distribution of estimates
[Figure: two density plots, one of the estimates of the intercept and one of the estimates of the slope.]
True values: $\beta_0 = 1$, $\beta_1 = 0.5$. The plots show 1000 simulated $\hat{\beta}_0$ and 1000 simulated $\hat{\beta}_1$.
Example: Compared to theory
[Figure: the same two density plots, of the estimates of the intercept and of the slope, compared with theory.]
If $\epsilon_i \sim N(0, \sigma^2)$, then $\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$.
Based on simulation: if the errors are $t_3$ distributed, $\hat{\beta}$ is close to Normal, for the $n$, $X$ and $\beta$ we assumed.
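One way to make the comparison with theory concrete is to check the simulated spread of the estimates against $\sigma^2 (X^T X)^{-1}$. The sketch below assumes the same made-up setup as the example ($n = 10$, $x \sim \text{Uniform}(0, 20)$, $\beta = (1, 0.5)^T$, $t_3$ errors) and uses the fact that a $t_\nu$ variable has variance $\nu / (\nu - 2)$, so $\sigma^2 = 3$ here. The seed, the number of simulations and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(13)  # arbitrary seed

n, df = 10, 3
x = rng.uniform(0, 20, size=n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 0.5])

# Theory: Var(beta_hat) = sigma^2 (X^T X)^{-1}, with sigma^2 = df / (df - 2) for t errors.
sigma2 = df / (df - 2)
theory_sd = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Simulation: draw all error vectors at once, one column per simulated data set.
n_sim = 20000
eps = rng.standard_t(df=df, size=(n, n_sim))
Y = (X @ beta)[:, None] + eps
estimates = np.linalg.solve(X.T @ X, X.T @ Y).T   # shape (n_sim, 2)
sim_sd = estimates.std(axis=0, ddof=1)

# Standardize the slope estimates and compare tail quantiles with N(0, 1).
z = (estimates[:, 1] - beta[1]) / theory_sd[1]
sim_q = np.quantile(z, [0.025, 0.975])

print("theoretical sd:", theory_sd)
print("simulated sd:  ", sim_sd)
print("2.5% and 97.5% quantiles of standardized slope:", sim_q)
print("Normal reference quantiles: -1.96, 1.96")
```

Because $t_3$ errors are heavy tailed, the simulated standard deviations settle down slowly, so some run-to-run wobble is expected.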
When the errors aren’t Normal: CLT
Think of our estimates as linear combinations of the errors, i.e. a
sort of average of i.i.d. random variables.
Some version of the Central Limit Theorem will apply.
For large samples, even when the errors aren’t Normal,
$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$ approximately,
and as $n \to \infty$ that approximation will improve.
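The "linear combination of the errors" view can be written out explicitly. The derivation below assumes $X$ is fixed and that the errors have mean zero and covariance $\sigma^2 I_{n \times n}$; the weights $c_{ji}$ are notation introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\hat{\beta} &= (X^T X)^{-1} X^T y
             = (X^T X)^{-1} X^T (X\beta + \epsilon)
             = \beta + (X^T X)^{-1} X^T \epsilon, \\
\hat{\beta}_j - \beta_j &= \sum_{i=1}^{n} c_{ji}\, \epsilon_i,
  \qquad c_{ji} = \left[ (X^T X)^{-1} X^T \right]_{ji}, \\
E(\hat{\beta}) &= \beta, \qquad
\operatorname{Var}(\hat{\beta})
  = (X^T X)^{-1} X^T \left( \sigma^2 I_{n \times n} \right) X (X^T X)^{-1}
  = \sigma^2 (X^T X)^{-1}.
\end{align*}
```

The mean and variance hold without Normal errors. Normality of the errors, or a Central Limit Theorem argument for large $n$, is only needed for the Normal shape of the distribution.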
Summary so far
If we knew the error distribution and true parameters we could use
simulation to understand the sampling distribution of the least
squares estimates.
Simulation can also be used to demonstrate the CLT at work in
regression.
Bootstrap confidence intervals
In practice, with data in front of us, we don’t know the distribution
of the errors (nor the true parameter values).
The bootstrap is one approach to estimating the sampling
distribution of $\hat{\beta}$, by using the simulation idea and substituting in
our best guesses for the things we don’t know.
Bootstrapping regression
(Model based resampling)
0. Fit model and find estimates, $\hat{\beta}$, and residuals, $e_i$
1. Fix $X$
2. For $k = 1, \ldots, B$ (repeat many times)
   2.1 Generate errors, $\epsilon_i^*$, sampled with replacement from the residuals $e_i$
   2.2 Construct $y^*$, using the model, $y^* = \hat{y} + \epsilon^* = X\hat{\beta} + \epsilon^*$
   2.3 Use least squares to find $\hat{\beta}^{*(k)}$
3. Examine the distribution of $\hat{\beta}^*$ and compare to $\hat{\beta}$
One 95% confidence interval for $\beta_j$ is given by the 2.5% and 97.5% quantiles of
the distribution of $\hat{\beta}_j^*$.
(Known as the percentile method; there are other (better?) methods.)
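The model-based resampling steps above might be sketched as below. This is an illustrative sketch, not the lecture's own code: the function name `residual_bootstrap`, its arguments, the default `B = 2000` and the simulated data are all assumptions. It uses raw residuals and the percentile method exactly as described in the steps.

```python
import numpy as np

def residual_bootstrap(X, y, B=2000, seed=0):
    """Model-based (residual) bootstrap for least squares.

    Returns the original estimates and a (B, p) array of bootstrap estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Step 0: fit the model; keep the fitted values and residuals.
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta_hat
    resid = y - fitted
    boot = np.empty((B, X.shape[1]))
    for k in range(B):  # Step 2, with X held fixed throughout (step 1)
        eps_star = rng.choice(resid, size=n, replace=True)    # 2.1 resample residuals
        y_star = fitted + eps_star                            # 2.2 bootstrapped response
        boot[k] = np.linalg.lstsq(X, y_star, rcond=None)[0]   # 2.3 refit
    return beta_hat, boot

# Hypothetical data, for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0, 20, size=30)
X = np.column_stack([np.ones(30), x])
y = 1 + 0.5 * x + rng.standard_t(df=3, size=30)

beta_hat, boot = residual_bootstrap(X, y)
# Step 3: percentile method. Rows are (lower, upper), columns are coefficients.
ci = np.quantile(boot, [0.025, 0.975], axis=0)
print("estimates:", beta_hat)
print("95% percentile intervals (one row per coefficient):")
print(ci.T)
```

Each column of `boot` is the bootstrap distribution of one coefficient, so plotting a column gives the bootstrap analogue of the simulated sampling distribution.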
Example: Faraway Galapagos Islands
(I’ll illustrate with simple linear regression; Faraway does the multiple regression case in 3.6.)
[Figure: observed data; x-axis: Elevation, y-axis: Species.]
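Bootstrap: 1. Find $\hat{\beta}$, $\hat{y}$, and $e_i$.
[Figure: observed data; x-axis: Elevation, y-axis: Species.]
Assume $\text{Species}_i = \beta_0 + \beta_1 \text{Elevation}_i + \epsilon_i$. The fitted values are $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 \text{Elevation}_i$, and the residuals are $e_i = y_i - \hat{y}_i$.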
Bootstrap: Using fixed $X$, $\hat{\beta}$ from observed data
[Figure: observed data; x-axis: Elevation, y-axis: Species.]
Bootstrap: 2. Resample residuals to construct bootstrapped
response
[Figure: bootstrapped data; x-axis: Elevation, y-axis: Species.]
Each bootstrapped response adds an error $\epsilon_i^*$, an $e_i$ sampled at random, where the $e_i$ are from the previous page.
Bootstrap: 3. Fit regression model to bootstrapped response
[Figure: bootstrapped data; x-axis: Elevation, y-axis: Species.]
Bootstrap: 4. Repeat #2. and #3. many times
[Figure: many regression lines fit to many bootstrapped datasets; x-axis: Elevation, y-axis: Species.]
Examine distribution of estimates
[Figure: two density plots, one of the bootstrap estimates of the intercept and one of the bootstrap estimates of the slope.]
These distributions are telling us about the sampling distribution of $\hat{\beta}$. For example, roughly $(-25, 50)$ is a 95% CI for $\beta_0$.
High level: bootstrap idea
We don’t know the distribution of the errors, but our best guess is
probably the empirical c.d.f. of the residuals.
Sampling from a random variable whose c.d.f. is the
empirical c.d.f. of the residuals boils down to sampling with
replacement from the residuals.
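A small numerical check of this equivalence, under made-up residuals: drawing from the empirical c.d.f. by inverting it at Uniform(0, 1) draws puts probability $1/n$ on each residual, which is what sampling with replacement does. The residual values and the helper name `ecdf_inverse` are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
resid = np.array([-2.1, -0.4, 0.3, 0.9, 1.3])  # hypothetical residuals
n = len(resid)
sorted_resid = np.sort(resid)

def ecdf_inverse(u):
    """Smallest residual r whose empirical c.d.f. value is at least u."""
    idx = np.ceil(u * n).astype(int) - 1
    return sorted_resid[np.clip(idx, 0, n - 1)]

m = 100_000
# Route 1: inverse-c.d.f. sampling from the empirical c.d.f.
via_ecdf = ecdf_inverse(rng.uniform(size=m))
# Route 2: sampling with replacement from the residuals.
via_resample = rng.choice(resid, size=m, replace=True)

freq_ecdf = np.array([np.mean(via_ecdf == r) for r in sorted_resid])
freq_resample = np.array([np.mean(via_resample == r) for r in sorted_resid])
print("residual values:        ", sorted_resid)
print("frequency via ECDF:     ", freq_ecdf)
print("frequency via resample: ", freq_resample)
```

Both sets of frequencies should be close to $1/n = 0.2$.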
Limitations
We might rely on bootstrap confidence intervals when we are
worried about the assumption of Normal errors. But, there are
limitations.
• We still rely on the assumption that the errors are
independent and identically distributed.
• Generally scaled residuals are used (residuals don’t have the
same variance, more later)
• An alternative bootstrap resamples the $(y_i, x_{i1}, \ldots, x_{ip})$
vectors, i.e. resamples the rows of the data, a.k.a. the resampling cases bootstrap.
The choice should depend on how the data were collected, for example whether they come from an experiment.
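For contrast with model-based resampling, the resampling cases bootstrap mentioned in the last bullet could be sketched like this. The function name `case_bootstrap`, its defaults and the simulated data (with error spread that grows with x) are assumptions made for illustration.

```python
import numpy as np

def case_bootstrap(X, y, B=2000, seed=0):
    """Resampling cases bootstrap: resample whole rows (y_i, x_i) with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = np.empty((B, X.shape[1]))
    for k in range(B):
        rows = rng.integers(0, n, size=n)  # row indices drawn with replacement
        # X is no longer held fixed: each refit uses the resampled rows of X and y.
        boot[k] = np.linalg.lstsq(X[rows], y[rows], rcond=None)[0]
    return boot

# Hypothetical data whose error spread grows with x, for illustration only.
rng = np.random.default_rng(2)
x = rng.uniform(0, 20, size=40)
X = np.column_stack([np.ones(40), x])
y = 1 + 0.5 * x + (0.2 + 0.1 * x) * rng.normal(size=40)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
boot = case_bootstrap(X, y)
ci = np.quantile(boot, [0.025, 0.975], axis=0)  # percentile method, as before
print("estimates:", beta_hat)
print("95% percentile intervals (one row per coefficient):")
print(ci.T)
```

Unlike the residual version, nothing here treats the errors as exchangeable across rows, at the cost of no longer holding $X$ fixed.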