Page 1: Talk 5

Statistics Lab

Rodolfo Metulini

IMT Institute for Advanced Studies, Lucca, Italy

Lesson 5 - Introduction to Bootstrap (and hints on Markov Chains) - 27.01.2015

Page 2: Talk 5

Introduction

Let’s assume, for a moment, the Central Limit Theorem (CLT):

If a random sample of n observations y1, y2, ..., yn is drawn from a population with mean µ and variance σ2, then, for n large enough, the sampling distribution of the sample mean can be approximated by a normal density with mean µ and variance σ2/n.

- Averages taken from any distribution will have an approximately normal distribution.

- The standard deviation of the sample mean decreases as the number of observations increases.

But... nobody tells us exactly how big the sample has to be.

Page 3: Talk 5

Why Bootstrap?

1. Sometimes we cannot take advantage of the CLT, because:

Nobody tells us exactly how big the sample has to be. Empirically, in some cases the sample is really small.

So we are not encouraged to make any distributional assumption. We just have the data, and we let the raw data speak.

The bootstrap method attempts to determine the probability distribution from the data itself, without recourse to the CLT.

2. To better estimate the variance of a parameter, and consequently obtain more accurate confidence intervals and hypothesis tests.

Page 4: Talk 5

Basic Idea of Bootstrap

Use the original sample as if it were the population, and draw M samples from the original sample (the bootstrap samples). Then define the estimator using the bootstrap samples.

Figure: Real World versus Bootstrap World

Page 5: Talk 5

Structure of Bootstrap

1. Originally, from a list of data (the sample), one computes a statistic (an estimate).

2. Then, one creates an artificial list of data (a new sample) by randomly drawing elements from the original list, with replacement.

3. One computes a new statistic (estimate) from the new sample.

4. One repeats steps 2 and 3, say, M = 1000 times, and looks at the distribution of these 1000 statistics (a minimal sketch follows below).
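As an R sketch of this structure (the data vector and the choice of the median as the statistic are hypothetical, purely for illustration):

    # a small, made-up sample (hypothetical values)
    x <- c(4.1, 5.3, 2.8, 6.0, 3.9, 5.1, 4.4, 2.5, 5.8, 4.9)
    M <- 1000                        # number of bootstrap samples
    boot_stats <- numeric(M)
    for (m in 1:M) {
      # step 2: resample with replacement, same size as the original sample
      x_star <- sample(x, size = length(x), replace = TRUE)
      # step 3: recompute the statistic on the artificial sample
      boot_stats[m] <- median(x_star)
    }
    hist(boot_stats)                 # step 4: distribution of the M statistics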

Page 6: Talk 5

Type of resampling methods

1. The Monte Carlo algorithm: we resample with replacement, and the size of each bootstrap sample must be equal to the size of the original data set.

2. The jackknife algorithm: we simply resample from the original sample by deleting one value at a time, so the size of each sample is n − 1 (a sketch follows below).
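A matching R sketch of the jackknife (reusing the hypothetical vector x from the sketch above):

    n <- length(x)
    jack_means <- numeric(n)
    for (i in 1:n) {
      jack_means[i] <- mean(x[-i])   # delete the i-th value: sample size n - 1
    }
    jack_means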

Page 7: Talk 5

Estimation of the sample mean

Suppose we extracted a sample x = (x1, x2, ..., xn) from the population X. Let’s say the sample size is small: n = 10.

We can compute the sample mean x̄n using the values of the sample x. But, since n is small, the CLT does not hold, so we cannot say anything about the distribution of the sample mean.

APPROACH: We extract M samples (or sub-samples) of dimension n from the sample x (with replacement, as in the Monte Carlo algorithm).

We can define the bootstrap sample means x̄i,b, ∀ i = 1, ..., M. These become the new sample, with dimension M.

Bootstrap sample mean:

Mb(X) = (1/M) ∑_{i=1}^M x̄i,b

Bootstrap sample variance:

Vb(X) = (1/(M − 1)) ∑_{i=1}^M (x̄i,b − Mb(X))²
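In R, these two quantities can be computed directly (a sketch; the sample of size n = 10 is hypothetical):

    x <- c(4.1, 5.3, 2.8, 6.0, 3.9, 5.1, 4.4, 2.5, 5.8, 4.9)  # hypothetical, n = 10
    M <- 1000
    boot_means <- replicate(M, mean(sample(x, replace = TRUE)))
    Mb <- mean(boot_means)                     # bootstrap sample mean
    Vb <- sum((boot_means - Mb)^2) / (M - 1)   # bootstrap sample variance
    c(Mb, Vb)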

Page 8: Talk 5

Bootstrap confidence intervals with variance estimation

Let’s take a random sample of size n = 25 from a normal distribution with mean 10 and standard deviation 3.

We can consider the sampling distribution of the sample mean. From that, we estimate the intervals.

The bootstrap estimates the standard error by resampling the data in our original sample.

Instead of repeatedly drawing samples of size n = 25 from the population, we will repeatedly draw new samples of size n = 25 from our original sample, resampling with replacement.

We can estimate the standard error of the sample mean using the standard deviation of the bootstrapped sample means.
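A sketch of this procedure in R (the seed and the use of a 95% normal-approximation interval are my own choices):

    set.seed(1)                        # for reproducibility
    x <- rnorm(25, mean = 10, sd = 3)  # the original sample, n = 25
    M <- 1000
    boot_means <- replicate(M, mean(sample(x, replace = TRUE)))
    se_boot <- sd(boot_means)          # bootstrap estimate of the standard error
    mean(x) + c(-1.96, 1.96) * se_boot # approximate 95% confidence interval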

Page 9: Talk 5

Bootstrap confidence intervals: formula

Figure: Confidence interval in the bootstrap world

Page 10: Talk 5

Confidence interval with quantiles

Suppose we have a sample of data from an exponential distribution with parameter λ:

f(x|λ) = λ e^(−λx) (remember: the estimate of λ is λ̂ = 1/x̄n).

An alternative to the use of bootstrap-estimated standard errors (since estimating the standard errors of an exponential is not straightforward) is the use of bootstrap quantiles.

We can obtain M bootstrap estimates λ̂b and define q*(α) as the α quantile of the bootstrap distribution of the M λ estimates.

The new bootstrap confidence interval for λ will be:

[2λ̂ − q*(1 − α/2); 2λ̂ − q*(α/2)]
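A minimal R sketch (the true λ and the sample size are hypothetical):

    set.seed(1)
    x <- rexp(30, rate = 2)              # hypothetical sample, true lambda = 2
    lambda_hat <- 1 / mean(x)            # lambda-hat = 1 / sample mean
    M <- 1000
    lambda_boot <- replicate(M, 1 / mean(sample(x, replace = TRUE)))
    q <- quantile(lambda_boot, c(0.025, 0.975))
    # basic bootstrap interval: [2*lambda_hat - q(0.975); 2*lambda_hat - q(0.025)]
    c(2 * lambda_hat - q[2], 2 * lambda_hat - q[1])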

Page 11: Talk 5

Regression model coefficient estimate with Bootstrap

Now we will consider the situation where we have data on two variables. This is the type of data that arises in linear regression models. It does not make sense to bootstrap the two variables separately, so they remain linked when bootstrapped.

If our original n = 4 sample contains the observations (y1 = 1, x1 = 3), (y2 = 2, x2 = 6), (y3 = 4, x3 = 3), and (y4 = 6, x4 = 2), we resample these original couples in pairs.

Recall that the linear regression model is: yi = β1 + β2 xi + εi. We are going to construct a bootstrap interval for the slope coefficient β2:

1. We draw M bootstrap bivariate samples.

2. We compute the OLS β̂2 coefficient for each bootstrap sample.

3. We define the bootstrap quantiles, and we use the 0.025 (α/2) and the 0.975 (1 − α/2) quantiles to define the confidence interval for β̂2 (a sketch follows below).
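A sketch of the pairs bootstrap in R, using the four observations above (M = 1000 and the 95% level are illustrative):

    y <- c(1, 2, 4, 6)
    x <- c(3, 6, 3, 2)
    M <- 1000
    beta2_boot <- numeric(M)
    for (m in 1:M) {
      idx <- sample(1:4, size = 4, replace = TRUE)  # resample the pairs together
      beta2_boot[m] <- coef(lm(y[idx] ~ x[idx]))[2] # OLS slope on the resample
    }
    # some resamples have a constant x, giving an NA slope, hence na.rm = TRUE
    quantile(beta2_boot, c(0.025, 0.975), na.rm = TRUE)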

Page 12: Talk 5

Regression model coefficient estimate with Bootstrap(alternative): sampling the residuals

An alternative solution for bootstrap estimation of the regression coefficient is a two-stage method in which:

1. You draw M samples. For each one you run a regression and you define M bootstrap residual vectors (M vectors of dimension n).

2. You add those residuals to each of the M dependent-variable vectors.

3. You perform M new regressions using the new dependent variables, to estimate M bootstrapped β2 coefficients.

The method then uses the (α/2) and the (1 − α/2) quantiles of the bootstrapped β2 to define the confidence interval.
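The two-stage description above can be implemented in several ways; here is a sketch of the common residual-resampling variant (fit once, resample the residuals, rebuild y, refit), reusing the hypothetical x and y from the previous sketch:

    fit0 <- lm(y ~ x)                  # fit on the original data
    res  <- resid(fit0)
    yhat <- fitted(fit0)
    M <- 1000
    beta2_res <- numeric(M)
    for (m in 1:M) {
      # resample the residuals and rebuild the dependent variable
      y_star <- yhat + sample(res, replace = TRUE)
      beta2_res[m] <- coef(lm(y_star ~ x))[2]  # rerun the regression
    }
    quantile(beta2_res, c(0.025, 0.975))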

Page 13: Talk 5

References

Efron, B., and Tibshirani, R. (1993). An Introduction to the Bootstrap (Vol. 57). CRC Press.

Figure: Efron and Tibshirani's foundational book

Page 14: Talk 5

Routines in R

1. boot, by Brian Ripley.

Functions and datasets for bootstrapping from the book Bootstrap Methods and Their Applications by A. C. Davison and D. V. Hinkley (1997, CUP).

2. bootstrap, by Rob Tibshirani.

Software (bootstrap, cross-validation, jackknife) and data for the book An Introduction to the Bootstrap by B. Efron and R. Tibshirani (1993, Chapman and Hall).
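For example, the manual bootstrap loops from the earlier slides can be replaced by the boot package (a sketch; the data vector is hypothetical):

    library(boot)
    x <- rnorm(25, mean = 10, sd = 3)      # hypothetical data
    # the statistic must accept the data and a vector of resampled indices
    mean_stat <- function(d, idx) mean(d[idx])
    b <- boot(data = x, statistic = mean_stat, R = 1000)
    boot.ci(b, type = c("norm", "perc"))   # normal and percentile intervals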

Page 15: Talk 5

Markov Chain

Markov chains are an important tool in probability and in many other areas of research.

They are used to model the probability of being in a certain state in a certain period, given that the state in the previous period is known.

Weather example: what is the (Markov) probability that tomorrow's state will be sunny, given that today it is rainy?

The main properties of Markov chain processes are:

- Memory of the process (usually the memory is fixed to 1).

- Stationarity of the distribution.

Page 16: Talk 5

Chart 1

A picture of an easy example of a Markov chain, with two possible states and the transition probabilities reported.

Figure: An example of a two-state Markov chain

Page 17: Talk 5

Notation

We define a stochastic process {Xt, t = 0, 1, 2, ...} that takes on a finite or countable number of possible values.

Let the possible values be non-negative integers (i.e. Xt ∈ Z+). If Xt = i, then the process is said to be in state i at time t.

The Markov process (in discrete time) is defined as follows:

Pij = P[Xt+1 = j | Xt = i, Xt−1 = it−1, ..., X0 = i0] = P[Xt+1 = j | Xt = i], ∀ i, j ∈ Z+

We call Pij a 1-step transition probability because we move from time t to time t + 1.

It is a first-order Markov chain (memory = 1) because the probability of being in state j at time t + 1 only depends on the state at time t.

Page 18: Talk 5

Notation - 2

The t-step transition probability: P(t)ij = P[Xt+k = j | Xk = i], ∀ t ≥ 0, ∀ i, j ≥ 0.

The Chapman-Kolmogorov equations allow us to compute the (t + m)-step transition probabilities. They state that:

P(t+m)ij = ∑k P(t)ik P(m)kj, ∀ t, m ≥ 0, ∀ i, j ≥ 0

N.B. Basic probability properties:

1. Pij ≥ 0, ∀ i, j ≥ 0

2. ∑j≥0 Pij = 1, for i = 0, 1, 2, ...

Page 19: Talk 5

Example: conditional probability

Consider two states: 0 = rain and 1 = no rain.

Define two probabilities:

α = P00 = P[Xt+1 = 0 | Xt = 0], the probability it will rain tomorrow given that it rains today;

β = P10 = P[Xt+1 = 0 | Xt = 1], the probability it will rain tomorrow given that it does not rain today.

What is the probability it will rain the day after tomorrow, given that it rains today, with α = 0.7 and β = 0.4?

The transition probability matrix will be:

P = [P00, P01; P10, P11], that is:

P = [α = 0.7, 1 − α = 0.3; β = 0.4, 1 − β = 0.6]
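We can answer by squaring the transition matrix in R (a sketch with the example's numbers):

    P <- matrix(c(0.7, 0.3,
                  0.4, 0.6), nrow = 2, byrow = TRUE)  # rows: rain, no rain
    P2 <- P %*% P   # 2-step transition probabilities (Chapman-Kolmogorov)
    P2[1, 1]        # P[rain the day after tomorrow | rain today] = 0.61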

Page 20: Talk 5

Example: unconditional probability

What is the unconditional probability it will rain the day after tomorrow?

We need to define the unconditional, or marginal, distribution of the state at time t:

P[Xt = j] = ∑i P[Xt = j | X0 = i] P[X0 = i] = ∑i P(t)ij · αi,

where αi = P[X0 = i], ∀ i ≥ 0,

and P[Xt = j | X0 = i] is the conditional probability just computed before.
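Continuing the R sketch, with a hypothetical initial distribution, say P[X0 = rain] = 0.4:

    alpha <- c(0.4, 0.6)  # hypothetical marginal distribution of today's state
    alpha %*% P2          # marginal distribution two days ahead
    # unconditional rain probability: 0.4 * 0.61 + 0.6 * 0.52 = 0.556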

Page 21: Talk 5

Stationary distributions

A stationary distribution π is a probability distribution such that, when the Markov chain reaches it, the chain keeps that distribution forever.

It means we are asking this question: what is the probability of being in a particular state in the long run?

Let's define πj as the limiting probability that the process will be in state j at time t, or:

πj = lim_{t→∞} P(t)ij

Using Fubini's theorem (https://www.youtube.com/watch?v=6-sGhUeOOk8), we can define the stationary distribution as:

πj = ∑i πi Pij, which, for the two-state chain, solves to: π0 = β / (1 + β − α); π1 = (1 − α) / (1 + β − α).
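A sketch of computing π in base R, using the fact that π is the (normalized) eigenvector of the transposed transition matrix with eigenvalue 1:

    e <- eigen(t(P))
    pi_hat <- Re(e$vectors[, 1])     # eigenvector for the largest eigenvalue (= 1)
    pi_hat <- pi_hat / sum(pi_hat)   # normalize to a probability distribution
    pi_hat                           # approx (0.571, 0.429) = (4/7, 3/7)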

Page 22: Talk 5

Example: stationary distribution

Back to our example.

We can compute the 2-step, 3-step, ..., n-step transition distributions, and look at WHEN they reach convergence (a sketch follows below).

An alternative method to compute the stationary distribution consists in using the easy closed-form formulas:

π0 = β / (1 + β − α)

π1 = (1 − α) / (1 + β − α)

With α = 0.7 and β = 0.4, this gives π0 = 0.4/0.7 ≈ 0.571 and π1 = 0.3/0.7 ≈ 0.429.
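A sketch of checking this convergence by iterating matrix powers in R:

    Pt <- P
    for (t in 2:10) {
      Pt <- Pt %*% P                       # t-step transition matrix
      cat("t =", t, ":", round(Pt[1, ], 3), "\n")
    }
    # both rows quickly converge to the stationary distribution (0.571, 0.429)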

Page 23: Talk 5

References

Ross, S. M. (2006). Introduction to Probability Models. Elsevier/Academic Press.

Figure: Cover of the 10th edition

Page 24: Talk 5

Routines in R

- markovchain, by Giorgio Alfredo Spedicato.

A package for easily handling discrete Markov chains.

- MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park.

Performs Monte Carlo simulations based on the Markov chain approach (MCMC).
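For instance, our weather chain can be handled with the markovchain package (a sketch based on the package's documented interface; see its vignette for details):

    library(markovchain)
    mc <- new("markovchain",
              states = c("rain", "no rain"),
              transitionMatrix = matrix(c(0.7, 0.3,
                                          0.4, 0.6), nrow = 2, byrow = TRUE),
              name = "Weather")
    steadyStates(mc)   # stationary distribution, approx (0.571, 0.429)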