Page 1: Talk 5

Statistics Lab

Rodolfo Metulini

IMT Institute for Advanced Studies, Lucca, Italy

Lesson 5 - Introduction to Bootstrap (and hints on Markov Chains) - 27.01.2015

Page 2: Talk 5

Introduction

Let’s assume, for a moment, the Central Limit Theorem (CLT):

If a random sample of n observations y1, y2, ..., yn is drawn from a population with mean µ and variance σ2, then, for n large enough, the sampling distribution of the sample mean can be approximated by a normal density with mean µ and variance σ2/n.

- Averages taken from any distribution will have an approximately normal distribution.

- The standard deviation of the sample mean decreases as the number of observations increases.

But... nobody tells us exactly how big the sample has to be.

Page 3: Talk 5

Why Bootstrap?

1. Sometimes we cannot take advantage of the CLT, because:

Nobody tells us exactly how big the sample has to be. Empirically, in some cases the sample is really small.

So we are not encouraged to make any distributional assumption. We just have the data, and we let the raw data speak.

The bootstrap method attempts to determine the probability distribution from the data itself, without recourse to the CLT.

2. To better estimate the variance of a parameter, and consequently obtain more accurate confidence intervals and hypothesis tests.

Page 4: Talk 5

Basic Idea of Bootstrap

Use the original sample as if it were the population, and draw M samples from the original sample (the bootstrap samples). Then define the estimator using the bootstrap samples.

Figure: Real World versus Bootstrap World

Page 5: Talk 5

Structure of Bootstrap

1. Originally, from a list of data (the sample), one computes a statistic (an estimate).

2. Then, one creates an artificial list of data (a new sample) by randomly drawing elements from the original list, with replacement.

3. One computes a new statistic (estimate) from the new sample.

4. One repeats steps 2 and 3, say, M = 1000 times, and looks at the distribution of these 1000 statistics (a minimal sketch follows below).
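As an R sketch of this structure (the data vector and the choice of the median as the statistic are hypothetical, purely for illustration):

    # a small, made-up sample (hypothetical values)
    x <- c(4.1, 5.3, 2.8, 6.0, 3.9, 5.1, 4.4, 2.5, 5.8, 4.9)
    M <- 1000                        # number of bootstrap samples
    boot_stats <- numeric(M)
    for (m in 1:M) {
      # step 2: resample with replacement, same size as the original sample
      x_star <- sample(x, size = length(x), replace = TRUE)
      # step 3: recompute the statistic on the artificial sample
      boot_stats[m] <- median(x_star)
    }
    hist(boot_stats)                 # step 4: distribution of the M statistics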

Page 6: Talk 5

Type of resampling methods

1. The Monte Carlo algorithm: we resample with replacement, and the size of each bootstrap sample must be equal to the size of the original data set.

2. The jackknife algorithm: we simply resample from the original sample by deleting one value at a time, so the size of each sample is n − 1 (a sketch follows below).
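A matching R sketch of the jackknife (reusing the hypothetical vector x from the sketch above):

    n <- length(x)
    jack_means <- numeric(n)
    for (i in 1:n) {
      jack_means[i] <- mean(x[-i])   # delete the i-th value: sample size n - 1
    }
    jack_means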

Page 7: Talk 5

Estimation of the sample mean

Suppose we extracted a sample x = (x1, x2, ..., xn) from the population X. Let’s say the sample size is small: n = 10.

We can compute the sample mean x̄n using the values of the sample x. But, since n is small, the CLT does not hold, so we cannot say anything about the distribution of the sample mean.

APPROACH: We extract M samples (or sub-samples) of dimension n from the sample x (with replacement, as in the Monte Carlo algorithm).

We can define the bootstrap sample means x̄i,b, ∀ i = 1, ..., M. These become the new sample, with dimension M.

Bootstrap sample mean:

Mb(X) = (1/M) ∑_{i=1}^M x̄i,b

Bootstrap sample variance:

Vb(X) = (1/(M − 1)) ∑_{i=1}^M (x̄i,b − Mb(X))²
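In R, these two quantities can be computed directly (a sketch; the sample of size n = 10 is hypothetical):

    x <- c(4.1, 5.3, 2.8, 6.0, 3.9, 5.1, 4.4, 2.5, 5.8, 4.9)  # hypothetical, n = 10
    M <- 1000
    boot_means <- replicate(M, mean(sample(x, replace = TRUE)))
    Mb <- mean(boot_means)                     # bootstrap sample mean
    Vb <- sum((boot_means - Mb)^2) / (M - 1)   # bootstrap sample variance
    c(Mb, Vb)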

Page 8: Talk 5

Bootstrap confidence intervals with variance estimation

Let’s take a random sample of size n = 25 from a normal distribution with mean 10 and standard deviation 3.

We can consider the sampling distribution of the sample mean. From that, we estimate the intervals.

The bootstrap estimates the standard error by resampling the data in our original sample.

Instead of repeatedly drawing samples of size n = 25 from the population, we will repeatedly draw new samples of size n = 25 from our original sample, resampling with replacement.

We can estimate the standard error of the sample mean using the standard deviation of the bootstrapped sample means.
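A sketch of this procedure in R (the seed and the use of a 95% normal-approximation interval are my own choices):

    set.seed(1)                        # for reproducibility
    x <- rnorm(25, mean = 10, sd = 3)  # the original sample, n = 25
    M <- 1000
    boot_means <- replicate(M, mean(sample(x, replace = TRUE)))
    se_boot <- sd(boot_means)          # bootstrap estimate of the standard error
    mean(x) + c(-1.96, 1.96) * se_boot # approximate 95% confidence interval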

Page 9: Talk 5

Bootstrap confidence intervals: formula

Figure: Confidence interval in the bootstrap world

Page 10: Talk 5

Confidence interval with quantiles

Suppose we have a sample of data from an exponential distribution with parameter λ:

f(x|λ) = λ e^(−λx) (remember: the estimate of λ is λ̂ = 1/x̄n).

An alternative to the use of bootstrap-estimated standard errors (since estimating the standard errors of an exponential is not straightforward) is the use of bootstrap quantiles.

We can obtain M bootstrap estimates λ̂b and define q*(α) as the α quantile of the bootstrap distribution of the M λ estimates.

The new bootstrap confidence interval for λ will be:

[2λ̂ − q*(1 − α/2); 2λ̂ − q*(α/2)]
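A minimal R sketch (the true λ and the sample size are hypothetical):

    set.seed(1)
    x <- rexp(30, rate = 2)              # hypothetical sample, true lambda = 2
    lambda_hat <- 1 / mean(x)            # lambda-hat = 1 / sample mean
    M <- 1000
    lambda_boot <- replicate(M, 1 / mean(sample(x, replace = TRUE)))
    q <- quantile(lambda_boot, c(0.025, 0.975))
    # basic bootstrap interval: [2*lambda_hat - q(0.975); 2*lambda_hat - q(0.025)]
    c(2 * lambda_hat - q[2], 2 * lambda_hat - q[1])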

Page 11: Talk 5

Regression model coefficient estimate with Bootstrap

Now we will consider the situation where we have data on two variables. This is the type of data that arises in linear regression models. It does not make sense to bootstrap the two variables separately, so they remain linked when bootstrapped.

If our original n = 4 sample contains the observations (y1 = 1, x1 = 3), (y2 = 2, x2 = 6), (y3 = 4, x3 = 3), and (y4 = 6, x4 = 2), we resample these original couples in pairs.

Recall that the linear regression model is: yi = β1 + β2 xi + εi. We are going to construct a bootstrap interval for the slope coefficient β2:

1. We draw M bootstrap bivariate samples.

2. We compute the OLS β̂2 coefficient for each bootstrap sample.

3. We define the bootstrap quantiles, and we use the 0.025 (α/2) and the 0.975 (1 − α/2) quantiles to define the confidence interval for β̂2 (a sketch follows below).
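A sketch of the pairs bootstrap in R, using the four observations above (M = 1000 and the 95% level are illustrative):

    y <- c(1, 2, 4, 6)
    x <- c(3, 6, 3, 2)
    M <- 1000
    beta2_boot <- numeric(M)
    for (m in 1:M) {
      idx <- sample(1:4, size = 4, replace = TRUE)  # resample the pairs together
      beta2_boot[m] <- coef(lm(y[idx] ~ x[idx]))[2] # OLS slope on the resample
    }
    # some resamples have a constant x, giving an NA slope, hence na.rm = TRUE
    quantile(beta2_boot, c(0.025, 0.975), na.rm = TRUE)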

Page 12: Talk 5

Regression model coefficient estimate with Bootstrap(alternative): sampling the residuals

An alternative solution for bootstrap estimation of the regression coefficient is a two-stage method in which:

1. You draw M samples. For each one you run a regression and you define M bootstrap residual vectors (M vectors of dimension n).

2. You add those residuals to each of the M dependent-variable vectors.

3. You perform M new regressions using the new dependent variables, to estimate M bootstrapped β2 coefficients.

The method then uses the (α/2) and the (1 − α/2) quantiles of the bootstrapped β2 to define the confidence interval.
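The two-stage description above can be implemented in several ways; here is a sketch of the common residual-resampling variant (fit once, resample the residuals, rebuild y, refit), reusing the hypothetical x and y from the previous sketch:

    fit0 <- lm(y ~ x)                  # fit on the original data
    res  <- resid(fit0)
    yhat <- fitted(fit0)
    M <- 1000
    beta2_res <- numeric(M)
    for (m in 1:M) {
      # resample the residuals and rebuild the dependent variable
      y_star <- yhat + sample(res, replace = TRUE)
      beta2_res[m] <- coef(lm(y_star ~ x))[2]  # rerun the regression
    }
    quantile(beta2_res, c(0.025, 0.975))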

Page 13: Talk 5

References

Efron, B., and Tibshirani, R. (1993). An Introduction to the Bootstrap (Vol. 57). CRC Press.

Figure: Efron and Tibshirani's foundational book

Page 14: Talk 5

Routines in R

1. boot, by Brian Ripley.

Functions and datasets for bootstrapping from the book Bootstrap Methods and Their Applications by A. C. Davison and D. V. Hinkley (1997, CUP).

2. bootstrap, by Rob Tibshirani.

Software (bootstrap, cross-validation, jackknife) and data for the book An Introduction to the Bootstrap by B. Efron and R. Tibshirani (1993, Chapman and Hall).
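For example, the manual bootstrap loops from the earlier slides can be replaced by the boot package (a sketch; the data vector is hypothetical):

    library(boot)
    x <- rnorm(25, mean = 10, sd = 3)      # hypothetical data
    # the statistic must accept the data and a vector of resampled indices
    mean_stat <- function(d, idx) mean(d[idx])
    b <- boot(data = x, statistic = mean_stat, R = 1000)
    boot.ci(b, type = c("norm", "perc"))   # normal and percentile intervals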

Page 15: Talk 5

Markov Chain

Markov chains are an important tool in probability and in many other areas of research.

They are used to model the probability of being in a certain state in a certain period, given that the state in the previous period is known.

Weather example: what is the (Markov) probability that tomorrow's state will be sunny, given that today it is rainy?

The main properties of Markov chain processes are:

- Memory of the process (usually the memory is fixed to 1).

- Stationarity of the distribution.

Page 16: Talk 5

Chart 1

A picture of an easy example of a Markov chain, with two possible states and the transition probabilities reported.

Figure: An example of a two-state Markov chain

Page 17: Talk 5

Notation

We define a stochastic process {Xt, t = 0, 1, 2, ...} that takes on a finite or countable number of possible values.

Let the possible values be non-negative integers (i.e. Xt ∈ Z+). If Xt = i, then the process is said to be in state i at time t.

The Markov process (in discrete time) is defined as follows:

Pij = P[Xt+1 = j | Xt = i, Xt−1 = it−1, ..., X0 = i0] = P[Xt+1 = j | Xt = i], ∀ i, j ∈ Z+

We call Pij a 1-step transition probability because we move from time t to time t + 1.

It is a first-order Markov chain (memory = 1) because the probability of being in state j at time t + 1 only depends on the state at time t.

Page 18: Talk 5

Notation - 2

The t-step transition probability: P(t)ij = P[Xt+k = j | Xk = i], ∀ t ≥ 0, ∀ i, j ≥ 0.

The Chapman-Kolmogorov equations allow us to compute the (t + m)-step transition probabilities. They state that:

P(t+m)ij = ∑k P(t)ik P(m)kj, ∀ t, m ≥ 0, ∀ i, j ≥ 0

N.B. Basic probability properties:

1. Pij ≥ 0, ∀ i, j ≥ 0

2. ∑j≥0 Pij = 1, for i = 0, 1, 2, ...

Page 19: Talk 5

Example: conditional probability

Consider two states: 0 = rain and 1 = no rain.

Define two probabilities:

α = P00 = P[Xt+1 = 0 | Xt = 0], the probability it will rain tomorrow given that it rains today;

β = P10 = P[Xt+1 = 0 | Xt = 1], the probability it will rain tomorrow given that it does not rain today.

What is the probability it will rain the day after tomorrow, given that it rains today, with α = 0.7 and β = 0.4?

The transition probability matrix will be:

P = [P00, P01; P10, P11], that is:

P = [α = 0.7, 1 − α = 0.3; β = 0.4, 1 − β = 0.6]
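We can answer by squaring the transition matrix in R (a sketch with the example's numbers):

    P <- matrix(c(0.7, 0.3,
                  0.4, 0.6), nrow = 2, byrow = TRUE)  # rows: rain, no rain
    P2 <- P %*% P   # 2-step transition probabilities (Chapman-Kolmogorov)
    P2[1, 1]        # P[rain the day after tomorrow | rain today] = 0.61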

Page 20: Talk 5

Example: unconditional probability

What is the unconditional probability it will rain the day after tomorrow?

We need to define the unconditional, or marginal, distribution of the state at time t:

P[Xt = j] = ∑i P[Xt = j | X0 = i] P[X0 = i] = ∑i P(t)ij · αi,

where αi = P[X0 = i], ∀ i ≥ 0,

and P[Xt = j | X0 = i] is the conditional probability just computed before.
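Continuing the R sketch, with a hypothetical initial distribution, say P[X0 = rain] = 0.4:

    alpha <- c(0.4, 0.6)  # hypothetical marginal distribution of today's state
    alpha %*% P2          # marginal distribution two days ahead
    # unconditional rain probability: 0.4 * 0.61 + 0.6 * 0.52 = 0.556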

Page 21: Talk 5

Stationary distributions

A stationary distribution π is a probability distribution such that, when the Markov chain reaches it, the chain keeps that distribution forever.

It means we are asking this question: what is the probability of being in a particular state in the long run?

Let's define πj as the limiting probability that the process will be in state j at time t, or:

πj = lim_{t→∞} P(t)ij

Using Fubini's theorem (https://www.youtube.com/watch?v=6-sGhUeOOk8), we can define the stationary distribution as:

πj = ∑i πi Pij, which, for the two-state chain, solves to: π0 = β / (1 + β − α); π1 = (1 − α) / (1 + β − α).
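A sketch of computing π in base R, using the fact that π is the (normalized) eigenvector of the transposed transition matrix with eigenvalue 1:

    e <- eigen(t(P))
    pi_hat <- Re(e$vectors[, 1])     # eigenvector for the largest eigenvalue (= 1)
    pi_hat <- pi_hat / sum(pi_hat)   # normalize to a probability distribution
    pi_hat                           # approx (0.571, 0.429) = (4/7, 3/7)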

Page 22: Talk 5

Example: stationary distribution

Back to our example.

We can compute the 2-step, 3-step, ..., n-step transition distributions, and look at WHEN they reach convergence (a sketch follows below).

An alternative method to compute the stationary distribution consists in using the easy closed-form formulas:

π0 = β / (1 + β − α)

π1 = (1 − α) / (1 + β − α)

With α = 0.7 and β = 0.4, this gives π0 = 0.4/0.7 ≈ 0.571 and π1 = 0.3/0.7 ≈ 0.429.
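A sketch of checking this convergence by iterating matrix powers in R:

    Pt <- P
    for (t in 2:10) {
      Pt <- Pt %*% P                       # t-step transition matrix
      cat("t =", t, ":", round(Pt[1, ], 3), "\n")
    }
    # both rows quickly converge to the stationary distribution (0.571, 0.429)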

Page 23: Talk 5

References

Ross, S. M. (2006). Introduction to Probability Models. Elsevier/Academic Press.

Figure: Cover of the 10th edition

Page 24: Talk 5

Routines in R

- markovchain, by Giorgio Alfredo Spedicato.

A package for easily handling discrete Markov chains.

- MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park.

Performs Monte Carlo simulations based on the Markov chain approach (MCMC).
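For instance, our weather chain can be handled with the markovchain package (a sketch based on the package's documented interface; see its vignette for details):

    library(markovchain)
    mc <- new("markovchain",
              states = c("rain", "no rain"),
              transitionMatrix = matrix(c(0.7, 0.3,
                                          0.4, 0.6), nrow = 2, byrow = TRUE),
              name = "Weather")
    steadyStates(mc)   # stationary distribution, approx (0.571, 0.429)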